Shekharrajak commented on PR #2772:
URL: https://github.com/apache/datafusion-comet/pull/2772#issuecomment-3875299422

   > > This looks great overall @Shekharrajak, but could you mark it as incompatible as described in [#2772 (comment)](https://github.com/apache/datafusion-comet/pull/2772#issuecomment-3529547218). I don't think it is realistic to be able to match Spark behavior fully, at least not as part of this PR.
   > > In future PRs, we can probably implement fallback rules for specific cases, such as Java-specific regex patterns.
   > 
   > I was able to find cases where Comet crashes at runtime when using lookahead, lookbehind, and back references, for example.
   
   I looked into this and am writing down the findings for future reference:
   
     Java's regex engine supports advanced features:
     • Lookahead: (?=...), (?!...)
     • Lookbehind: (?<=...), (?<!...)
     • Back references: \1, \2
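
To make the three features concrete, here is a minimal sketch using only `java.util.regex` from the JDK. The class name, method names, and sample strings are made up for illustration; this is not code from Spark or Comet.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class JavaRegexFeatures {
    // Lookbehind: split at the empty position immediately after '@'.
    static String[] splitAfterAt(String email) {
        return email.split("(?<=@)");
    }

    // Lookahead: find "foo" only when it is directly followed by "bar".
    static boolean fooBeforeBar(String s) {
        return Pattern.compile("foo(?=bar)").matcher(s).find();
    }

    // Back reference: find a word character that is immediately repeated.
    static boolean hasDoubledChar(String s) {
        return Pattern.compile("(\\w)\\1").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAfterAt("alice@example.org")));
        System.out.println(fooBeforeBar("foobar"));
        System.out.println(hasDoubledChar("hello"));
    }
}
```

All three patterns compile and match under Java's engine, which is what Spark relies on.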
   
     Rust's `regex` crate intentionally does not support any of these. By design it guarantees search time linear in the length of the input, which rules out backtracking, and none of these three features is known to be implementable efficiently under that guarantee.
   
   
     So if a user writes something like:
   
     SELECT split(email, '(?<=@)') FROM users
   
     * Spark: works fine -- Java regex handles the lookbehind.
     * Comet: the pattern gets passed to Rust's regex engine, which fails to compile it at runtime because (?<=@) is unsupported syntax.
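
One possible shape for a fallback rule is a conservative pre-check on the pattern string before it is handed to the native engine. The sketch below is hypothetical: `RegexFallbackCheck` and `needsSparkFallback` are invented names, not part of Comet, and the scan only covers the three feature families listed above. It does not understand character classes or `\Q...\E` quoting, so it can report a false positive, which would only cause an unnecessary fallback to Spark.

```java
public class RegexFallbackCheck {
    // Hypothetical helper: returns true if the pattern appears to use
    // lookahead, lookbehind, or a back reference.
    static boolean needsSparkFallback(String pattern) {
        int n = pattern.length();
        for (int i = 0; i < n; i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                if (i + 1 < n) {
                    char next = pattern.charAt(i + 1);
                    // Numbered back reference: \1 .. \9
                    if (next >= '1' && next <= '9') {
                        return true;
                    }
                    // Named back reference: \k<name>
                    if (next == 'k' && i + 2 < n && pattern.charAt(i + 2) == '<') {
                        return true;
                    }
                }
                i++; // skip the escaped character, e.g. "\(" is a literal paren
            } else if (c == '(') {
                // "(?<name>" is a named group, so only "(?<=" and "(?<!" count.
                if (pattern.startsWith("(?=", i) || pattern.startsWith("(?!", i)
                        || pattern.startsWith("(?<=", i) || pattern.startsWith("(?<!", i)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With a check like this, `needsSparkFallback("(?<=@)")` returns true, so the `split` query above would stay on Spark instead of failing in native code.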


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

