twilliamson commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2040716015

   This info is from five or six years back, when I was working on stream
processing systems at Facebook, but my recollection is that `re2j` had issues
with UTF-8 multi-byte sequences. I'm not sure whether that's still the case, but
I remember it not working as a drop-in replacement. We looked at what the Trino
folks were doing at the time, which led us to Joni; we were able to switch to it
without any of our pipeline owners noticing. As far as I remember, while Joni
doesn't make hard runtime guarantees, in practice we didn't see it run into the
same pathological behavior, though we would still sometimes see errors for
certain inputs (maybe `StackOverflowError`? But you also get those with
`java.util.regex`…).
   
   Just checked, and it looks like Trino has since updated to [a custom `LIKE`
implementation based on DFAs](https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/likematcher/DFA.java).
(It looks quite complicated; I'm tempted to submit an MR to Trino with the
same approach as in this MR…) Trino still appears to use
[Joni](https://github.com/trinodb/trino/blob/4d6df76558c999f01560f68f96698c441299e000/core/trino-main/src/main/java/io/trino/operator/scalar/JoniRegexpFunctions.java#L48)
as the default for the `regexp_*` functions, with [an option to use
`re2j`](https://github.com/trinodb/trino/blob/4d6df76558c999f01560f68f96698c441299e000/core/trino-main/src/main/java/io/trino/sql/analyzer/RegexLibrary.java).
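For anyone wondering why an automaton-based `LIKE` sidesteps the pathological cases, here is a minimal Java sketch. It is *not* Trino's `DFA.java` and not the approach in this PR. It is a hypothetical `SimpleLikeMatcher` that simulates the NFA for a `LIKE` pattern by tracking the set of reachable pattern positions (the set a DFA state would encode), so matching costs roughly O(input length × pattern length) with no backtracking. Assumptions: `%` and `_` are the only wildcards, there is no `ESCAPE` clause, matching is case-sensitive, and the input is walked by code point rather than by UTF-16 `char`, so `_` consumes one whole multi-byte character.

```java
import java.util.BitSet;

/**
 * Hypothetical illustration: matches SQL LIKE patterns by simulating an NFA
 * over pattern positions instead of backtracking. Assumes '%' and '_' are the
 * only wildcards and that there is no ESCAPE character.
 */
final class SimpleLikeMatcher {
    // Pattern stored as code points so '_' matches one full character,
    // including characters outside the BMP (surrogate pairs in UTF-16).
    private final int[] pattern;

    SimpleLikeMatcher(String likePattern) {
        this.pattern = likePattern.codePoints().toArray();
    }

    /**
     * Adds state i and every state reachable from it without consuming input:
     * a '%' at position i may match the empty string, so i + 1 is reachable too.
     * A state that is already present had its closure added earlier, so stop there.
     */
    private void addWithClosure(BitSet states, int i) {
        while (!states.get(i)) {
            states.set(i);
            if (i == pattern.length || pattern[i] != '%') {
                return;
            }
            i++;
        }
    }

    /** Returns true if the entire input matches the pattern. */
    boolean matches(String input) {
        // State k means "pattern[0..k) matches the input consumed so far";
        // state pattern.length is the accepting state.
        BitSet current = new BitSet(pattern.length + 1);
        addWithClosure(current, 0);

        int offset = 0;
        while (offset < input.length() && !current.isEmpty()) {
            int c = input.codePointAt(offset);
            offset += Character.charCount(c);

            BitSet next = new BitSet(pattern.length + 1);
            for (int i = current.nextSetBit(0); i >= 0 && i < pattern.length;
                    i = current.nextSetBit(i + 1)) {
                int p = pattern[i];
                if (p == '%') {
                    addWithClosure(next, i);      // '%' consumes c and stays in place
                } else if (p == '_' || p == c) {
                    addWithClosure(next, i + 1);  // single-character match advances
                }
            }
            current = next;
        }
        // If the state set emptied early, this is false and the match fails.
        return current.get(pattern.length);
    }
}
```

For example, `new SimpleLikeMatcher("a%c").matches("abbbc")` returns `true`, and a pattern such as `%a%a%a%a%b` against a long run of `a`s fails after a single linear pass instead of exploring every way to split the input. Caching each state set the first time it is seen (lazy subset construction) would turn this into a true DFA with constant work per character, at the cost of more code and memory.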


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

