gianm commented on PR #16153: URL: https://github.com/apache/druid/pull/16153#issuecomment-2046220895
Just looked more deeply, I think there's a bug in the logic around the greediness of the matching. This test fails on the patch but passes on master: ``` assertMatch("%ba_", "foo bar baz", DruidPredicateMatch.TRUE); ``` The `%ba` matches `foo ba`, then the `_` matches `r`, then because there's some string left over the overall match fails. Making the first `%ba` greedy wouldn't be a good fit, because then this test (which passes on the branch now) would probably start to fail: ``` assertMatch("%ba_____", "foo bar baz", DruidPredicateMatch.TRUE); ``` It seems like some kind of backtracking is necessary, like trying the minimal-length match for `%ba` first, and then if the overall match fails, trying a longer match. @twilliamson what do you think about mitigating the original issue by sticking with `java.util.Pattern`, but using your logic for pattern normalization to simplify the regexps? Or, alternatively, enhancing the hand-rolled logic in your patch to have some backtracking? That would surely help for some patterns, like `%%%%x`. Although, I'm not sure if it will open the door back up for "catastrophe". I haven't thought about the problem enough to have a sense of that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org