gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2046220895
Just looked more deeply, I think there's a bug in the logic around the
greediness of the matching. This test fails on the patch but passes on master:
```
assertMatch("%ba_", "foo bar baz", DruidPredicateMatch.TRUE);
```
The `%ba` matches `foo ba`, then the `_` matches `r`, then because there's
some string left over the overall match fails.
Making the first `%ba` greedy wouldn't be a good fit, because then this test
(which passes on the branch now) would probably start to fail:
```
assertMatch("%ba_____", "foo bar baz", DruidPredicateMatch.TRUE);
```
It seems like some kind of backtracking is necessary, like trying the
minimal-length match for `%ba` first, and then if the overall match fails,
trying a longer match.
@twilliamson what do you think about mitigating the original issue by
sticking with `java.util.Pattern`, but using your logic for pattern
normalization to simplify the regexps? Or, alternatively, enhancing the
hand-rolled logic in your patch to have some backtracking? That would surely
help for some patterns, like `%%%%x`. Although, I'm not sure if it will open
the door back up for "catastrophe". I haven't thought about the problem enough
to have a sense of that.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]