gianm commented on PR #16153:
URL: https://github.com/apache/druid/pull/16153#issuecomment-2046220895

   Just looked more deeply, I think there's a bug in the logic around the 
greediness of the matching. This test fails on the patch but passes on master:
   
   ```
   assertMatch("%ba_", "foo bar baz", DruidPredicateMatch.TRUE);
   ```
   
   The `%ba` matches `foo ba`, then the `_` matches `r`, then because there's 
some string left over the overall match fails.
   
   Making the first `%ba` greedy wouldn't be a good fit, because then this test 
(which passes on the branch now) would probably start to fail:
   
   ```
   assertMatch("%ba_____", "foo bar baz", DruidPredicateMatch.TRUE);
   ```
   
   It seems like some kind of backtracking is necessary, like trying the 
minimal-length match for `%ba` first, and then if the overall match fails, 
trying a longer match.
   
   @twilliamson what do you think about mitigating the original issue by 
sticking with `java.util.Pattern`, but using your logic for pattern 
normalization to simplify the regexps? Or, alternatively, enhancing the 
hand-rolled logic in your patch to have some backtracking? That would surely 
help for some patterns, like `%%%%x`. Although, I'm not sure if it will open 
the door back up for "catastrophe". I haven't thought about the problem enough 
to have a sense of that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to