dan-robertson opened a new issue, #15182: URL: https://github.com/apache/lucene/issues/15182
### Description

Consider a regex of the form `a(b|)c`. This should match `ac` and `abc`. Some regexes of this form seem to behave incorrectly. As a slightly more concrete and realistic example, one might want to search some logs for something like `[a-z-]*(|-prod|-main) `.

I cobbled together some tests against a recent version of the Lucene repo. I have three docs which respectively contain:

- `foo-bar-baz`
- `foo--baz`
- `foo-test-baz`

I then spell a regex that should match only the first two docs in a bunch of different ways:

1. `.*foo-(bar|)-baz.*` - IllegalArgumentException: expected ')' at position 18
2. `.*foo-(|bar)-baz.*` - 0 matches but 2 expected
3. `.*(foo-(bar|)-baz).*` - IllegalArgumentException: expected ')' at position 20
4. `.*(foo-(|bar)-baz).*` - 0 matches but 2 expected
5. `.*foo-(bar|())-baz.*` - 2 matches
6. `.*foo-(bar|()?)-baz.*` - 2 matches
7. `.*foo-(bar|#?)-baz.*` - 2 matches
8. `.*(foo-(bar|())-baz).*` - 2 matches
9. `.*(foo-(bar|()?)-baz).*` - 2 matches
10. `.*(foo-(bar|#?)-baz).*` - 2 matches
11. `.*foo-(bar)?-baz.*` - 2 matches

The first four cases seem incorrect to me.

I came to this investigation after some problems with Elasticsearch (v8.12.2, using Lucene 9.9.2), where regexes following pattern number 5 also failed. Maybe that is useful context.

### Version and environment details

I added tests by modifying `lucene/core/src/test/org/apache/lucene/search/TestRegexpQuery.java` with a base revision of `cd1a4ecc9ead8e06b08f3bc2016297525f65b37c`.

Here are other details in case they are relevant:

- OS: linux (el8 with a 6.12.39 kernel)
- Java: openjdk version "24.0.1" 2025-04-15
- Architecture: x86_64
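To make the expectation of 2 matches concrete, here is a minimal reference sketch. It uses `java.util.regex`, not Lucene's own regex parser, so it only illustrates the semantics I would expect from an empty alternative and does not reproduce the failures above. The class name `EmptyAlternationReference` and the helper `countMatches` are hypothetical names made up for this illustration. Patterns 7 and 10 are left out because `#` is a plain literal in `java.util.regex`, and patterns 5, 6, 8 and 9 are left out for brevity.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only; not part of Lucene.
public class EmptyAlternationReference {
    // The three document values from the report.
    static final List<String> DOCS = List.of("foo-bar-baz", "foo--baz", "foo-test-baz");

    // Counts the documents that the regex matches in full.
    // Assumption: java.util.regex is used as a stand-in for the expected
    // semantics; Lucene's parser is a separate implementation.
    static long countMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return DOCS.stream().filter(doc -> pattern.matcher(doc).matches()).count();
    }

    public static void main(String[] args) {
        String[] patterns = {
            ".*foo-(bar|)-baz.*",    // case 1 in the list above
            ".*foo-(|bar)-baz.*",    // case 2
            ".*(foo-(bar|)-baz).*",  // case 3
            ".*(foo-(|bar)-baz).*",  // case 4
            ".*foo-(bar)?-baz.*"     // case 11, the spelling that already works
        };
        for (String regex : patterns) {
            System.out.println(regex + " -> " + countMatches(regex) + " matches");
        }
    }
}
```

Under `java.util.regex`, each of these spellings accepts `foo-bar-baz` and `foo--baz` and rejects `foo-test-baz`, which is the behavior I expected from the first four cases.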