airborne12 opened a new pull request, #60814:
URL: https://github.com/apache/doris/pull/60814
### What problem does this PR solve?
Issue Number: close #DORIS-24545
Problem Summary:
In the lucene mode of the `search()` function, queries that mix explicit and
implicit operators return different results from Elasticsearch. For example:
- Query: `"Sumer" OR Ptolemaic\ dynasty Limonene` with `default_operator=AND`
- ES result: 1 row
- Doris result: 0 rows (before fix)
**Root cause:** In Lucene's `QueryParserBase.addClause()`, only explicit
`CONJ_AND`/`CONJ_OR` modify the preceding term's occur. Implicit conjunction
(`CONJ_NONE`, i.e., space-separated terms without an explicit operator) only
affects the **current** term via `default_operator`, without modifying the
preceding term.
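The rule above can be sketched in a few lines of Python. This is a simplified model of the behavior described here, not Lucene's actual code: it ignores the `+`/`-`/`NOT` modifiers and prohibited clauses, and the names `Conj`, `add_clause`, and `parse` are made up for illustration.

```python
from enum import Enum


class Conj(Enum):
    NONE = 0  # implicit conjunction: terms separated only by a space
    AND = 1   # explicit AND token
    OR = 2    # explicit OR token


MUST, SHOULD = "MUST", "SHOULD"


def add_clause(clauses, conj, term, default_and):
    """Append `term`; only an explicit AND/OR may rewrite the preceding clause."""
    # Explicit AND upgrades the preceding clause to MUST.
    if clauses and conj is Conj.AND:
        prev_term, _ = clauses[-1]
        clauses[-1] = (prev_term, MUST)
    # Explicit OR under default_operator=AND downgrades it to SHOULD.
    if clauses and default_and and conj is Conj.OR:
        prev_term, _ = clauses[-1]
        clauses[-1] = (prev_term, SHOULD)
    # Conj.NONE falls through both checks: the preceding clause is untouched
    # and only the current term's occur is decided below.
    if default_and:
        required = conj is not Conj.OR
    else:
        required = conj is Conj.AND
    clauses.append((term, MUST if required else SHOULD))


def parse(pairs, default_and=True):
    """Fold (conjunction, term) pairs into a list of (term, occur) clauses."""
    clauses = []
    for conj, term in pairs:
        add_clause(clauses, conj, term, default_and)
    return clauses


# `a OR b c` with default_operator=AND
print(parse([(Conj.NONE, "a"), (Conj.OR, "b"), (Conj.NONE, "c")]))
# prints [('a', 'SHOULD'), ('b', 'SHOULD'), ('c', 'MUST')]
```

The implicit space before `c` leaves `b` as SHOULD; only `c` itself picks up MUST from `default_operator`.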
The FE `SearchDslParser.hasExplicitAndBefore()` incorrectly returned `true`
(based on `default_operator`) when no explicit AND token was found. As a result,
implicit conjunction was treated identically to explicit AND and modified the
preceding term's occur, which diverges from Lucene/ES semantics.
**Example of the bug:**
For `a OR b c` with `default_operator=AND`:
- Before fix: `SHOULD(a) MUST(b) MUST(c)` — wrong, implicit space before `c`
incorrectly upgraded `b` from SHOULD to MUST
- After fix: `SHOULD(a) SHOULD(b) MUST(c)` — correct, matches ES behavior.
Only `c` gets MUST (from default_operator), `b` retains SHOULD (from the
preceding OR)
**Fix:** `hasExplicitAndBefore()` now returns `false` when no explicit AND
token is found, regardless of `default_operator`. Only explicit AND tokens
trigger the "introduced by AND" logic that modifies preceding terms.
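A minimal sketch of the changed predicate, for illustration only. The real method is Java code in the FE and is not shown in this description, so the flat token list, the signature, and the `fixed` switch below are all hypothetical stand-ins.

```python
def has_explicit_and_before(tokens, index, default_operator, fixed=True):
    """Report whether the term at tokens[index] is introduced by an explicit AND.

    `tokens` is a hypothetical flat list such as ["a", "OR", "b", "c"].
    """
    prev = tokens[index - 1] if index > 0 else None
    if prev == "AND":
        return True   # explicit AND token found
    if prev == "OR":
        return False  # explicit OR token, not AND
    if fixed:
        # After the fix: an implicit conjunction is never "introduced by AND".
        return False
    # Before the fix (as described above): fall back to default_operator.
    return default_operator == "AND"


tokens = ["a", "OR", "b", "c"]
print(has_explicit_and_before(tokens, 3, "AND", fixed=False))  # prints True
print(has_explicit_and_before(tokens, 3, "AND"))               # prints False
```

With the old fallback, the space before `c` was reported as an explicit AND, which is what upgraded `b` to MUST.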
### Release note
Fix search() lucene mode producing incorrect results when queries mix
explicit operators (OR/AND) with implicit conjunction (space-separated terms).
### Check List (For Author)
- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [x] Yes. Implicit conjunction (space between terms) in lucene mode no
      longer modifies the preceding term's occur. Only explicit AND/OR operators
      modify preceding terms, matching Lucene/ES semantics.
- Does this need documentation?
    - [ ] No.
    - [ ] Yes.
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label