airborne12 opened a new pull request, #60500:
URL: https://github.com/apache/doris/pull/60500
## Proposed changes
Fix empty string MATCH on keyword index returning wrong results.
The multi-analyzer feature commit (2c950e140a5) added an empty-string check
that prevented `MATCH ''` from finding rows with empty string values in
keyword indexes.
A keyword index performs no tokenization, so the empty string is a valid
exact-match value and should be matchable. The previous code skipped empty
strings with the comment "empty query should match nothing", which is wrong for
keyword indexes.
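To illustrate why the two index types differ here, the following is a minimal, self-contained sketch of the intended semantics. It is not the Doris implementation: `keyword_terms`, `tokenized_terms`, and `count_matches` are hypothetical names, and the tokenizer is a deliberately simplified stand-in (split on non-alphanumeric characters, lowercase) for a real analyzer.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified model of the two index types; not Doris code.

// Keyword index: no tokenization, the whole value is one term.
// An empty string therefore yields exactly one (empty) term.
std::vector<std::string> keyword_terms(const std::string& value) {
    return {value};
}

// Tokenized index (simplified): split on non-alphanumeric characters and
// lowercase each token. An empty string yields no terms at all.
std::vector<std::string> tokenized_terms(const std::string& value) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char ch : value) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        terms.push_back(current);
    }
    return terms;
}

// Count rows that contain at least one of the query's terms.
// A query that analyzes to zero terms matches nothing.
std::size_t count_matches(const std::vector<std::string>& rows,
                          const std::string& query, bool keyword_index) {
    auto analyze = keyword_index ? &keyword_terms : &tokenized_terms;
    const std::vector<std::string> query_terms = analyze(query);
    std::size_t hits = 0;
    for (const std::string& row : rows) {
        const std::vector<std::string> row_terms = analyze(row);
        bool matched = false;
        for (const std::string& q : query_terms) {
            if (std::find(row_terms.begin(), row_terms.end(), q) != row_terms.end()) {
                matched = true;
                break;
            }
        }
        if (matched) {
            ++hits;
        }
    }
    return hits;
}
```

Under this model, querying `''` against the rows `''` and `'data'` matches one row on the keyword path, because the empty string is itself a term, and matches no rows on the tokenized path, because the query produces no terms.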
## Problem
```sql
-- Table with keyword index (no parser)
CREATE TABLE test (id INT, col TEXT, INDEX idx(col) USING INVERTED);
INSERT INTO test VALUES (1, ''), (2, 'data');
-- Before fix: returns 0 (WRONG!)
-- After fix: returns 1 (CORRECT!)
SELECT count() FROM test WHERE col MATCH '';
```
## Changes
This fix removes the empty string check for keyword index paths in:
- `be/src/vec/functions/match.cpp` (slow path)
- `be/src/olap/rowset/segment_v2/inverted_index_reader.cpp` (index path)
- `be/src/olap/rowset/segment_v2/inverted_index/analyzer/analyzer.cpp`
Added regression test `test_empty_string_match.groovy` to cover:
- Empty string match on keyword index (both index and slow paths)
- Empty string match on tokenized index (should return 0)
- match_any and match_all with empty string
## Check List (For Author)
- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test
    - [ ] No need to test
- Behavior changed:
    - [x] Yes. `MATCH ''` on keyword index now correctly matches rows with
      empty string values.
- Does this need documentation?
    - [ ] No.