airborne12 opened a new pull request, #63692:
URL: https://github.com/apache/doris/pull/63692
## Proposed changes
Issue Number: close #N/A (Jira DORIS-25510)
### What problem does this PR solve?
When a variant column has a parent INVERTED index with a parser, and a
sub-column is materialized in some segment as a non-string value (e.g.
`{"c": false}`), `variant_util::inherit_index` calls `remove_parser_and_analyzer()`
and writes a BKD/numeric index for that sub-column. The on-disk entry for
`(parent_index_id, "<sub>")` therefore exists but is **not** a Lucene fulltext
segment.
`MatchPredicateCollector::collect` (called from BM25 stats collection in
`OlapScanner::_prepare_impl`) does not have segment context, so when the
predicate references a variant sub-column it clones the parent fulltext index
meta and sets the sub-column path as suffix. In segments where the sub-column
happens to be non-string, `IndexFileReader::open(...)` returns a valid
`DorisCompoundReader` pointing at the BKD entry, and
`lucene::index::IndexReader::open(compound_reader.get())` throws
`CLuceneError("No segments* file found in DorisCompoundReader@...")`.
That `CLuceneError` (which derives from `std::exception`, not `doris::Exception`)
escapes `CollectionStatistics::process_segment`, bubbles through `collect()`
and `OlapScanner::_prepare_impl`, and the `ASSIGN_STATUS_IF_CATCH_EXCEPTION`
wrapper in `scanner_scheduler.cpp` only catches `doris::Exception` — so the BE
SIGABRTs during scanner prepare.
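The escape path can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `EngineError` plays the role of `doris::Exception`, `LibraryError` plays the role of `CLuceneError`, and `guarded_call` only approximates the shape of a wrapper that catches a single exception type. It is not the actual `ASSIGN_STATUS_IF_CATCH_EXCEPTION` macro.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical stand-ins; these are NOT the real Doris or CLucene types.
struct EngineError : std::runtime_error {  // plays the role of doris::Exception
    using std::runtime_error::runtime_error;
};
struct LibraryError : std::exception {     // plays the role of CLuceneError
    const char* what() const noexcept override { return "No segments* file found"; }
};

enum class Outcome { Ok, ConvertedToStatus, Escaped };

// Catches only the engine's own exception type, like the wrapper described above.
// Returns false when an EngineError was converted into an error status.
template <typename Fn>
bool guarded_call(Fn&& fn) {
    try {
        fn();
        return true;
    } catch (const EngineError&) {
        return false;
    }
}

// The outer handler exists only so this sketch can report the escape.
// Without any outer handler, an uncaught exception ends in std::terminate.
template <typename Fn>
Outcome run(Fn&& fn) {
    try {
        return guarded_call(fn) ? Outcome::Ok : Outcome::ConvertedToStatus;
    } catch (const std::exception&) {
        return Outcome::Escaped;
    }
}

// Usage example: the three cases side by side.
void check_outcomes() {
    assert(run([] {}) == Outcome::Ok);
    assert(run([] { throw EngineError("bad state"); }) == Outcome::ConvertedToStatus);
    assert(run([] { throw LibraryError(); }) == Outcome::Escaped);
}
```

The third case is the one this PR is about: the library exception passes straight through a handler that names only the engine's exception type.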
Minimal reproducer (from DORIS-25510):
```sql
create table t (
    `id` int(11) NULL,
    `v` variant NULL,
    INDEX idx_v (`v`) USING INVERTED PROPERTIES("parser" = "english")
) ENGINE=OLAP DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default:1");
insert into t values(1, '{"a": "abc"}');
insert into t values(2, '{"b": "abc"}');
insert into t values(3, '{"c": false}');
select score() from t where v["c"] match "abc" order by score() limit 10;
-- BE coredumps
```
This PR wraps the `IndexReader::open` + searcher-cache-fill path in
`CollectionStatistics::process_segment` with a `try { ... } catch
(CLuceneError& e)` that logs and `continue`s to the next field. Skipping
contributes 0 to `_total_num_tokens` / `_term_doc_freqs` for the affected field
in that segment, which is the intended semantics for *no fulltext data for this
sub-column in this segment*. Existing `INVERTED_INDEX_FILE_NOT_FOUND` /
`INVERTED_INDEX_BYPASS` handling at `CollectionStatistics::collect` is
unchanged and still applies when the entry is genuinely absent.
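As a rough sketch of the skip-and-continue shape described above (not the actual patch): the names `ReaderOpenError`, `FieldStats`, `TokenCounter`, and `collect_segment` are hypothetical and stand in for `CLuceneError`, the per-field statistics, the reader-open path, and `CollectionStatistics::process_segment` respectively.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for CLuceneError.
struct ReaderOpenError : std::exception {
    const char* what() const noexcept override { return "No segments* file found"; }
};

// Hypothetical per-field statistics for one segment.
struct FieldStats {
    std::uint64_t total_tokens = 0;
};

// Hypothetical callback: opens the fulltext reader for a field and returns its
// token count, or throws ReaderOpenError when the entry is not a fulltext index.
using TokenCounter = std::function<std::uint64_t(const std::string&)>;

std::map<std::string, FieldStats> collect_segment(const std::vector<std::string>& fields,
                                                  const TokenCounter& open_and_count,
                                                  std::vector<std::string>& skipped) {
    std::map<std::string, FieldStats> stats;
    for (const auto& field : fields) {
        try {
            // Open first, so a failing field never creates a stats entry.
            const std::uint64_t tokens = open_and_count(field);
            stats[field].total_tokens += tokens;
        } catch (const ReaderOpenError&) {
            // The real code logs a warning here; this sketch records the field name.
            skipped.push_back(field);
            continue;  // the field contributes nothing for this segment
        }
    }
    return stats;
}

// Usage example: field "c" has no fulltext data in this segment.
void check_skip() {
    std::vector<std::string> skipped;
    auto stats = collect_segment(
        {"a", "c"},
        [](const std::string& f) -> std::uint64_t {
            if (f == "c") throw ReaderOpenError();
            return 3;
        },
        skipped);
    assert(stats.at("a").total_tokens == 3);
    assert(stats.count("c") == 0);
    assert(skipped.size() == 1 && skipped[0] == "c");
}
```

Catching inside the loop, rather than around it, is what keeps one unreadable field from discarding the statistics of the other fields in the same segment.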
The deeper schema-level fix — never cloning a fulltext parent meta for a
sub-column whose actual segment-level index was written as BKD — needs segment
context and is a follow-up. The defensive try/catch is enough to stop the abort
and is the same shape Doris uses elsewhere when CLucene exceptions cross the BE
/ Doris boundary.
### Release note
Fix BE crash when running `score()` / BM25-scoring queries against a variant
sub-column whose data in some segments is non-string while the parent variant
column has a fulltext INVERTED index.
### Check List (For Author)
- [x] Test:
    - Regression test:
      `regression-test/suites/inverted_index_p0/test_bm25_score_variant_boolean_subcolumn.groovy`
      replays the exact DORIS-25510 reproducer (3 single-row inserts so each lands
      in its own segment, including the segment with the boolean sub-column) and
      asserts the query returns without crashing.
- [x] Behavior changed: No (only converts a crash into a logged warning +
empty stats contribution for the affected sub-column / segment).
- [x] Does this need documentation: No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]