Jackie-Jiang opened a new pull request, #18588:
URL: https://github.com/apache/pinot/pull/18588

   ## Summary
   
   Refines the two index-based `DISTINCT` operators added in #17872 / #17820 
and rewrites their tests to drive the full broker → operator path.
   
   ### `JsonIndexDistinctOperator`
   
   - Argument validation moves into the constructor, mirroring 
`JsonExtractIndexTransformFunction.init`. The operator accepts 3-/4-/5-arg 
`jsonExtractIndex` calls (column, path, type, optional default, optional 
`JSON_MATCH` filter expression) and MV `_ARRAY` types (`INT_ARRAY`, 
`STRING_ARRAY`, etc.).
   - `canUseJsonIndexDistinct` is simplified to a function-name check; the 
planner routes every `jsonExtractIndex` call through the operator and lets the 
constructor surface invalid arguments as `IllegalArgumentException`.
   - The runtime path intersects per-value doc ids from the JSON index with the 
`WHERE`-clause filter through a single `remainingDocs` bitmap, so cross-path 
`JSON_MATCH`, base-column filters, and `IS NULL` filters all use the same code 
path.
   - `numDocsScanned` is populated from the filtered doc set (or total docs 
when the filter matches everything), so execution statistics line up with the 
scan/projection path.
   - New query option `jsonIndexDistinctSkipMissingPath`: when set, the 
operator skips parsing the 4-arg default, skips `remainingDocs` tracking, and 
suppresses the "Illegal Json Path" throw for the 3-arg form. Useful when the 
caller knows every doc has the path (or doesn't care about misses).
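   
   The `remainingDocs` flow described above can be sketched roughly as follows. 
This is an illustration only, not the operator's code: `java.util.BitSet` stands 
in for the Roaring bitmaps, and `RemainingDocsSketch`, `distinct`, `bits`, 
`valueToDocs`, and the exception type are all hypothetical names chosen for the 
example.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RemainingDocsSketch {
  /** Illustrative helper: builds a BitSet from a list of doc ids. */
  static BitSet bits(int... docIds) {
    BitSet bitSet = new BitSet();
    for (int docId : docIds) {
      bitSet.set(docId);
    }
    return bitSet;
  }

  /**
   * valueToDocs stands in for the per-value doc ids read from the JSON index,
   * filteredDocs for the doc ids matched by the WHERE clause.
   * A null defaultValue models the 3-arg form (no default supplied).
   */
  static Set<String> distinct(Map<String, BitSet> valueToDocs, BitSet filteredDocs,
      String defaultValue, boolean skipMissingPath) {
    Set<String> result = new TreeSet<>();
    // With skipMissingPath set, docs that lack the path are not tracked at all.
    BitSet remainingDocs = skipMissingPath ? null : (BitSet) filteredDocs.clone();
    for (Map.Entry<String, BitSet> entry : valueToDocs.entrySet()) {
      if (entry.getValue().intersects(filteredDocs)) {
        result.add(entry.getKey());
        if (remainingDocs != null) {
          // These docs have a value for the path, so they are no longer "remaining".
          remainingDocs.andNot(entry.getValue());
        }
      }
    }
    if (remainingDocs != null && !remainingDocs.isEmpty()) {
      if (defaultValue == null) {
        // Assumption: the real operator's exception type and message may differ.
        throw new IllegalStateException("Illegal Json Path: some filtered docs lack the path");
      }
      result.add(defaultValue);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, BitSet> valueToDocs = new LinkedHashMap<>();
    valueToDocs.put("a", bits(0, 1));
    valueToDocs.put("b", bits(2));
    valueToDocs.put("c", bits(5));
    BitSet filteredDocs = bits(1, 2, 3); // doc 3 has no value for the path
    System.out.println(distinct(valueToDocs, filteredDocs, "n/a", false)); // [a, b, n/a]
    System.out.println(distinct(valueToDocs, filteredDocs, null, true));   // [a, b]
  }
}
```

   Because every filter type reduces to the same `filteredDocs` bitmap before 
this loop runs, the sketch does not need to know whether the filter came from 
`JSON_MATCH`, a base column, or `IS NULL`.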
   
   ### `InvertedIndexDistinctOperator`
   
   - Caches `_totalDocs` in the constructor instead of recomputing per call.
   - DESC-sorted path short-circuits with `intersects` (boolean) rather than 
`getLongCardinality`, which is orders of magnitude cheaper on dense bitmaps.
   - Drops redundant `advanceIfNeeded(startDocId)` on the ASC sorted path and 
the redundant inner `filterIter.hasNext()` check.
   - Reports a correct `numDocsScanned` on the sorted / inverted paths 
(previously zero).
   - Inlines the `FilterPreparation` helper and renames `_numEntriesExamined` → 
`_numEntriesExaminedPostFilter` so the stats name matches its meaning.
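   
   For reviewers, a rough sketch of the DESC short-circuit idea. It uses 
`java.util.BitSet` as a stand-in for the Roaring bitmaps; `DescDistinctSketch`, 
`topDistinctDesc`, and `matchesViaCardinality` are hypothetical names for 
illustration and do not appear in the operator.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class DescDistinctSketch {
  /** Illustrative helper: builds a BitSet from a list of doc ids. */
  static BitSet bits(int... docIds) {
    BitSet bitSet = new BitSet();
    for (int docId : docIds) {
      bitSet.set(docId);
    }
    return bitSet;
  }

  /**
   * sortedValues is in ascending dictionary order; postings[i] holds the doc ids
   * for sortedValues[i]. Walks from the largest value down and stops at limit.
   */
  static List<String> topDistinctDesc(String[] sortedValues, BitSet[] postings,
      BitSet filteredDocs, int limit) {
    List<String> result = new ArrayList<>();
    for (int i = sortedValues.length - 1; i >= 0 && result.size() < limit; i--) {
      // Boolean check: can return as soon as one common doc id is found.
      if (postings[i].intersects(filteredDocs)) {
        result.add(sortedValues[i]);
      }
    }
    return result;
  }

  /** The costlier alternative: materialize the intersection and count every bit. */
  static boolean matchesViaCardinality(BitSet posting, BitSet filteredDocs) {
    BitSet intersection = (BitSet) posting.clone();
    intersection.and(filteredDocs);
    return intersection.cardinality() > 0;
  }

  public static void main(String[] args) {
    String[] sortedValues = {"a", "b", "c", "d"};
    BitSet[] postings = {bits(0, 1), bits(2), bits(3, 4), bits(5)};
    BitSet filteredDocs = bits(1, 2, 3);
    System.out.println(topDistinctDesc(sortedValues, postings, filteredDocs, 2)); // [c, b]
  }
}
```

   Both checks give the same answer; the difference is only that the boolean 
form does not need the full count.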
   
   ### Tests
   
   - Both operators get a new queries-based test 
(`JsonIndexDistinctOperatorQueriesTest`, 
`InvertedIndexDistinctOperatorQueriesTest`) that drives `SELECT DISTINCT` 
through `BaseQueriesTest`, asserts on result tables, explain strings, and 
execution statistics (`numDocsScanned`, `numEntriesScannedInFilter`, 
`numEntriesScannedPostFilter`, `numTotalDocs`).
   - The older mock-based unit tests are removed, since the queries tests cover 
the same behaviors against real segments.
   - All `OPTION(...)` syntax in the suites is converted to standard `SET a=b;` 
prefixes; repeated query strings are extracted into shared constants.
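   
   As an illustration of the `SET a=b;` prefix style with a shared query 
constant, a small sketch follows. The helper `withOptions`, the class name, the 
table `testTable`, the column `jsonCol`, and the option value `true` are 
placeholders assumed for this example and are not taken from the test suites.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SetPrefixSketch {
  // Shared query constant; table and column names are placeholders.
  static final String DISTINCT_QUERY =
      "SELECT DISTINCT jsonExtractIndex(jsonCol, '$.name', 'STRING') FROM testTable";

  /** Prepends one "SET key=value; " statement per option, in map iteration order. */
  static String withOptions(Map<String, String> options, String query) {
    StringBuilder builder = new StringBuilder();
    for (Map.Entry<String, String> option : options.entrySet()) {
      builder.append("SET ").append(option.getKey()).append('=')
          .append(option.getValue()).append("; ");
    }
    return builder.append(query).toString();
  }

  public static void main(String[] args) {
    Map<String, String> options = new LinkedHashMap<>();
    // Assumption: the option is enabled with the value "true".
    options.put("jsonIndexDistinctSkipMissingPath", "true");
    System.out.println(withOptions(options, DISTINCT_QUERY));
  }
}
```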
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

