xiangfu0 commented on code in PR #18662:
URL: https://github.com/apache/pinot/pull/18662#discussion_r3365608136


##########
pinot-core/src/main/java/org/apache/pinot/core/util/GroupByUtils.java:
##########
@@ -47,6 +53,42 @@ private GroupByUtils() {
   public static final int DEFAULT_MIN_NUM_GROUPS = 5000;
   public static final int MAX_TRIM_THRESHOLD = 1_000_000_000;
 
+  /**
+   * Builds the segment-level {@link GroupByResultsBlock} for a GROUP BY 
GROUPING SETS / ROLLUP / CUBE query,
+   * shared by {@code GroupByOperator} and {@code FilteredGroupByOperator}. 
When the group count exceeds the
+   * per-set budget ({@code perSetTrimSize * numGroupingSets}), a per-set 
bucketed trim (keyed on the
+   * {@code $grouping_id} discriminator at {@code discriminatorColumnIndex}) 
keeps each grouping set's own top
+   * candidates so a global top-K cannot starve low-magnitude sets such as the 
grand total; otherwise the full
+   * segment result is returned. The broker still applies the final ORDER BY + 
LIMIT across all sets.

Review Comment:
   Fixed in e1629d66fb — I took option 1 (disable combine/reducer trim for 
grouping-set queries).
   
   `GroupByUtils.createIndexedTableForCombineOperator` and 
`createIndexedTableForDataTableReducer` now short-circuit to 
`getTrimDisabledIndexedTable(...)` when `queryContext.isGroupingSets()`, before 
the ORDER BY trim-threshold logic runs:
   
   - Combine (per server): keeps all groups (bounded by `numGroupsLimit`) 
instead of a per-server top-K, so the server sends every group to the broker.
   - Reducer (broker): merges all server partials into the unbounded table 
first; `finish()` then keeps the correct global top-K and the broker applies 
the final ORDER BY + LIMIT over the fully-merged table.
   
   So a row that ranks below top-K on an individual segment/server is no longer 
trimmed before its partial aggregates are combined — the silent-wrong-result 
path is closed.
   
   The segment-level trim is the one that stays, and it is exactly the bucketed 
form you suggested: `TableResizer.trimInSegmentResultsByGroupingSet` keeps the 
top-K *within each `$grouping_id` bucket*, and it is opt-in (only when 
`minSegmentGroupTrimSize > 0`; disabled by default → exact). Its Javadoc now 
also notes it is, like plain GROUP BY segment trim, a per-segment approximation.
   
   Regression coverage: 
`GroupByUtilsTest.testGroupingSetQueriesDisableCombineAndReducerTrim` asserts 
that both `createIndexedTableForCombineOperator` and 
`createIndexedTableForDataTableReducer` return the trim-disabled 
`UnboundedConcurrentIndexedTable` for a grouping-set query even under 
aggressive trim settings (`minGroupTrimSize=1`, `groupTrimThreshold=1`), with a 
plain GROUP BY contrast that still gets the trim-enabled 
`ConcurrentIndexedTable`.
   
   On the requested "trim enabled" multi-segment e2e test: with this fix, 
grouping-set queries ignore the trim threshold entirely, so there is no longer 
a trim-enabled combine/reducer path for them to exercise end-to-end. 
Reducer/segment trim also only fires above `getTableCapacity(limit, 
minGroupTrimSize) = max(5*limit, minGroupTrimSize)` groups (>100 for the 
default floor), which a focused custom-data suite can't reach without becoming 
slow/flaky. The table-construction assertions above cover the decisive branch 
deterministically, and cross-segment merge correctness (counts summed across ≥2 
segments) is already validated end-to-end by 
`testRollupWithGenuineAndRolledUpNulls` and the other integration cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to