xiangfu0 commented on code in PR #18662:
URL: https://github.com/apache/pinot/pull/18662#discussion_r3365608136
##########
pinot-core/src/main/java/org/apache/pinot/core/util/GroupByUtils.java:
##########
@@ -47,6 +53,42 @@ private GroupByUtils() {
public static final int DEFAULT_MIN_NUM_GROUPS = 5000;
public static final int MAX_TRIM_THRESHOLD = 1_000_000_000;
+ /**
+ * Builds the segment-level {@link GroupByResultsBlock} for a GROUP BY
GROUPING SETS / ROLLUP / CUBE query,
+ * shared by {@code GroupByOperator} and {@code FilteredGroupByOperator}.
When the group count exceeds the
+ * per-set budget ({@code perSetTrimSize * numGroupingSets}), a per-set
bucketed trim (keyed on the
+ * {@code $grouping_id} discriminator at {@code discriminatorColumnIndex})
keeps each grouping set's own top
+ * candidates so a global top-K cannot starve low-magnitude sets such as the
grand total; otherwise the full
+ * segment result is returned. The broker still applies the final ORDER BY +
LIMIT across all sets.
Review Comment:
Fixed in e1629d66fb — I took option 1 (disable combine/reducer trim for
grouping-set queries).
`GroupByUtils.createIndexedTableForCombineOperator` and
`createIndexedTableForDataTableReducer` now short-circuit to
`getTrimDisabledIndexedTable(...)` when `queryContext.isGroupingSets()`, before
the ORDER BY trim-threshold logic runs:
- Combine (per server): keeps all groups (bounded by `numGroupsLimit`)
instead of a per-server top-K, so the server sends every group to the broker.
- Reducer (broker): merges all server partials into the unbounded table
first; `finish()` then keeps the correct global top-K and the broker applies
the final ORDER BY + LIMIT over the fully-merged table.
So a row that ranks below top-K on an individual segment/server is no longer
trimmed before its partial aggregates are combined — the silent-wrong-result
path is closed.
The segment-level trim is the one that stays, and it is exactly the bucketed
form you suggested: `TableResizer.trimInSegmentResultsByGroupingSet` keeps the
top-K *within each `$grouping_id` bucket*, and it is opt-in (only when
`minSegmentGroupTrimSize > 0`; disabled by default → exact). Its Javadoc now
also notes it is, like plain GROUP BY segment trim, a per-segment approximation.
Regression coverage:
`GroupByUtilsTest.testGroupingSetQueriesDisableCombineAndReducerTrim` asserts
that both `createIndexedTableForCombineOperator` and
`createIndexedTableForDataTableReducer` return the trim-disabled
`UnboundedConcurrentIndexedTable` for a grouping-set query even under
aggressive trim settings (`minGroupTrimSize=1`, `groupTrimThreshold=1`), with a
plain GROUP BY contrast that still gets the trim-enabled
`ConcurrentIndexedTable`.
On the requested "trim enabled" multi-segment e2e test: with this fix,
grouping-set queries ignore the trim threshold entirely, so there is no longer
a trim-enabled combine/reducer path for them to exercise end-to-end.
Reducer/segment trim also only fires above `getTableCapacity(limit,
minGroupTrimSize) = max(5*limit, minGroupTrimSize)` groups (>100 for the
default floor), which a focused custom-data suite can't reach without becoming
slow/flaky. The table-construction assertions above cover the decisive branch
deterministically, and cross-segment merge correctness (counts summed across ≥2
segments) is already validated end-to-end by
`testRollupWithGenuineAndRolledUpNulls` and the other integration cases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]