xiangfu0 commented on code in PR #18662:
URL: https://github.com/apache/pinot/pull/18662#discussion_r3362542660


##########
pinot-core/src/main/java/org/apache/pinot/core/util/GroupByUtils.java:
##########
@@ -47,6 +53,42 @@ private GroupByUtils() {
   public static final int DEFAULT_MIN_NUM_GROUPS = 5000;
   public static final int MAX_TRIM_THRESHOLD = 1_000_000_000;
 
+  /**
+   * Builds the segment-level {@link GroupByResultsBlock} for a GROUP BY 
GROUPING SETS / ROLLUP / CUBE query,
+   * shared by {@code GroupByOperator} and {@code FilteredGroupByOperator}. 
When the group count exceeds the
+   * per-set budget ({@code perSetTrimSize * numGroupingSets}), a per-set 
bucketed trim (keyed on the
+   * {@code $grouping_id} discriminator at {@code discriminatorColumnIndex}) 
keeps each grouping set's own top
+   * candidates so a global top-K cannot starve low-magnitude sets such as the 
grand total; otherwise the full
+   * segment result is returned. The broker still applies the final ORDER BY + 
LIMIT across all sets.

Review Comment:
   This fixes the starvation problem only for segment-local trimming. The same 
correctness bug still exists one stage later: 
`createIndexedTableForCombineOperator(...)` and 
`createIndexedTableForDataTableReducer(...)` still do a global ORDER BY trim 
for grouping-set queries. That means a row can be below top-K on every 
segment/server, get trimmed away there, and never be available to merge into 
the true global top-K after partial aggregates are combined. In other words, 
`ROLLUP/CUBE/GROUPING SETS` can still return silently wrong results as soon as 
multi-segment or broker-side trimming kicks in. Please either disable 
combine/reducer trim for grouping-set queries until it is bucketed by 
`$grouping_id`, or implement the same per-set trim there as well, and add a 
multi-segment regression test with ORDER BY + LIMIT and trim enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to