yashmayya opened a new pull request, #18598:
URL: https://github.com/apache/pinot/pull/18598

   For `SELECT DISTINCT col ... LIMIT n` and `GROUP BY col ... LIMIT n` without 
aggregate functions, the multi-stage engine currently ships every distinct 
group key from each server to the intermediate stage before applying the limit. 
This PR pushes the limit (and any order-by on the key) down to the leaf 
aggregate by default, so each server emits at most `limit` groups.
   
   This is safe for the no-aggregate case: each leaf produces complete group 
keys (no partial aggregation), so leaf-level trimming is exact for ordered 
queries and yields a valid subset for unordered ones. Queries with aggregate 
functions are unchanged; they remain gated behind the existing 
`is_enable_group_trim` hint/config. The pushdown applies only to a single group 
set, so `ROLLUP` / `CUBE` / `GROUPING SETS` are excluded. Opt out per query with 
`/*+ aggOptions(is_enable_group_trim='false') */`.
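
   The safety argument can be illustrated with a toy model. This is only a 
sketch in Python, not Pinot code: `leaf_trim`, `final_stage`, and the sample 
`servers` data are hypothetical names made up for the example, and each server 
is modelled as a plain list of key values.

```python
import heapq

def leaf_trim(keys, limit, ordered):
    """Distinct group keys one (hypothetical) server emits after leaf-level trimming."""
    distinct = set(keys)
    if ordered:
        # Models ORDER BY key ASC LIMIT n applied at the leaf.
        return heapq.nsmallest(limit, distinct)
    # No ORDER BY: any `limit` distinct keys are acceptable.
    return list(distinct)[:limit]

def final_stage(leaf_outputs, limit, ordered):
    """Merge the per-server key lists and apply the limit once more."""
    merged = set().union(*leaf_outputs)
    if ordered:
        return sorted(merged)[:limit]
    return list(merged)[:limit]

# Made-up data: the key values held by three servers.
servers = [
    ["d", "a", "c", "a"],
    ["b", "e", "a"],
    ["f", "c", "b"],
]
limit = 2
all_keys = set().union(*servers)

# Ordered case: trimming at the leaves gives the same answer as shipping everything.
ordered_full = final_stage(servers, limit, ordered=True)
ordered_trimmed = final_stage(
    [leaf_trim(s, limit, ordered=True) for s in servers], limit, ordered=True
)

# Unordered case: the result is some valid subset of the distinct keys.
unordered = final_stage(
    [leaf_trim(s, limit, ordered=False) for s in servers], limit, ordered=False
)
```

   In this model the ordered result is exact because a key that belongs to the 
global top `limit` is also within the top `limit` of every server that holds it, 
so no leaf can trim it away. The unordered result only has to be `limit` 
distinct keys that actually exist, which any per-server subset satisfies.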
   
   **Behavior change:** for an unordered `... LIMIT` (no `ORDER BY`), the 
specific rows returned may differ from before. Which rows such a query returns 
was already unspecified in SQL.
   
   Covered by new planner plan tests (DISTINCT/GROUP BY + LIMIT, ORDER BY on 
key, HAVING, OFFSET, opt-out hint, and a multi-group-set negative case) and 
`GroupByOptionsTest` integration tests for paginated DISTINCT/GROUP BY.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

