haohuaijin opened a new pull request, #19653: URL: https://github.com/apache/datafusion/pull/19653
## Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/19638

## Rationale for this change

See issue #19638.

## What changes are included in this PR?

1. Introduced a `LimitOptions` struct for the limit field, with both `limit` and an optional `descending` ordering direction.
2. Extended the `TopKAggregation` optimizer rule to DISTINCT queries by recognizing `GROUP BY` queries without aggregates and setting the `descending` flag based on the ordering direction.
3. Enhanced `GroupedTopKAggregateStream` to handle DISTINCT operations by using the group key as both the priority queue key and value.
4. Updated the Proto definitions to add an optional `descending` field to the `AggLimit` message for serialization/deserialization.

## Benchmark result

<img width="731" height="475" alt="image" src="https://github.com/user-attachments/assets/05b6eb8c-186d-4b17-84a9-a2897dbcb095" />

## Are these changes tested?

Yes, added a test case in `aggregates_topk.slt`.

## Are there any user-facing changes?

No.
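As a rough illustration of the idea in the change list (a limit paired with an optional direction, and the group key acting as its own ordering value for DISTINCT), here is a minimal standalone sketch. It is not the PR's code: the `LimitOptions` here is a hypothetical stand-in with assumed field types, `top_k_distinct` is an invented helper, and a `BTreeSet` is swapped in for the priority queue to keep the example short.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the PR's `LimitOptions`; field types are assumed.
#[derive(Debug, Clone, Copy)]
struct LimitOptions {
    limit: usize,
    /// `None` is treated as ascending in this sketch (an assumption).
    descending: Option<bool>,
}

/// Keeps at most `opts.limit` distinct keys and returns them in the
/// requested order. The key itself is the only thing stored, so it serves
/// as both the ordering value and the result value.
fn top_k_distinct(keys: &[i64], opts: LimitOptions) -> Vec<i64> {
    let descending = opts.descending.unwrap_or(false);
    let mut kept: BTreeSet<i64> = BTreeSet::new();
    for &key in keys {
        // Inserting a key that is already present is a no-op (DISTINCT).
        kept.insert(key);
        if kept.len() > opts.limit {
            // Evict the key that ranks last for the requested direction.
            if descending {
                kept.pop_first();
            } else {
                kept.pop_last();
            }
        }
    }
    if descending {
        kept.into_iter().rev().collect()
    } else {
        kept.into_iter().collect()
    }
}
```

Under these assumptions, calling `top_k_distinct(&[5, 1, 9, 5, 3, 9, 7], LimitOptions { limit: 3, descending: Some(true) })` corresponds loosely to `SELECT DISTINCT x ... ORDER BY x DESC LIMIT 3`: the set never holds more than `limit` keys, which is the memory saving a top-K strategy is after.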
