liukun4515 commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126605415
> But another issue I found is in `AggregateExec`: when we push the limit down into the aggregate exec, it selects `GroupedTopKAggregateStream`:
>
> ```rust
> if let Some(limit) = self.limit {
>     warn!("agg exec: {}", self.is_unordered_unfiltered_group_by_distinct());
>     if !self.is_unordered_unfiltered_group_by_distinct() {
>         warn!("agg exec: create GroupedPriorityQueue");
>         return Ok(StreamType::GroupedPriorityQueue(
>             GroupedTopKAggregateStream::new(self, context, partition, limit)?,
>         ));
>     }
> }
> ```
>
> The implementation of `GroupedTopKAggregateStream` gets the right result for the SQL, but it is not efficient, because we don't care about the order and don't need to consume all of the downstream data.

In our SQL:

```sql
select LO_SUPPKEY from SSB_1G.LINEORDER GROUP BY LO_SUPPKEY limit 20 offset 10
```

There is no ORDER BY and no aggregate expression, so we don't need `GroupedTopKAggregateStream` to get the result. `GroupedTopKAggregateStream` is not efficient for this query: it consumes all of the input data and uses a `PriorityQueue` to store and sort it.
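To illustrate why early termination is possible here, below is a minimal sketch under stated assumptions: a single input stream of already-computed group keys, and a hypothetical helper `distinct_limit` that is not a DataFusion API. Because the query has no ORDER BY, SQL allows any `limit` groups after skipping `offset`. So the operator can emit groups in first-seen order and stop reading input once `offset + limit` distinct keys have been seen, instead of consuming everything into a priority queue.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper (NOT a DataFusion API) for
/// `SELECT key ... GROUP BY key LIMIT limit OFFSET offset`
/// with no ORDER BY and no aggregate expression.
/// Returns the selected group keys and how many input rows were read.
fn distinct_limit<K, I>(input: I, limit: usize, offset: usize) -> (Vec<K>, usize)
where
    K: Eq + Hash + Clone,
    I: IntoIterator<Item = K>,
{
    // Number of distinct groups we must see before we can stop.
    let needed = offset.saturating_add(limit);
    if needed == 0 {
        return (Vec::new(), 0);
    }

    let mut seen: HashSet<K> = HashSet::new();
    let mut groups: Vec<K> = Vec::new(); // groups in first-seen order
    let mut consumed = 0;

    for key in input {
        consumed += 1;
        // `insert` returns true only for a key not seen before.
        if seen.insert(key.clone()) {
            groups.push(key);
            if groups.len() == needed {
                break; // enough distinct groups: stop reading the input
            }
        }
    }

    // Apply OFFSET, then LIMIT, to the collected groups.
    let result = groups.into_iter().skip(offset).take(limit).collect();
    (result, consumed)
}

fn main() {
    // Stand-in for LO_SUPPKEY values arriving from the input.
    let keys = vec![3, 1, 3, 2, 1, 4, 5, 2, 6, 7];
    let (groups, consumed) = distinct_limit(keys, 2, 1);
    println!("groups = {:?}, rows consumed = {}", groups, consumed);
}
```

Here only 4 of the 10 input rows are read, and memory is bounded by `offset + limit` distinct keys. The result is one valid answer among many, which is acceptable precisely because there is no ORDER BY.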