alamb commented on issue #19216: URL: https://github.com/apache/datafusion/issues/19216#issuecomment-3634508546
> Can you please share your views / suggestions on the same ?

I am not sure what you are asking about.

> What we noticed is that topK doesn't spill and hence all clickbench groupBy queries with OrderBy + Limit even with single target partition such as
> SELECT "SearchPhrase", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY u DESC LIMIT 10;

Since this query has a `LIMIT 10`, the TopK only needs to hold the top 10 values.

Therefore, I don't think the issue is related to TopK itself, but rather to the memory usage of one of the GroupByHash aggregate operations that feed the TopK (in this case there are very many distinct "SearchPhrase" values).

The GroupByHash operator should also be able to spill when under memory pressure, so I don't know why it is failing.
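
To illustrate the point about where the memory goes, here is a minimal sketch in plain Python. It is not DataFusion code and does not use any DataFusion API; the function `top_k_groups` and its return values are hypothetical names chosen for this example. It mimics the shape of the query above: the aggregation step keeps state for every distinct group, while the top-k step keeps at most `k` entries no matter how many groups it sees.

```python
import heapq
from collections import defaultdict


def top_k_groups(rows, k):
    """Illustrative only: rows is an iterable of (search_phrase, user_id)."""
    # Aggregation state: one set of user ids per distinct phrase.
    # This grows with the number of distinct groups in the input.
    distinct_users = defaultdict(set)
    for phrase, user_id in rows:
        if phrase != "":  # WHERE "SearchPhrase" <> ''
            distinct_users[phrase].add(user_id)

    # Top-k state: a min-heap that never holds more than k entries.
    heap = []
    max_heap_len = 0
    for phrase, users in distinct_users.items():
        entry = (len(users), phrase)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Replace the current smallest entry with the larger one.
            heapq.heapreplace(heap, entry)
        max_heap_len = max(max_heap_len, len(heap))

    # ORDER BY u DESC
    result = sorted(heap, reverse=True)
    return result, len(distinct_users), max_heap_len
```

With many distinct phrases, the second return value (number of groups held by the aggregation) grows with the input, while the third (largest heap size) stays at `k`. That is the reason to look at the aggregation feeding the top-k rather than the top-k itself.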
