Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

via GitHub Mon, 26 May 2025 19:41:59 -0700


Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2910945348


   > > My biggest concern is if we can get more obvious improvement to make the 
change worthy...
   > 
   > As I understood the way to get a bigger improvement would be to implement 
the chunked approach for more group storage / aggregates so that more queries 
in our benchmarks (like ClickBench) could use the new code path
   > 
   > Though of course that would make this PR even bigger
   > 
   > We could also make a "POC" type PR with some more implementation to prove 
out the performance and then break it into smaller pieces for review 🤔
   
   Yes, I am trying to implement it for `count`, and see if some improvements 
in `q4` and `q15`.
   
   But according to the flamegraph, I found all such queries' bottleneck is 
actually `hashtable`. I think performance improvement from this pr will be not 
obvious before we overcome `hashtable`(I am experimenting about clickhouse like 
hashtable).
   
   I think the benefit from this one currently is:
   - Part of the better aggregation spilling (I will file a issue to discuss it 
maybe tonight)
   - Lower memory usage due to faster freeing batch by batch
   - Can make really high cardinality query around 1.1 faster
   
   Do you think it still worthy continuing to push forward for above benefits?
   I plan to sort out codes tonight, and we just make a simplest implementation 
in first stage (now is a bit complex due to some experiment about continue 
improving performance)?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

Reply via email to