Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2910945348
> > My biggest concern is if we can get more obvious improvement to make the change worthy... > > As I understood the way to get a bigger improvement would be to implement the chunked approach for more group storage / aggregates so that more queries in our benchmarks (like ClickBench) could use the new code path > > Though of course that would make this PR even bigger > > We could also make a "POC" type PR with some more implementation to prove out the performance and then break it into smaller pieces for review 🤔 Yes, I am trying to implement it for `count`, and see if some improvements in `q4` and `q15`. But according to the flamegraph, I found all such queries' bottleneck is actually `hashtable`. I think performance improvement from this pr will be not obvious before we overcome `hashtable`(I am experimenting about clickhouse like hashtable). I think the benefit from this one currently is: - Part of the better aggregation spilling (I will file a issue to discuss it maybe tonight) - Lower memory usage due to faster freeing batch by batch - Can make really high cardinality query around 1.1 faster Do you think it still worthy continuing to push forward for above benefits? I plan to sort out codes tonight, and we just make a simplest implementation in first stage (now is a bit complex due to some experiment about continue improving performance)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org