[GitHub] [arrow-datafusion] alamb commented on issue #5325: Optimize Accumulator `size` function performance (fix regression on clickbench)

via GitHub Sat, 25 Feb 2023 07:08:31 -0800


alamb commented on issue #5325:
URL: https://github.com/apache/arrow-datafusion/issues/5325#issuecomment-1445138838


   > This would also eliminate the need for `Vec<ScalarValue>` to be stored in an accumulator. Spark does this in [RewriteDistinctAggregates](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala).
   
   Thanks -- this is a neat idea @yjshen 
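To make the suggested rewrite concrete, here is a minimal sketch in plain Rust. It uses only standard-library collections, not DataFusion or Spark APIs, and the function names are hypothetical. It shows how `COUNT(DISTINCT x) ... GROUP BY k` can be computed as two ordinary aggregations: first `GROUP BY k, x` to deduplicate, then a plain `COUNT(x) ... GROUP BY k`.

After the rewrite, each group's state in the second phase is a single integer instead of a collection of values. The deduplication state does not disappear. It moves out of the per-group accumulator and into the first phase's group-by hash table. Spark's rule is more general than this sketch (for example, it also handles several distinct aggregates in one query), so read it only as an illustration of the single-distinct case.

```rust
use std::collections::{BTreeMap, HashSet};

/// Naive approach (illustrative): each group's accumulator keeps the set of
/// distinct values it has seen, so its memory grows with the number of values.
fn count_distinct_naive(rows: &[(&str, i64)]) -> BTreeMap<String, usize> {
    let mut acc: BTreeMap<String, HashSet<i64>> = BTreeMap::new();
    for (k, v) in rows {
        acc.entry(k.to_string()).or_default().insert(*v);
    }
    acc.into_iter().map(|(k, set)| (k, set.len())).collect()
}

/// Rewritten approach (illustrative): phase 1 deduplicates by grouping on
/// (key, value); phase 2 is a plain COUNT whose per-group state is one integer.
fn count_distinct_rewritten(rows: &[(&str, i64)]) -> BTreeMap<String, usize> {
    // Phase 1: SELECT k, x FROM t GROUP BY k, x
    let mut dedup: HashSet<(String, i64)> = HashSet::new();
    for (k, v) in rows {
        dedup.insert((k.to_string(), *v));
    }
    // Phase 2: SELECT k, COUNT(x) FROM phase1 GROUP BY k
    let mut counts: BTreeMap<String, usize> = BTreeMap::new();
    for (k, _) in dedup {
        *counts.entry(k).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let rows: [(&str, i64); 5] = [("a", 1), ("a", 1), ("a", 2), ("b", 3), ("b", 3)];
    let naive = count_distinct_naive(&rows);
    let rewritten = count_distinct_rewritten(&rows);
    // Both strategies must produce the same answer.
    assert_eq!(naive, rewritten);
    println!("{:?}", rewritten); // {"a": 2, "b": 1}
}
```

Both functions return the same counts. The only difference is where the set of already-seen values is stored while the query runs.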
   
   One challenge I have seen with this approach in the past is that it will result
in a "diamond-shaped plan", where the same input stream is split into two
output streams (one for each of the different aggregates) that are then brought
back together. In general, this approach may require unbounded buffering when
using sort-based aggregation.
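The following toy model shows why splitting the stream can force buffering. The hypothetical `Fork` type below is plain Rust, not DataFusion's execution code. It hands each input row to two consumers, and it must keep every row that one consumer has read but the other has not.

When both branches pull rows in lockstep, the fork holds at most one row. When one branch must consume all of its input before the other branch makes progress, as a blocking or differently ordered operator may, the fork ends up holding the entire input.

```rust
use std::collections::VecDeque;

/// Hypothetical toy "fork": splits one input stream into two outputs.
/// A row read by one side is queued for the other side until that side reads it.
struct Fork<I: Iterator<Item = i64>> {
    input: I,
    queues: [VecDeque<i64>; 2],
    max_buffered: usize, // high-water mark of rows held in the queues
}

impl<I: Iterator<Item = i64>> Fork<I> {
    fn new(input: I) -> Self {
        Fork { input, queues: [VecDeque::new(), VecDeque::new()], max_buffered: 0 }
    }

    /// Pull the next row for consumer `side` (0 or 1).
    fn next(&mut self, side: usize) -> Option<i64> {
        if let Some(v) = self.queues[side].pop_front() {
            return Some(v);
        }
        // Nothing queued for this side: read from the shared input and
        // keep a copy for the other side.
        let v = self.input.next()?;
        self.queues[1 - side].push_back(v);
        let buffered = self.queues[0].len() + self.queues[1].len();
        self.max_buffered = self.max_buffered.max(buffered);
        Some(v)
    }
}

fn main() {
    // Lockstep consumption: the fork never holds more than one row.
    let mut f = Fork::new(0..1000);
    while let (Some(_), Some(_)) = (f.next(0), f.next(1)) {}
    println!("lockstep max buffered: {}", f.max_buffered); // 1

    // One branch drains its whole input first, so every row must be
    // held for the other branch.
    let mut f = Fork::new(0..1000);
    while f.next(0).is_some() {}
    while f.next(1).is_some() {}
    println!("drain-first max buffered: {}", f.max_buffered); // 1000
}
```

The buffer grows with the distance between the two consumers and has no other bound. For this reason, the shape of the plan, not just the aggregates in it, determines the memory use.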
   
   But I think it would definitely be worth considering.

