alamb commented on issue #5325: URL: https://github.com/apache/arrow-datafusion/issues/5325#issuecomment-1445138838
> This would also eliminate the need for `Vec<ScalarValue>` to be stored in an accumulator. Spark does this in [RewriteDistinctAggregates](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala).

Thanks -- this is a neat idea @yjshen.

One challenge I have seen with this approach in the past is that it results in a "diamond-shaped plan", where the same input stream is split into two output streams (one to each of the different aggregates) and then brought back together. In general, this approach may require unbounded buffering if using sort-based aggregation.

But I think it would definitely be worth considering.
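To make the trade-off concrete, here is a minimal in-memory sketch of `SELECT k, COUNT(DISTINCT x), SUM(y) FROM t GROUP BY k` planned two ways. The names `Row`, `Out`, `direct`, and `rewritten` are hypothetical and are not DataFusion or Spark APIs. `direct` keeps a growing set of distinct values per group, similar in spirit to an accumulator holding a `Vec<ScalarValue>`. `rewritten` first deduplicates on `(k, x)` so the distinct aggregate becomes a plain counter, but the `SUM(y)` branch must read the same input separately and the results are joined back on `k`, which is the diamond shape. This is a simplified illustration, not Spark's exact rule (as I understand it, Spark uses an `Expand` operator rather than a join).

```rust
use std::collections::{BTreeMap, BTreeSet, HashSet};

/// Hypothetical input row: (group key k, distinct-aggregated column x, plain column y).
type Row = (i64, i64, i64);

/// Hypothetical result per group key: (COUNT(DISTINCT x), SUM(y)).
type Out = BTreeMap<i64, (usize, i64)>;

/// Direct plan: one aggregation whose per-group state stores every distinct x seen.
fn direct(rows: &[Row]) -> Out {
    let mut state: BTreeMap<i64, (HashSet<i64>, i64)> = BTreeMap::new();
    for &(k, x, y) in rows {
        let entry = state.entry(k).or_insert_with(|| (HashSet::new(), 0));
        entry.0.insert(x); // state grows with the number of distinct x values
        entry.1 += y;
    }
    state
        .into_iter()
        .map(|(k, (xs, sum))| (k, (xs.len(), sum)))
        .collect()
}

/// Rewritten plan: the same input feeds two branches that are joined on k.
///   Branch A: GROUP BY (k, x) to deduplicate, then GROUP BY k with COUNT(x).
///   Branch B: GROUP BY k with SUM(y).
fn rewritten(rows: &[Row]) -> Out {
    // Branch A, stage 1: deduplicate (k, x) pairs.
    let pairs: BTreeSet<(i64, i64)> = rows.iter().map(|&(k, x, _)| (k, x)).collect();
    // Branch A, stage 2: the per-group state is just a counter.
    let mut counts: BTreeMap<i64, usize> = BTreeMap::new();
    for &(k, _) in &pairs {
        *counts.entry(k).or_insert(0) += 1;
    }
    // Branch B: ordinary SUM(y) per k over the same input.
    let mut sums: BTreeMap<i64, i64> = BTreeMap::new();
    for &(k, _, y) in rows {
        *sums.entry(k).or_insert(0) += y;
    }
    // Bring the branches back together: join on k (both branches see the same keys).
    counts
        .into_iter()
        .map(|(k, c)| (k, (c, sums[&k])))
        .collect()
}

fn main() {
    let rows: Vec<Row> = vec![(1, 10, 5), (1, 10, 7), (1, 20, 1), (2, 30, 4)];
    let a = direct(&rows);
    let b = rewritten(&rows);
    assert_eq!(a, b);
    println!("{:?}", b); // prints {1: (2, 13), 2: (1, 4)}
}
```

Because this sketch materializes everything in memory, it sidesteps the buffering problem. In a streaming engine, the two branches consume the shared input at different rates, and with sort-based aggregation one side may have to buffer its input until the other catches up before the join can emit rows.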
