Rachelint commented on issue #11680: URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2331929376
> > I guess the reason why performance improved may be similar to the partial skipping
>
> Yes, that is why I experimented with single mode, forcing every query to skip the partial and repartition stages. Sadly, this doesn't work well for the low-cardinality case.
>
> > We would probably need to consolidate Aggregate (Partial and Final) and Repartition into a single place in order to be able to adaptively choose the aggregate mode/algorithm based on runtime statistics.
>
> I agree, this is similar to my earlier idea.
>
> > An alternative idea for improvement is to combine partial group + repartition + final group in one operation. We could probably avoid converting to rows once again in the final group.
>
> However, the refactor is quite challenging.

For aggregation, it may be used to perform parallel merging of the partial aggregation results in the final aggregation. As far as I know, DuckDB seems to use a partitioned hash table for a similar mechanism?
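To make the partitioned hash table idea concrete, here is a minimal sketch using only the Rust standard library. It is not DataFusion or DuckDB code: the function names (`partition_of`, `partial_aggregate`, `final_aggregate`), the `SUM`-per-key aggregate, and the string keys are all assumptions made for illustration. Each partial stage writes into one hash table per partition, so the final stage can merge partition `p` of every partial result on its own thread without a separate repartition step.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::thread;

/// Hypothetical helper: maps a group key to a partition by its hash.
fn partition_of(key: &str, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % num_partitions
}

/// Partial stage (illustrative): aggregates one input chunk as SUM per key
/// into `num_partitions` separate hash tables, selected by key hash.
fn partial_aggregate(
    chunk: &[(String, i64)],
    num_partitions: usize,
) -> Vec<HashMap<String, i64>> {
    let mut tables = vec![HashMap::new(); num_partitions];
    for (key, value) in chunk {
        let p = partition_of(key, num_partitions);
        *tables[p].entry(key.clone()).or_insert(0) += *value;
    }
    tables
}

/// Final stage (illustrative): partition `p` of every partial result is
/// merged by its own thread. A key always hashes to the same partition,
/// so the merged tables never share keys and need no further exchange.
fn final_aggregate(
    partials: &[Vec<HashMap<String, i64>>],
    num_partitions: usize,
) -> Vec<HashMap<String, i64>> {
    thread::scope(|s| {
        // Spawn all merge threads first so they run concurrently.
        let handles: Vec<_> = (0..num_partitions)
            .map(|p| {
                s.spawn(move || {
                    let mut merged = HashMap::new();
                    for partial in partials {
                        for (key, value) in &partial[p] {
                            *merged.entry(key.clone()).or_insert(0) += *value;
                        }
                    }
                    merged
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Calling `partial_aggregate` once per input chunk and passing the collected results to `final_aggregate` yields one merged table per partition. A real engine would work on columnar batches and might spill or resize partitions, which this sketch ignores.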
