Rachelint commented on issue #11680: URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2370165371
> > I am sure that the performance improved now. > > But I think we should push this forward after: > > > > * [Sketch for aggregation intermediate results blocked management #11943](https://github.com/apache/datafusion/pull/11943) merged, because this pr made big code changes too, may be clever to avoid conflicts with this > > * Refactor the aggregation codes. I think it is getting unmaintainable... Some issues pointed out this too [Improve aggregation code readability #12335](https://github.com/apache/datafusion/issues/12335) > > I have some thoughts about the refactor [Improve aggregation code readability #12335 (comment)](https://github.com/apache/datafusion/issues/12335#issuecomment-2358310830) > > @Rachelint I suggest we work on fusing Partial + Repartition first, I'm quite confident on the direction of this improvement. To be honest I don't quite follow #11943 and it seems that we need better test coverage before #11943 be merged. I think we could work on other optimization first (fusing + simplify code #12335) + fuzz test improvement and then review #11943 again after that. Thanks for suggestion! Seems it is actually clever for working `fusing Partial + Repartition` first. Although the #11943 can improve more performance(10%~25%, and fusing `fusing Partial + Repartition` improve 5%~10%) but it indeed needs much larger code changes. But after considering again, I guess it possible that fuzz test and code refactor may should be finished first before continuing to introduce more non-trivial changes? The aggregation codes are actually too complex now, and to be honest I also have no enough confidence to make big changes currently (actually I found possible bugs and regressions in main, and I am checking them again more carefully). I plan to work on them when I have bandwidth after finishing the string related optimizations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
