Rachelint commented on issue #11680:
URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2370165371

   > > I am sure that the performance improved now.
   > > But I think we should push this forward after:
   > > 
   > > * [Sketch for aggregation intermediate results blocked management 
#11943](https://github.com/apache/datafusion/pull/11943) merged, because this 
pr made big code changes too, may be clever to  avoid conflicts with this
   > > * Refactor the aggregation codes. I think it is getting 
unmaintainable... Some issues pointed out this too [Improve aggregation code 
readability  #12335](https://github.com/apache/datafusion/issues/12335)
   > >   I have some thoughts about the refactor [Improve aggregation code 
readability  #12335 
(comment)](https://github.com/apache/datafusion/issues/12335#issuecomment-2358310830)
   > 
   > @Rachelint I suggest we work on fusing Partial + Repartition first, I'm 
quite confident on the direction of this improvement. To be honest I don't 
quite follow #11943 and it seems that we need better test coverage before 
#11943 be merged. I think we could work on other optimization first (fusing + 
simplify code #12335) + fuzz test improvement and then review #11943 again 
after that.
   
   Thanks for suggestion! 
   Seems it is actually clever for working `fusing Partial + Repartition` 
first.  Although the #11943 can improve more performance(10%~25%, and fusing 
`fusing Partial + Repartition` improve 5%~10%) but it indeed needs much larger 
code changes.
   
   But after considering again, I guess it possible that fuzz test and code 
refactor may should be finished first before continuing to introduce more 
non-trivial changes? 
   
   The aggregation codes are actually too complex now, and to be honest I also 
have no enough confidence to make big changes currently (actually I found 
possible bugs and regressions in main, and I am checking them again more 
carefully).
   
   I plan to work on them when I have bandwidth after finishing the string 
related optimizations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to