[I] Enable Comet aggregation (partition + final) as a whole [arrow-datafusion-comet]

via GitHub Thu, 21 Mar 2024 10:36:10 -0700


viirya opened a new issue, #223:
URL: https://github.com/apache/arrow-datafusion-comet/issues/223


   ### What is the problem the feature request solves?
   
   Currently the Comet planner treats partial and final aggregation operators 
separately. So in theory you could get a Comet partial aggregation followed by 
a Spark final aggregation.
   
   The problem with this combination is that some aggregation functions in 
DataFusion may use unsigned integer types that cannot be properly mapped to 
Spark data types (e.g., UInt64 -> LongType). With a Comet partial 
aggregation + Spark final aggregation, values may overflow at runtime.
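
To illustrate the overflow concern, here is a minimal Python sketch of what happens when an unsigned 64-bit value above the signed 64-bit maximum is read as a signed 64-bit value. The helper name `reinterpret_u64_as_i64` and the sample value are hypothetical and only model the type mismatch. The source does not say how Comet or Spark actually converts the intermediate state.

```python
import struct

I64_MAX = 2**63 - 1  # largest value a signed 64-bit integer (Spark LongType) can hold


def reinterpret_u64_as_i64(value: int) -> int:
    """Reinterpret the 8 bytes of an unsigned 64-bit value as a signed 64-bit value.

    Hypothetical helper: it only models a bit-level reinterpretation and is
    not taken from Comet, DataFusion, or Spark.
    """
    if not 0 <= value <= 2**64 - 1:
        raise ValueError("value does not fit in an unsigned 64-bit integer")
    return struct.unpack("<q", struct.pack("<Q", value))[0]


# A partial aggregation state that is valid as UInt64 but too large for LongType.
partial_state = I64_MAX + 1
as_long = reinterpret_u64_as_i64(partial_state)
print(as_long)  # -9223372036854775808
```

Any UInt64 value up to `I64_MAX` survives the round trip unchanged. Only values above it wrap to negative numbers, which is why the problem may show up only at runtime on large inputs.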
   
   Also, I don't think Comet partial aggregation alone helps much, because 
it means Comet shuffle is not enabled. In such cases, only a partial 
aggregation directly on top of a Comet Scan will be transformed to a Comet 
partial aggregation, which is very limited.
   
   I think we can treat partial + final aggregation as a whole and 
enable/disable Comet aggregation (partial + final) together.
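
The all-or-nothing decision proposed above could be sketched as follows. This is only an illustration in Python under stated assumptions: `AggStage`, `comet_supported`, and `plan_engines` are hypothetical names that do not come from the Comet code base, and the real planner operates on Spark physical plan nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggStage:
    """Hypothetical stand-in for one aggregation operator in a physical plan."""
    mode: str              # "partial" or "final"
    comet_supported: bool  # assumption: whether Comet could run this stage natively


def plan_engines(partial: AggStage, final: AggStage) -> dict:
    """Choose one engine for both stages so they are never mixed."""
    use_comet = partial.comet_supported and final.comet_supported
    engine = "comet" if use_comet else "spark"
    return {"partial": engine, "final": engine}


# If the final stage cannot run in Comet, the partial stage stays in Spark too.
print(plan_engines(AggStage("partial", True), AggStage("final", False)))
```

Under this rule a Comet partial aggregation never feeds a Spark final aggregation, so the unsigned intermediate state never has to be mapped to a Spark type.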
   
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_


