andygrove opened a new pull request, #4015:
URL: https://github.com/apache/datafusion-comet/pull/4015

   ## Which issue does this PR close?
   
   Closes #1389.
   
   Part of #1267.
   
   ## Rationale for this change
   
   When a Spark query's Final-mode aggregate cannot be converted to Comet (for 
example because its result expressions are not supported, as in 
`concat(flatten(collect_set(col)))`), Comet would still convert the upstream 
Partial-mode aggregate. The Partial produces intermediate buffers in a format 
the Spark Final cannot interpret (different encodings for `CollectSet`, 
`Average`, decimal `Sum`, variance, etc.), which crashes at runtime with errors 
such as `Not supported on CometListVector`.
   
   Conversely, most aggregates block even a safe Spark-Partial + Comet-Final 
combination, where the buffer formats are in fact compatible (`MIN`, `MAX`, 
`COUNT`, bitwise).
   
   This change prevents the crash for unsafe aggregates and unlocks the mixed 
execution for the safe ones.
   
   ## What changes are included in this PR?
   
   - New `supportsMixedPartialFinal` flag on `CometAggregateExpressionSerde`, 
defaulting to `false`. Set to `true` for `MIN`, `MAX`, `COUNT`, `BitAndAgg`, 
`BitOrAgg`, `BitXorAgg`, whose intermediate buffer formats match between Spark 
and Comet.
   - `QueryPlanSerde.allAggsSupportMixedExecution` checks the flag across an 
aggregate's expressions.
   - `CometExecRule.tagUnsafePartialAggregates` runs before bottom-up 
transformation. For each Final-mode aggregate whose expressions are not all 
mixed-safe, it conservatively checks whether the Final itself is convertible 
via the new `canFinalAggregateBeConverted` (mirrors the predicates in 
`CometBaseAggregate.doConvert`). If not, the corresponding Partial (looked up 
by `findPartialAggInPlan`, traversing through `AQEShuffleReadExec` and 
`ShuffleQueryStageExec`) is tagged with `COMET_UNSAFE_PARTIAL`.
   - `CometBaseAggregate.doConvert` honours the new tag, and now permits the 
Spark-Partial + Comet-Final case when all aggregates are mixed-safe.
   
   ## How are these changes tested?
   
   `CometExecRuleSuite`:
   - Existing test for Comet-Partial + Spark-Final with `SUM` (unsafe) is 
un-ignored; asserts neither side is converted.
   - New test for Spark-Partial + Comet-Final with `SUM`; asserts neither side 
is converted.
   - New test for Comet-Partial + Spark-Final with `MIN`/`MAX`/`COUNT`; asserts 
partial converts to Comet, final stays Spark.
   - New test for Spark-Partial + Comet-Final with `MIN`/`MAX`/`COUNT`; asserts 
partial stays Spark, final converts to Comet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to