andygrove opened a new pull request, #4015: URL: https://github.com/apache/datafusion-comet/pull/4015
## Which issue does this PR close? Closes #1389. Part of #1267. ## Rationale for this change When a Spark query's Final-mode aggregate cannot be converted to Comet (for example because its result expressions are not supported, as in `concat(flatten(collect_set(col)))`), Comet would still convert the upstream Partial-mode aggregate. The Partial produces intermediate buffers in a format the Spark Final cannot interpret (different encodings for `CollectSet`, `Average`, decimal `Sum`, variance, etc.), which crashes at runtime with errors such as `Not supported on CometListVector`. Conversely, most aggregates block even a safe Spark-Partial + Comet-Final combination, where the buffer formats are in fact compatible (`MIN`, `MAX`, `COUNT`, bitwise). This change prevents the crash for unsafe aggregates and unlocks the mixed execution for the safe ones. ## What changes are included in this PR? - New `supportsMixedPartialFinal` flag on `CometAggregateExpressionSerde`, defaulting to `false`. Set to `true` for `MIN`, `MAX`, `COUNT`, `BitAndAgg`, `BitOrAgg`, `BitXorAgg`, whose intermediate buffer formats match between Spark and Comet. - `QueryPlanSerde.allAggsSupportMixedExecution` checks the flag across an aggregate's expressions. - `CometExecRule.tagUnsafePartialAggregates` runs before bottom-up transformation. For each Final-mode aggregate whose expressions are not all mixed-safe, it conservatively checks whether the Final itself is convertible via the new `canFinalAggregateBeConverted` (mirrors the predicates in `CometBaseAggregate.doConvert`). If not, the corresponding Partial (looked up by `findPartialAggInPlan`, traversing through `AQEShuffleReadExec` and `ShuffleQueryStageExec`) is tagged with `COMET_UNSAFE_PARTIAL`. - `CometBaseAggregate.doConvert` honours the new tag, and now permits the Spark-Partial + Comet-Final case when all aggregates are mixed-safe. ## How are these changes tested? `CometExecRuleSuite`: - Existing test for Comet-Partial + Spark-Final with `SUM` (unsafe) is un-ignored; asserts neither side is converted. - New test for Spark-Partial + Comet-Final with `SUM`; asserts neither side is converted. - New test for Comet-Partial + Spark-Final with `MIN`/`MAX`/`COUNT`; asserts partial converts to Comet, final stays Spark. - New test for Spark-Partial + Comet-Final with `MIN`/`MAX`/`COUNT`; asserts partial stays Spark, final converts to Comet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
