andygrove opened a new issue, #4242: URL: https://github.com/apache/datafusion-comet/issues/4242
## What is the problem the feature request solves?

When #2894 was first addressed by #2994, `COUNT` was included in the set of aggregates safe to run with mixed Spark partial / Comet final execution alongside `MIN`, `MAX`, and the bitwise aggregates. The follow-up branch for that work removed `COUNT` from the safe set after two regressions surfaced. As a result, the TPC-DS coverage gains in #2994 (which were almost entirely driven by `COUNT`) are not realized in the carve-out PR.

This issue tracks investigating and re-enabling `COUNT` for mixed Spark partial / Comet final execution.

## Known blockers

The two specific regressions that caused `COUNT` to be excluded:

1. **AQE `PropagateEmptyRelationAfterAQE`**. The rule matches `BaseAggregateExec` only, not `CometHashAggregateExec`. When the partial runs in Spark and the final runs in Comet, the rule no longer fires for the final stage, which changes results in some queries.
2. **Spark 4.0 count-bug decorrelation**. The decorrelation rewrite for correlated `IN` subqueries drops a row in the OR pattern in `in-count-bug.sql` when the partial/final aggregate stages are split between Spark and Comet.

## Suggested approach

- Reproduce both regressions with `COUNT` re-added to `supportsMixedPartialFinal` in `aggregates.scala`.
- For (1), evaluate whether the right fix is upstream (extend `PropagateEmptyRelationAfterAQE` to recognize Comet aggregate exec) or in Comet (e.g., guard mixed-COUNT when AQE is enabled, or insert a wrapper Spark aggregate node).
- For (2), determine whether the row-drop is specific to Spark 4.0 decorrelation interacting with mixed aggregate stages, and whether a targeted guard is preferable to broad fallback.
- Once both are addressed, re-add `override def supportsMixedPartialFinal: Boolean = true` to `CometCount` and regenerate TPC-DS golden files.

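To make the "guard mixed-COUNT when AQE is enabled" option concrete, here is a minimal, self-contained sketch of how a per-aggregate capability flag could gate mixed execution. It is a toy model and not the Comet implementation: only the flag name `supportsMixedPartialFinal` comes from this issue. The types `Engine` and `AggSketch`, the `mixedSafeUnderAqe` flag, the `*Sketch` objects, and `finalCanRunInComet` are hypothetical names introduced for illustration, and the assumption that a Comet partial always permits a Comet final is a simplification.

```scala
// Toy model only: none of these types are the real Comet or Spark classes.
sealed trait Engine
case object Spark extends Engine
case object Comet extends Engine

// Hypothetical stand-in for a Comet aggregate definition.
trait AggSketch {
  def name: String
  // Flag named in this issue. Conservative default: do not mix engines.
  def supportsMixedPartialFinal: Boolean = false
  // Hypothetical extra flag modelling the "guard when AQE is enabled" option.
  def mixedSafeUnderAqe: Boolean = true
}

object MinSketch extends AggSketch {
  val name = "MIN"
  override def supportsMixedPartialFinal: Boolean = true
}

object MaxSketch extends AggSketch {
  val name = "MAX"
  override def supportsMixedPartialFinal: Boolean = true
}

// Current state described in this issue: COUNT is outside the safe set.
object CountSketch extends AggSketch {
  val name = "COUNT"
}

// One possible guarded re-enable: allowed in mixed mode, but not under AQE.
object CountGuardedSketch extends AggSketch {
  val name = "COUNT"
  override def supportsMixedPartialFinal: Boolean = true
  override def mixedSafeUnderAqe: Boolean = false
}

object MixedExecutionSketch {
  // Returns true when the final aggregate stage may run in Comet.
  def finalCanRunInComet(
      partial: Engine,
      aggs: Seq[AggSketch],
      aqeEnabled: Boolean): Boolean =
    partial match {
      // Assumption: both stages in Comet is not a mixed plan, so no check is needed.
      case Comet => true
      // Mixed plan: every aggregate must opt in, and must tolerate AQE if it is on.
      case Spark =>
        aggs.forall { a =>
          a.supportsMixedPartialFinal && (!aqeEnabled || a.mixedSafeUnderAqe)
        }
    }
}
```

With this shape, a single unsafe aggregate in the list forces the whole final stage to fall back, which mirrors the all-or-nothing nature of running one aggregate operator in one engine. A guard like `mixedSafeUnderAqe` would only address blocker (1); blocker (2) would still need its own investigation.
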
## Related

- Parent: #2892
- Original: #2894 (closed by #2994 follow-up)
- Adjacent: #1389 (AQE materializing unsupported final HashAggregate)
