andygrove opened a new pull request, #4374: URL: https://github.com/apache/datafusion-comet/pull/4374
## Which issue does this PR close? Closes #4242. ## Rationale for this change #4015 deliberately excluded `COUNT` from `supportsMixedPartialFinal` after two regressions surfaced when enabling it: (1) a row drop in correlated `IN` subqueries with COUNT inside an `OR` (the count-bug pattern from Spark's `in-count-bug.sql`), and (2) an interaction with AQE's `PropagateEmptyRelationAfterAQE`. As a result the TPC-DS coverage gains from the original #2994 work are not realized. Investigation showed that the count-bug row drop is a canonicalization bug in `CometBroadcastExchangeExec`, not a buffer-format issue: - `doCanonicalize` was nullifying `originalPlan`. `originalPlan` carries the projection applied during broadcast (e.g. `count(1) + 1` vs `count(1) - 1` in count-bug decorrelation). - With `COUNT` not mixed-safe, the Final aggregate stays on Spark and the projection lives inside `child`, so `child.canonicalized` distinguishes the two subqueries' broadcasts. - With `COUNT` mixed-safe, Comet absorbs the Final and the projection migrates to `originalPlan`. The previous canonical form discarded that field, so two structurally distinct count subqueries canonicalized identically, and AQE's `stageCache` keyed on `canonicalized` merged them via `ReusedExchange`. The second subquery then read the first subquery's broadcast values, producing wrong `IN` matches and dropping rows. ## What changes are included in this PR? - `CometBroadcastExchangeExec.doCanonicalize` now includes `originalPlan.canonicalized` so broadcasts with different embedded projections stay distinct. - `CometCount` re-adds `override def supportsMixedPartialFinal: Boolean = true`. - Two new regression tests in `CometAggregateSuite`: the `in-count-bug.sql` OR pattern and an AQE empty-relation pattern with mixed COUNT. - Regenerated TPC-DS plan-stability golden files (34 modified, 1 pruned via the fallback chain) reflecting the additional queries where COUNT now runs natively. ## How are these changes tested? - The new tests in `CometAggregateSuite` fail without the canonicalization fix (count-bug row drop reproduces with the documented data divergence) and pass with the fix. - `CometAggregateSuite` (83 tests) and `CometTPCDSV1_4_PlanStabilitySuite` on Spark 3.5 (97 tests) both pass after regeneration. - The wider AQE `PropagateEmptyRelationAfterAQE` regression is exercised by Spark's own `CachedTableSuite` via `dev/diffs`; CI will surface any remaining interaction there. Stacked on #4015 (will rebase/retarget once it lands). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
