andygrove opened a new pull request, #4374:
URL: https://github.com/apache/datafusion-comet/pull/4374

   ## Which issue does this PR close?
   
   Closes #4242.
   
   ## Rationale for this change
   
   #4015 deliberately excluded `COUNT` from `supportsMixedPartialFinal` after 
two regressions surfaced when enabling it: (1) a row drop in correlated `IN` 
subqueries with COUNT inside an `OR` (the count-bug pattern from Spark's 
`in-count-bug.sql`), and (2) an interaction with AQE's 
`PropagateEmptyRelationAfterAQE`. As a result the TPC-DS coverage gains from 
the original #2994 work are not realized.
   
   Investigation showed that the count-bug row drop is a canonicalization bug 
in `CometBroadcastExchangeExec`, not a buffer-format issue:
   
   - `doCanonicalize` was nullifying `originalPlan`. `originalPlan` carries the 
projection applied during broadcast (e.g. `count(1) + 1` vs `count(1) - 1` in 
count-bug decorrelation).
   - With `COUNT` not mixed-safe, the Final aggregate stays on Spark and the 
projection lives inside `child`, so `child.canonicalized` distinguishes the two 
subqueries' broadcasts.
   - With `COUNT` mixed-safe, Comet absorbs the Final and the projection 
migrates to `originalPlan`. The previous canonical form discarded that field, 
so two structurally distinct count subqueries canonicalized identically, and 
AQE's `stageCache` keyed on `canonicalized` merged them via `ReusedExchange`. 
The second subquery then read the first subquery's broadcast values, producing 
wrong `IN` matches and dropping rows.
   
   ## What changes are included in this PR?
   
   - `CometBroadcastExchangeExec.doCanonicalize` now includes 
`originalPlan.canonicalized` so broadcasts with different embedded projections 
stay distinct.
   - `CometCount` re-adds `override def supportsMixedPartialFinal: Boolean = 
true`.
   - Two new regression tests in `CometAggregateSuite`: the `in-count-bug.sql` 
OR pattern and an AQE empty-relation pattern with mixed COUNT.
   - Regenerated TPC-DS plan-stability golden files (34 modified, 1 pruned via 
the fallback chain) reflecting the additional queries where COUNT now runs 
natively.
   
   ## How are these changes tested?
   
   - The new tests in `CometAggregateSuite` fail without the canonicalization 
fix (count-bug row drop reproduces with the documented data divergence) and 
pass with the fix.
   - `CometAggregateSuite` (83 tests) and `CometTPCDSV1_4_PlanStabilitySuite` 
on Spark 3.5 (97 tests) both pass after regeneration.
   - The wider AQE `PropagateEmptyRelationAfterAQE` regression is exercised by 
Spark's own `CachedTableSuite` via `dev/diffs`; CI will surface any remaining 
interaction there.
   
   Stacked on #4015 (will rebase/retarget once it lands).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to