andygrove opened a new pull request, #3981:
URL: https://github.com/apache/datafusion-comet/pull/3981

   ## Which issue does this PR close?
   
   Related to #3949.
   
   ## Rationale for this change
   
   \`stageContainsDPPScan\` in \`CometShuffleExchangeExec\` (introduced by 
#3879) uses a plain \`s.child.exists(...)\` walk to decide whether the shuffle 
subtree contains a DPP scan. Under AQE, once a child stage materializes, its 
subtree is replaced by a \`ShuffleQueryStageExec\` — a \`LeafExecNode\` whose 
\`children\` is \`Seq.empty\`. \`.exists\` cannot descend through it, so the 
DPP scan becomes invisible on the stage-prep pass.
   
   The consequence: the same shuffle that correctly fell back to Spark at 
initial planning gets converted to Comet the second time the rule runs, because 
\`stageContainsDPPScan\` returns \`false\` once the inner stage has 
materialized. That flip produces plan-shape inconsistencies across the two 
passes — the suspected mechanism behind the \`ColumnarToRowExec\` 
canonicalization assertion in #3949.
   
   ## What changes are included in this PR?
   
   - Walk the tree explicitly instead of relying on \`exists\`, and descend 
into \`QueryStageExec.plan\` so both passes see the same subtree.
   
   ## How are these changes tested?
   
   - New \`CometDppFallbackConsistencySuite\` builds a DPP-shaped query, takes 
the real \`ShuffleExchangeExec\` out of the plan, wraps it in a real 
\`ShuffleQueryStageExec\` (the exact wrapper AQE produces when a stage 
materializes), and asserts that \`columnarShuffleSupported\` still falls back 
to Spark. Without this fix the outer wrapping flips the decision to Comet.
   - Existing \`DPP fallback\` and \`DPP fallback avoids inefficient Comet 
shuffle (#3874)\` tests in \`CometExecSuite\` still pass.
   
   I have not been able to reproduce the full #3949 crash even with the flip 
demonstrated, so this PR is positioned as a correctness fix for the 
decision-stability invariant. It may or may not close #3949 on its own.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to