andygrove opened a new pull request, #3879:
URL: https://github.com/apache/datafusion-comet/pull/3879

   **[EXPERIMENTAL]**
   
   ## Which issue does this PR close?
   
   Closes #3874.
   
   ## Rationale for this change
   
   When a scan uses Dynamic Partition Pruning (DPP) and falls back to Spark, 
Comet still wrapped the stage in a columnar shuffle, producing an 
inefficient plan with redundant columnar-to-row and row-to-columnar transitions:
   
   ```
   CometShuffleWriter
     CometRowToColumnar
       SparkFilter
         SparkColumnarToRow
           SparkScan
   ```
   
   This caused orders-of-magnitude slowdowns in TPC-DS queries that use 
DPP on fact table scans.
   
   ## What changes are included in this PR?
   
   Adds a DPP check in `columnarShuffleSupported()` in 
`CometShuffleExchangeExec`. When `spark.comet.dppFallback.enabled=true` (the 
default), the method now walks the child plan tree to detect 
`FileSourceScanExec` nodes with dynamic pruning filters. If found, it returns 
`false`, preventing the shuffle exchange from being converted to Comet and 
allowing the entire stage to fall back to Spark.
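
The check described above can be illustrated with a small, self-contained toy model. This is only a sketch: the actual change is Scala code operating on Spark's physical plan classes, whereas `PlanNode`, `contains_dpp_scan`, and the `has_dynamic_pruning` flag below are hypothetical stand-ins invented for illustration, and the real `columnarShuffleSupported()` presumably performs other checks that are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Toy stand-in for a physical plan node (hypothetical, not Spark's API)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    # Assumption: a simple flag marks a scan that carries a dynamic pruning filter.
    has_dynamic_pruning: bool = False


def contains_dpp_scan(plan: PlanNode) -> bool:
    """Return True if any FileSourceScanExec in the subtree has a dynamic pruning filter."""
    if plan.name == "FileSourceScanExec" and plan.has_dynamic_pruning:
        return True
    return any(contains_dpp_scan(child) for child in plan.children)


def columnar_shuffle_supported(child: PlanNode, dpp_fallback_enabled: bool = True) -> bool:
    """Model of the new guard only: refuse columnar shuffle when a DPP scan is below."""
    if dpp_fallback_enabled and contains_dpp_scan(child):
        return False
    return True


# A stage whose scan uses DPP (and has therefore fallen back to Spark).
dpp_stage = PlanNode(
    "FilterExec",
    [PlanNode("ColumnarToRowExec",
              [PlanNode("FileSourceScanExec", has_dynamic_pruning=True)])],
)
# A stage with an ordinary scan and no dynamic pruning filter.
plain_stage = PlanNode("FilterExec", [PlanNode("FileSourceScanExec")])

print(columnar_shuffle_supported(dpp_stage))
print(columnar_shuffle_supported(plain_stage))
```

In this model, disabling the flag (`dpp_fallback_enabled=False`) skips the tree walk entirely, which mirrors how `spark.comet.dppFallback.enabled` gates the new check.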
   
   ## How are these changes tested?
   
   Adds a new test, `DPP fallback avoids inefficient Comet shuffle (#3874)`, that 
forces a sort-merge join with DPP and verifies that no `CometColumnarShuffle` 
appears in the plan. The existing `DPP fallback` test continues to pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

