mbutrovich opened a new issue, #3510:
URL: https://github.com/apache/datafusion-comet/issues/3510

   ### What is the problem the feature request solves?
   
   I initially opened #3446 to apply a set of changes from #3349 to 
CometNativeScan, but the PR grew out of hand, particularly because of DPP 
(dynamic partition pruning). I am opening this issue to track the pieces I'd 
like to tackle in smaller chunks rather than in one giant PR:
   - [ ] Per-partition serde, so that we no longer send every `SparkFilePartition` of 
tasks to every partition. This simply reduces serde overhead for large scans.
   - [ ] DPP (non-AQE) for V1 operator. These runtime filters are created by 
Spark's `PlanDynamicPruningFilters` and are easier for Comet to support because 
this rule runs before Comet's rules.
   - [ ] DPP (AQE) for V1 operator. These runtime filters are created by 
Spark's `PlanAdaptiveDynamicPruningFilters` and are difficult for Comet to support 
because this rule runs after Comet's rules. I'll summarize what I learned from #3446:
     - Comet's rules replace things like `BroadcastHashJoin` with 
`CometBroadcastHashJoin`, which `PlanAdaptiveDynamicPruningFilters` does not 
recognize.
     - We can't modify Spark rules, so the alternative is to defer the replacement 
until after `PlanAdaptiveDynamicPruningFilters` runs. This requires registering new 
Comet rules that run later than the current ones. I tried to create a simple rule 
that defers just the `BroadcastHashJoin` replacement, but this became too 
complicated with multiple scan implementations. Once we pare down our 
scan implementations, I think we can revisit a broader redesign of Comet's rules 
so that they work better with AQE. We will need this for stronger Spark 4.0 support.
   - [ ] CometNativeBatchScan operator. See #3481.
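
To illustrate why the per-partition serde item matters, here is a toy sketch in Python. It is not Comet code: `make_partition`, the file names, and the use of `pickle` are all assumptions made for illustration, standing in for `SparkFilePartition` and the real serialization format. It only compares the total bytes shipped when every task receives the full partition list against the total when each task receives just its own entry.

```python
import pickle


def make_partition(index, files_per_partition=4):
    # Hypothetical stand-in for the file tasks assigned to one scan partition.
    return [f"part-{index:05d}-file-{i}.parquet" for i in range(files_per_partition)]


partitions = [make_partition(i) for i in range(100)]

# Behaviour described in the first item: every task gets the whole list,
# so the total payload is (size of full list) x (number of partitions).
all_bytes = pickle.dumps(partitions)
broadcast_total = len(all_bytes) * len(partitions)

# Per-partition serde: each task gets only its own entry,
# so the total payload is the sum of the individual entries.
per_partition_payloads = [pickle.dumps(p) for p in partitions]
per_partition_total = sum(len(b) for b in per_partition_payloads)

print("all partitions to every task:", broadcast_total, "bytes")
print("one partition per task:      ", per_partition_total, "bytes")
```

In this toy model the first total grows roughly with the square of the partition count, while the second grows linearly, which is why the saving shows up mainly on large scans.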
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

