alamb opened a new issue, #9370:
URL: https://github.com/apache/arrow-datafusion/issues/9370

   ### Is your feature request related to a problem or challenge?
   
   I think the multiple `RepartitionExec` and `CoalesceBatchesExec` nodes make
   DataFusion's explain plans hard to read. Users, especially new ones, often
   ask whether these nodes are really needed and what they are doing (see the
   Discord thread linked below, for example).
   
   For example, consider this plan, which repartitions each input to a
   `HashJoin`. Each repartitioning takes three separate nodes (a
   `CoalesceBatchesExec` and two `RepartitionExec`s):
   
   ```
   ProjectionExec: expr=[name@1 as schoolname, name@3 as teachername]
     CoalesceBatchesExec: target_batch_size=8192
       HashJoinExec: mode=Partitioned, join_type=Inner, on=[(id@0, class_id@0)]
         CoalesceBatchesExec: target_batch_size=8192
           RepartitionExec: partitioning=Hash([id@0], 8), input_partitions=8
             RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
               VirtualExecutionPlan
         CoalesceBatchesExec: target_batch_size=8192
           RepartitionExec: partitioning=Hash([class_id@0], 8), input_partitions=8
             RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
               ProjectionExec: expr=[class_id@1 as class_id, name@2 as name]
                 VirtualExecutionPlan
   ```
   
   ### Describe the solution you'd like
   
   Ideally I think the plan would look like this:
   
   ```
   ProjectionExec: expr=[name@1 as schoolname, name@3 as teachername]
     CoalesceBatchesExec: target_batch_size=8192
       HashJoinExec: mode=Partitioned, join_type=Inner, on=[(id@0, class_id@0)]
         RepartitionExec: partitioning=Hash([id@0], 8), input_partitions=8 <-- repartition
           VirtualExecutionPlan
         RepartitionExec: partitioning=Hash([class_id@0], 8), input_partitions=8
           ProjectionExec: expr=[class_id@1 as class_id, name@2 as name]
             VirtualExecutionPlan
   ```
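
To make the proposed rewrite concrete, here is a minimal sketch of it on a toy plan tree. Everything in it is an assumption made for illustration: the `Plan` enum, `simplify`, `node_count`, and `example_join_input` are hypothetical names and do not correspond to DataFusion's real `ExecutionPlan` API or optimizer rules. The sketch also simply assumes that a hash repartition can batch its own output and read an unsplit input directly. Whether that holds in DataFusion is not settled here.

```rust
// Toy model of a physical plan tree. These types are illustrative only and
// are NOT DataFusion's real `ExecutionPlan` API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Leaf(String),
    CoalesceBatches { input: Box<Plan> },
    RoundRobin { partitions: usize, input: Box<Plan> },
    HashRepartition { key: String, partitions: usize, input: Box<Plan> },
    HashJoin { left: Box<Plan>, right: Box<Plan> },
}

// Hypothetical rewrite: collapse the three-node repartition pattern
// (CoalesceBatches -> HashRepartition -> RoundRobin) into one HashRepartition.
fn simplify(plan: Plan) -> Plan {
    match plan {
        Plan::CoalesceBatches { input } => match simplify(*input) {
            // Assumption: the hash repartition emits well-sized batches itself,
            // so a CoalesceBatches directly above it is redundant.
            hash @ Plan::HashRepartition { .. } => hash,
            other => Plan::CoalesceBatches { input: Box::new(other) },
        },
        Plan::HashRepartition { key, partitions, input } => {
            // Assumption: a round-robin split directly below a hash
            // repartition can be dropped without changing the result.
            let input = match simplify(*input) {
                Plan::RoundRobin { input, .. } => *input,
                other => other,
            };
            Plan::HashRepartition { key, partitions, input: Box::new(input) }
        }
        Plan::RoundRobin { partitions, input } => Plan::RoundRobin {
            partitions,
            input: Box::new(simplify(*input)),
        },
        Plan::HashJoin { left, right } => Plan::HashJoin {
            left: Box::new(simplify(*left)),
            right: Box::new(simplify(*right)),
        },
        leaf @ Plan::Leaf(_) => leaf,
    }
}

// Counts the nodes in a plan, as a rough proxy for how long its explain output is.
fn node_count(plan: &Plan) -> usize {
    match plan {
        Plan::Leaf(_) => 1,
        Plan::CoalesceBatches { input }
        | Plan::RoundRobin { input, .. }
        | Plan::HashRepartition { input, .. } => 1 + node_count(input),
        Plan::HashJoin { left, right } => 1 + node_count(left) + node_count(right),
    }
}

// Builds the left join input from the plan above, using the toy types.
fn example_join_input() -> Plan {
    Plan::CoalesceBatches {
        input: Box::new(Plan::HashRepartition {
            key: "id".to_string(),
            partitions: 8,
            input: Box::new(Plan::RoundRobin {
                partitions: 8,
                input: Box::new(Plan::Leaf("VirtualExecutionPlan".to_string())),
            }),
        }),
    }
}
```

In this toy model, `simplify(example_join_input())` leaves a single `HashRepartition` over the leaf. A `CoalesceBatches` sitting above a `HashJoin` is left in place, matching the plan shown above.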
   
   ### Describe alternatives you've considered
   
   I think we could do this in at least two steps:
   1. Combine the CoalesceBatchesExec *into*  
   
   I think care needs to be taken to ensure that 
   
   ### Additional context
   
   This came from a discussion in discord: 
https://discord.com/channels/885562378132000778/1206315256977035394/1212085214168490015


