alamb opened a new issue, #9370:
URL: https://github.com/apache/arrow-datafusion/issues/9370
### Is your feature request related to a problem or challenge?
I think the use of multiple `RepartitionExec` and `CoalesceBatchesExec` nodes
makes explain plans in DataFusion hard to read. This leads users of
DataFusion, especially new users, to ask whether they really need these nodes
and what they are doing (see the Discord thread, for example).
For example, consider this plan, which repartitions the inputs to a
`HashJoin`; that repartitioning requires three separate nodes on each input:
```
ProjectionExec: expr=[name@1 as schoolname, name@3 as teachername]
  CoalesceBatchesExec: target_batch_size=8192
    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(id@0, class_id@0)]
      CoalesceBatchesExec: target_batch_size=8192
        RepartitionExec: partitioning=Hash([id@0], 8), input_partitions=8
          RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
            VirtualExecutionPlan
      CoalesceBatchesExec: target_batch_size=8192
        RepartitionExec: partitioning=Hash([class_id@0], 8), input_partitions=8
          RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
            ProjectionExec: expr=[class_id@1 as class_id, name@2 as name]
              VirtualExecutionPlan
```
### Describe the solution you'd like
Ideally I think the plan would look like this:
```
ProjectionExec: expr=[name@1 as schoolname, name@3 as teachername]
  CoalesceBatchesExec: target_batch_size=8192
    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(id@0, class_id@0)]
      RepartitionExec: partitioning=Hash([id@0], 8), input_partitions=8 <-- repartition
        VirtualExecutionPlan
      RepartitionExec: partitioning=Hash([class_id@0], 8), input_partitions=8
        ProjectionExec: expr=[class_id@1 as class_id, name@2 as name]
          VirtualExecutionPlan
```
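To make the difference between the two plans above concrete, here is a minimal Python sketch that folds the helper nodes out of a toy plan tree. Everything in it is hypothetical: `Node`, `fold`, `render`, and `repartitioned` are illustration-only names and not DataFusion APIs. It rewrites only the displayed tree, keeps the `input_partitions=8` label exactly as written in the ideal plan, and says nothing about how the folded work would actually be executed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for a physical plan node (hypothetical, not a DataFusion type)."""
    name: str
    detail: str = ""
    children: List["Node"] = field(default_factory=list)


def fold(node: Node) -> Node:
    """Return a copy of the plan with the repartition helper nodes folded away."""
    children = [fold(c) for c in node.children]
    only: Optional[Node] = children[0] if len(children) == 1 else None

    # A CoalesceBatchesExec directly above a RepartitionExec: show only the repartition.
    if (node.name == "CoalesceBatchesExec"
            and only is not None and only.name == "RepartitionExec"):
        return only

    # A hash RepartitionExec directly above a round-robin one: hide the round-robin node.
    if (node.name == "RepartitionExec" and "Hash(" in node.detail
            and only is not None and only.name == "RepartitionExec"
            and "RoundRobinBatch(" in only.detail):
        return Node(node.name, node.detail, only.children)

    return Node(node.name, node.detail, children)


def render(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, two spaces per level."""
    label = f"{node.name}: {node.detail}" if node.detail else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def repartitioned(key: str, child: Node) -> Node:
    """Build the three-node chain that sits on each join input in the original plan."""
    rr = Node("RepartitionExec",
              "partitioning=RoundRobinBatch(8), input_partitions=1", [child])
    hashed = Node("RepartitionExec",
                  f"partitioning=Hash([{key}], 8), input_partitions=8", [rr])
    return Node("CoalesceBatchesExec", "target_batch_size=8192", [hashed])


plan = Node("ProjectionExec", "expr=[name@1 as schoolname, name@3 as teachername]", [
    Node("CoalesceBatchesExec", "target_batch_size=8192", [
        Node("HashJoinExec",
             "mode=Partitioned, join_type=Inner, on=[(id@0, class_id@0)]", [
                 repartitioned("id@0", Node("VirtualExecutionPlan")),
                 repartitioned("class_id@0", Node(
                     "ProjectionExec",
                     "expr=[class_id@1 as class_id, name@2 as name]",
                     [Node("VirtualExecutionPlan")])),
             ]),
    ]),
])

original_lines = render(plan)
folded_lines = render(fold(plan))
print("\n".join(folded_lines))
```

In this toy model the `CoalesceBatchesExec` above the `HashJoinExec` is left alone because its child is not a `RepartitionExec`, which matches the ideal plan shown above.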
### Describe alternatives you've considered
I think we could do this in at least two steps:
1. Combine the CoalesceBatchesExec *into*
I think care needs to be taken to ensure that
### Additional context
This came from a discussion on Discord:
https://discord.com/channels/885562378132000778/1206315256977035394/1212085214168490015