Blajda opened a new issue, #8523:
URL: https://github.com/apache/arrow-datafusion/issues/8523
### Describe the bug
I have an `ExecutionPlan` with the following input distribution requirement.
```
fn required_input_distribution(&self) -> Vec<Distribution> {
    vec![Distribution::HashPartitioned(vec![self.expr.clone()]); 1]
}
```
where `expr` is typically `col("a")`.
When the plan is built, a `RepartitionExec` is created with the expected
expression, but a `CoalesceBatchesExec` is inserted immediately after it. As a
result, records with the same hash arrive in different partitions, which
violates the required input distribution.
This [optimization rule](https://github.com/apache/arrow-datafusion/blob/d091b55be6a4ce552023ef162b5d081136d3ff6d/datafusion/core/src/physical_optimizer/coalesce_batches.rs#L65)
is likely the culprit.
Setting `with_coalesce_batches(false)` is a workaround for this.
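
For reference, the invariant that `Distribution::HashPartitioned` is expected to guarantee is that every distinct key value ends up in exactly one partition. Below is a minimal, std-only sketch of a check for that invariant. It does not use DataFusion's API: `hash_partition` and `find_violation` are hypothetical helpers, and `DefaultHasher` stands in for whatever hash function the engine actually applies to Arrow arrays.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Assign each key to one of `n` partitions by hashing it.
/// Illustrative only: this is not how DataFusion hashes Arrow data.
fn hash_partition(keys: &[&str], n: usize) -> Vec<Vec<String>> {
    let mut partitions: Vec<Vec<String>> = vec![Vec::new(); n];
    for key in keys {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let idx = (hasher.finish() as usize) % n;
        partitions[idx].push(key.to_string());
    }
    partitions
}

/// Return the first key that appears in more than one partition, if any.
fn find_violation(partitions: &[Vec<String>]) -> Option<String> {
    // Maps each key to the index of the first partition it was seen in.
    let mut first_seen: HashMap<&str, usize> = HashMap::new();
    for (idx, partition) in partitions.iter().enumerate() {
        for key in partition {
            let first = *first_seen.entry(key.as_str()).or_insert(idx);
            if first != idx {
                return Some(key.clone());
            }
        }
    }
    None
}

fn main() {
    // A correct hash partitioning never splits a key across partitions.
    let keys = ["a", "b", "a", "c", "b", "a"];
    let partitions = hash_partition(&keys, 4);
    assert!(find_violation(&partitions).is_none());

    // The symptom described above: the same key observed in two partitions.
    let broken = vec![vec!["a".to_string()], vec!["a".to_string()]];
    assert_eq!(find_violation(&broken), Some("a".to_string()));
    println!("invariant checks passed");
}
```

A check of this shape, run against the key column of each input partition an operator receives, would confirm whether the requirement is really being violated at the operator's input.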
### To Reproduce
_No response_
### Expected behavior
If a node requires a specific input distribution, the optimizer should
respect that requirement.
### Additional context
_No response_