alamb opened a new issue, #4883: URL: https://github.com/apache/arrow-datafusion/issues/4883
**Describe the bug**
I am seeing a Repartition being added incorrectly in some cases in our IOx plans. This in turn causes re-sorts to happen, which is a huge deal for us.

**Expected behavior**
If the data is already sorted, it should not be re-sorted.

**Additional context**
The repartition is being added by the physical optimizer pass in https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_optimizer/repartition.rs

I previously fixed a similar issue in https://github.com/apache/arrow-datafusion/pull/1776 but DataFusion got more sophisticated recently (in a good way). I believe that somehow it doesn't realize that the output of the UnionExec is sorted and thus should not be repartitioned.

**To Reproduce**
My suspicion is that https://github.com/apache/arrow-datafusion/pull/4714 / https://github.com/apache/arrow-datafusion/commit/899c86a0c62f0c3f6324d3125158ec643b524515 is the specific code that is causing this change, as the relies_on_input_order, which I added explicitly for this case, is now ignored (see https://github.com/apache/arrow-datafusion/pull/4856).

I think the fix is to update DataFusion to be smarter about knowing how UnionExec is sorted.

In fact, looking at the tests I wrote that were changed in https://github.com/apache/arrow-datafusion/pull/4714:

https://github.com/apache/arrow-datafusion/blob/556282a8b6da6cb7d41d8c311211ae49b7ed82a7/datafusion/core/src/physical_optimizer/repartition.rs#L574-L586

you can see exactly that the Repartition and sort have been added.

Found while updating DataFusion in IOx: https://github.com/influxdata/influxdb_iox/pull/6483#discussion_r1065433368
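
To make the expected behavior concrete, here is a minimal sketch of the rule I have in mind. It is a toy model only: the `Plan` enum, `output_is_sorted`, and `add_repartition` are hypothetical names invented for illustration and are not DataFusion's `ExecutionPlan` API or the actual code in repartition.rs. The assumption is that a union of sorted inputs counts as sorted, and that a sorted input feeding an order-dependent parent must not be wrapped in a repartition.

```rust
/// Toy stand-in for a physical plan node (hypothetical, not DataFusion's types).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf scan; `sorted` records whether its output has a known sort order.
    Scan { sorted: bool },
    /// Concatenates the partitions of its inputs.
    Union(Vec<Plan>),
    /// Repartition; in this model it destroys any per-partition sort order.
    Repartition(Box<Plan>),
    /// Merges sorted partitions into one sorted stream; relies on input order.
    SortPreservingMerge(Box<Plan>),
}

impl Plan {
    /// True when every partition this node produces is sorted.
    fn output_is_sorted(&self) -> bool {
        match self {
            Plan::Scan { sorted } => *sorted,
            // The case this issue is about: a union of sorted inputs
            // still produces sorted partitions.
            Plan::Union(inputs) => inputs.iter().all(Plan::output_is_sorted),
            Plan::Repartition(_) => false,
            Plan::SortPreservingMerge(_) => true,
        }
    }
}

/// Wrap unions in `Repartition`, except when the parent relies on the
/// input order and the union's output is already sorted.
fn add_repartition(plan: Plan, parent_needs_order: bool) -> Plan {
    match plan {
        Plan::SortPreservingMerge(input) => {
            // This node relies on its input order, so pass that down.
            Plan::SortPreservingMerge(Box::new(add_repartition(*input, true)))
        }
        Plan::Union(inputs) => {
            let union = Plan::Union(inputs);
            if parent_needs_order && union.output_is_sorted() {
                union // leave the sorted data alone, no re-sort needed later
            } else {
                Plan::Repartition(Box::new(union))
            }
        }
        other => other,
    }
}
```

In this model, a `SortPreservingMerge` over a `Union` of sorted scans comes back unchanged, while a `Union` with no order-dependent parent is still repartitioned.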
