alamb opened a new issue, #4883:
URL: https://github.com/apache/arrow-datafusion/issues/4883

   **Describe the bug**
   I am seeing a Repartition being added incorrectly in some cases in our IOx 
plans (which then causes re-sorts to happen, which is a huge deal for us).
   
   
   **Expected behavior**
   If the data is already sorted, it should not be re-sorted.
   
   **Additional context**
   
   The repartition is being added by the 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/physical_optimizer/repartition.rs
 physical optimizer pass
   I previously fixed a similar issue in 
https://github.com/apache/arrow-datafusion/pull/1776 but DataFusion got more 
sophisticated recently (in a good way). I believe that somehow it doesn't 
realize that the output of the UnionExec is sorted and thus should not be 
repartitioned.
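
To illustrate why repartitioning an already sorted input forces a re-sort, here is a toy sketch (not DataFusion code). It assumes a simplified round-robin model in which rows from the sorted input partitions are dealt to output partitions in arrival order; `round_robin` and `is_ascending` are hypothetical helpers invented for this example.

```rust
/// Toy round-robin: deal rows to `n` output partitions in arrival order.
/// This is a simplified assumption, not how RepartitionExec is implemented.
fn round_robin(rows: &[i64], n: usize) -> Vec<Vec<i64>> {
    let mut out = vec![Vec::new(); n];
    for (i, row) in rows.iter().enumerate() {
        out[i % n].push(*row);
    }
    out
}

/// True if the rows are in non-decreasing order.
fn is_ascending(rows: &[i64]) -> bool {
    rows.windows(2).all(|w| w[0] <= w[1])
}
```

With two sorted input partitions `[1, 2, 3]` and `[0, 4, 5]` arriving one after the other, dealing the six rows across two outputs yields `[1, 3, 4]` and `[2, 0, 5]`. The second output is no longer sorted, so any operator above it that needs sorted input has to sort again.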
   
   **To Reproduce**
   My suspicion is that https://github.com/apache/arrow-datafusion/pull/4714 / 
https://github.com/apache/arrow-datafusion/commit/899c86a0c62f0c3f6324d3125158ec643b524515
 is the specific code that is causing this change (as the 
relies_on_input_order, which I added explicitly for this case, is now ignored 
-- see https://github.com/apache/arrow-datafusion/pull/4856).
   
   I think the fix is to update DataFusion to be smarter about knowing how the 
output of UnionExec is sorted.
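
A minimal sketch of what that could look like, using a toy plan model rather than the real DataFusion types: `Plan`, `output_ordering`, and `maybe_repartition` are hypothetical names, and the rule that a union is sorted only when every input reports the same ordering is an assumption made for illustration.

```rust
/// Hypothetical, simplified stand-in for a physical plan node.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A scan whose output is sorted by `sort_key`, if any.
    Scan { sort_key: Option<String> },
    /// Union of several inputs.
    Union { inputs: Vec<Plan> },
    /// Repartitioning, which does not preserve ordering in this model.
    Repartition { input: Box<Plan>, partitions: usize },
}

impl Plan {
    /// Ordering this node reports for its output, if any.
    fn output_ordering(&self) -> Option<String> {
        match self {
            Plan::Scan { sort_key } => sort_key.clone(),
            // Assumption: a union is sorted only if all inputs share one ordering.
            Plan::Union { inputs } => {
                let first = inputs.first()?.output_ordering()?;
                let all_same = inputs
                    .iter()
                    .all(|i| i.output_ordering().as_deref() == Some(first.as_str()));
                if all_same { Some(first) } else { None }
            }
            Plan::Repartition { .. } => None,
        }
    }
}

/// Wrap `plan` in a Repartition unless that would destroy an ordering
/// the parent relies on.
fn maybe_repartition(plan: Plan, partitions: usize, parent_needs_order: bool) -> Plan {
    if parent_needs_order && plan.output_ordering().is_some() {
        return plan; // keep the sorted input as-is to avoid a re-sort
    }
    Plan::Repartition { input: Box::new(plan), partitions }
}
```

In this model, a union of two scans both sorted by the same key is left alone when its parent needs the order, while a union with mismatched or missing orderings is still repartitioned.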
   
   In fact, looking at the tests I wrote that were changed in 
https://github.com/apache/arrow-datafusion/pull/4714
   
   
https://github.com/apache/arrow-datafusion/blob/556282a8b6da6cb7d41d8c311211ae49b7ed82a7/datafusion/core/src/physical_optimizer/repartition.rs#L574-L586
   
   You can see that exactly the Repartition and sort have been added.
   
   Found while updating DataFusion in IOx: 
https://github.com/influxdata/influxdb_iox/pull/6483#discussion_r1065433368
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
