wolffcm commented on issue #7077:
URL: https://github.com/apache/arrow-datafusion/issues/7077#issuecomment-1652746677

   Looking into this more I can see that `pushdown_sorts` will not push a 
`SortExec` through a `RepartitionExec` node. This makes sense since in general 
this means that the sort will be performed on parallel streams of data, which 
is good.
   
   The problem is that in some cases (the case that I care about) we could 
push the sort further down through a `UnionExec` node and then end up not 
needing to sort one or more of the union's inputs, which is advantageous. 
Incidentally, if no `SortExec` nodes were needed at all, such as if both sides 
of the `UnionExec` were already sorted, I think that 
`replace_with_order_preserving_variants` would catch this case.
   
   So I am left wanting to push a `SortExec` node through `RepartitionExec`s, 
but only some of the time. I think the right heuristic is: push down the 
`SortExec` if doing so would result in sorting fewer tuples overall. This is a 
new bit of analysis that I think `pushdown_sorts` will need to perform.
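   A minimal sketch of the proposed "sort fewer tuples" heuristic, assuming each union input carries an estimated row count and a flag saying whether it already satisfies the required ordering. `UnionInput`, `rows_sorted_if_kept`, `rows_sorted_if_pushed` and `should_push_sort_down` are hypothetical names for illustration only, not DataFusion APIs. The sketch deliberately ignores the cost of the merge that is still needed above the union and the parallelism benefit of sorting after the repartition.

```rust
/// Hypothetical, simplified model of one input to a `UnionExec`.
/// These are NOT DataFusion types; they only illustrate the heuristic.
#[derive(Debug, Clone, Copy)]
struct UnionInput {
    /// Estimated number of rows this input produces.
    estimated_rows: usize,
    /// Whether this input already satisfies the required sort order.
    already_sorted: bool,
}

/// Rows sorted if the `SortExec` stays where it is (above the union):
/// every row from every input goes through the sort.
fn rows_sorted_if_kept(inputs: &[UnionInput]) -> usize {
    inputs.iter().map(|i| i.estimated_rows).sum()
}

/// Rows sorted if the sort is pushed below the union: only inputs that
/// are not already sorted need a sort of their own.
fn rows_sorted_if_pushed(inputs: &[UnionInput]) -> usize {
    inputs
        .iter()
        .filter(|i| !i.already_sorted)
        .map(|i| i.estimated_rows)
        .sum()
}

/// Heuristic: push the sort down only if that strictly reduces the
/// total number of rows that must be sorted.
fn should_push_sort_down(inputs: &[UnionInput]) -> bool {
    rows_sorted_if_pushed(inputs) < rows_sorted_if_kept(inputs)
}

fn main() {
    // One large pre-sorted input and one small unsorted input:
    // pushing down sorts 10_000 rows instead of 1_010_000.
    let inputs = [
        UnionInput { estimated_rows: 1_000_000, already_sorted: true },
        UnionInput { estimated_rows: 10_000, already_sorted: false },
    ];
    println!("push down: {}", should_push_sort_down(&inputs));
}
```

   When no input is already sorted, both costs are equal and the sketch keeps the sort where it is, which matches the current `pushdown_sorts` behavior of not pushing through `RepartitionExec`.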
   
   @alamb @mustafasrepo @ozankabak Do you all think that this approach makes 
sense?
