alamb commented on issue #7077:
URL: 
https://github.com/apache/arrow-datafusion/issues/7077#issuecomment-1653420751

   > So I am left with wanting to push a SortExec node through RepartitionExecs 
but only some of the time. I think the right heuristic is: push down SortExec 
if it would result in needing to sort fewer tuples overall. This is a new bit 
of analysis that I think pushdown_sorts will need to perform.
   
   This sounds very much like a cost-based (estimate-based) optimization. DataFusion 
has very few of these at the moment, and I think in general they cause 
confusion in other systems because their behavior is hard to predict (and, 
since it depends on data properties, hard to reproduce as well). 
   
   In addition, estimating cardinalities is a well-known hard problem (especially 
with correlated predicates, data skew, joins, etc.). 
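
To make the correlated-predicate problem concrete, here is a minimal sketch. It is not DataFusion code: the function name, the table size, and the selectivities are all made up for illustration. It applies the textbook independence assumption (multiply the selectivities) to two predicates that are in fact perfectly correlated.

```rust
/// Hypothetical helper: estimated row count for `a AND b`, assuming the two
/// predicates are statistically independent (selectivities multiply).
fn estimate_conjunction(total_rows: u64, sel_a: f64, sel_b: f64) -> u64 {
    (total_rows as f64 * sel_a * sel_b).round() as u64
}

fn main() {
    // Made-up table: 1,000,000 rows; 10% match `city = 'Boston'` and
    // 10% match `state = 'MA'`.
    let total_rows: u64 = 1_000_000;
    let estimated = estimate_conjunction(total_rows, 0.10, 0.10);

    // If every Boston row is also an MA row, the conjunction keeps the full
    // 10% of the table, not 1% of it.
    let actual = (total_rows as f64 * 0.10).round() as u64;

    println!("estimated rows = {estimated}, actual rows = {actual}");
}
```

Under these assumed numbers the estimate is off by a factor of ten, and a "sort fewer tuples" decision built on such an estimate could flip between two runs over slightly different data.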
   
   For the IOx-specific case, here are some possible alternatives:
   1. A `ConfigOptions` setting to control whether the sort is pushed through 
repartition 
   2. A special-case optimizer rule in IOx that pushes the sorts through unions / 
partitions
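
As a rough sketch of what the first alternative could look like, here is a toy plan rewrite gated by a boolean option. Everything in it is hypothetical: `Plan`, `Options`, `push_sort_through_repartition`, and `push_down_sort` are stand-ins for illustration, not DataFusion's `ExecutionPlan`, `ConfigOptions`, or `pushdown_sorts` APIs. The toy repartition node carries a `preserve_order` flag, on the assumption that a repartition sitting above a sort must keep each stream's order for the rewrite to be valid.

```rust
/// Toy physical plan; a stand-in for real execution plan nodes.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    Repartition { input: Box<Plan>, preserve_order: bool },
    Sort { input: Box<Plan> },
}

/// Hypothetical option struct; the flag name is an assumption.
struct Options {
    push_sort_through_repartition: bool,
}

/// Rewrites `Sort(Repartition(x))` into `Repartition(Sort(x))`, but only
/// when the flag is set. No cost estimate is involved, so the resulting
/// plan depends on configuration alone.
fn push_down_sort(plan: Plan, opts: &Options) -> Plan {
    match plan {
        Plan::Sort { input } => match (*input, opts.push_sort_through_repartition) {
            (Plan::Repartition { input: child, .. }, true) => Plan::Repartition {
                // Keep pushing the sort in case of stacked repartitions.
                input: Box::new(push_down_sort(Plan::Sort { input: child }, opts)),
                // The repartition now sits above sorted data and must keep it sorted.
                preserve_order: true,
            },
            (other, _) => Plan::Sort {
                input: Box::new(push_down_sort(other, opts)),
            },
        },
        Plan::Repartition { input, preserve_order } => Plan::Repartition {
            input: Box::new(push_down_sort(*input, opts)),
            preserve_order,
        },
        Plan::Scan => Plan::Scan,
    }
}

fn main() {
    let plan = Plan::Sort {
        input: Box::new(Plan::Repartition {
            input: Box::new(Plan::Scan),
            preserve_order: false,
        }),
    };
    let on = Options { push_sort_through_repartition: true };
    let off = Options { push_sort_through_repartition: false };
    println!("flag on:  {:?}", push_down_sort(plan.clone(), &on));
    println!("flag off: {:?}", push_down_sort(plan, &off));
}
```

The point of this design is predictability: the same query always produces the same plan for a given setting, regardless of the data.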
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
