alamb commented on issue #7077: URL: https://github.com/apache/arrow-datafusion/issues/7077#issuecomment-1653420751
> So I am left with wanting to push a SortExec node through RepartitionExecs but only some of the time. I think the right heuristic is: push down SortExec if it would result in needing to sort fewer tuples overall. This is a new bit of analysis that I think pushdown_sorts will need to perform.

This sounds very much like a cost (estimate) based optimization. DataFusion has very few of these at the moment, and I think in general they cause confusion in other systems because what they will do is hard to predict (it depends on data properties, and is thus hard to reproduce as well). In addition, estimating cardinalities is a known hard problem (especially with correlated predicates, data skew, joins, etc.).

For the IOx-specific case, here are some possible alternatives:

1. A `ConfigOptions` setting to control whether the sort is pushed through repartition
2. A special-case optimizer in IOx that pushes the sorts through unions / partitions
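To make the trade-off concrete, here is a minimal Rust sketch contrasting the quoted cost-based heuristic with a flag-driven approach like alternative 1. It uses hypothetical types (`RowEstimate`, `SortPushdownPolicy`, `should_push_sort_below_repartition`) and is not DataFusion's optimizer API or an existing `ConfigOptions` field; it only models the decision, assuming row-count estimates for the sort's current position and for the position below the repartition are somehow available.

```rust
/// Hypothetical row-count estimate; `None` means no statistics the
/// optimizer trusts (e.g. correlated predicates, data skew, joins).
type RowEstimate = Option<u64>;

/// Hypothetical policy knob standing in for a `ConfigOptions` setting.
/// This is NOT an existing DataFusion option.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SortPushdownPolicy {
    /// Never push the sort below the repartition.
    Never,
    /// Always push the sort below the repartition.
    Always,
    /// The quoted heuristic: push only if fewer tuples would be sorted.
    CostBased,
}

/// Decide whether to push a sort below a repartition.
/// `rows_at_current`: estimated rows the sort sees where it is now.
/// `rows_below`: estimated rows it would see if pushed below.
fn should_push_sort_below_repartition(
    policy: SortPushdownPolicy,
    rows_at_current: RowEstimate,
    rows_below: RowEstimate,
) -> bool {
    match policy {
        SortPushdownPolicy::Never => false,
        SortPushdownPolicy::Always => true,
        SortPushdownPolicy::CostBased => match (rows_at_current, rows_below) {
            // Push down only if strictly fewer tuples would need sorting.
            (Some(current), Some(below)) => below < current,
            // Without usable estimates, leave the plan unchanged.
            _ => false,
        },
    }
}

fn main() {
    // Flag-driven: the outcome depends only on the setting.
    assert!(should_push_sort_below_repartition(SortPushdownPolicy::Always, None, None));
    assert!(!should_push_sort_below_repartition(SortPushdownPolicy::Never, Some(10), Some(1)));

    // Cost-based: the outcome flips with the estimates.
    assert!(should_push_sort_below_repartition(SortPushdownPolicy::CostBased, Some(1_000), Some(100)));
    assert!(!should_push_sort_below_repartition(SortPushdownPolicy::CostBased, Some(100), Some(1_000)));
    assert!(!should_push_sort_below_repartition(SortPushdownPolicy::CostBased, None, Some(100)));
}
```

In the `Always`/`Never` arms the plan is a pure function of the setting, so it is easy to predict and reproduce; in the `CostBased` arm the same query can get a different plan whenever the (possibly wrong) estimates change, which is the concern raised above.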
