alamb commented on issue #7871:
URL:
https://github.com/apache/arrow-datafusion/issues/7871#issuecomment-1832767485
> We could pass sort exprs to scan, then users can construct ExecutionPlan
> based on sort exprs.
I think the challenge is that DataFusion currently treats the sort order
from an `ExecutionPlan` as "if it has a sort order, I will try to use it"
rather than "I will try to push the sort into the scan".
Instead, DataFusion will introduce a `SortExec` to re-sort the data if that is
necessary to answer the query.
In order to "push" sorts into ExecutionPlans / scans, we would need some way
to help DataFusion figure out whether it should push the sort into the scan or
use a `SortExec` afterwards.
For example, it is not clear which of the following plans is better, as it
depends on how the sort within the `ExecutionPlan` is implemented:
```
SortExec
  Filter
    Scan (no sort)
```
vs
```
Filter
  Scan (Sort in the Scan)
```
Depending on how selective the filter is, it may be better to do the scan /
filter first and then sort.
Of course, in this case the filter is likely pushed down to the scan too, but
I think in general the same issue still applies.
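To make the tradeoff concrete, here is a rough sketch of a toy cost comparison
between the two plans. Nothing in it comes from DataFusion: the function names,
the "row operations" unit, the `n * log2(n)` sort cost, and the
`scan_sort_factor` parameter are all assumptions made up for illustration.

```rust
// Toy cost model in abstract "row operations". This is NOT DataFusion's
// cost model; every name and formula here is a hypothetical illustration.

/// Assumed cost of a comparison sort over `rows` rows: roughly n * log2(n).
fn sort_cost(rows: f64) -> f64 {
    if rows <= 1.0 {
        0.0
    } else {
        rows * rows.log2()
    }
}

/// Plan A: Scan (no sort) -> Filter -> SortExec.
/// The scan and the filter each touch every row; only the rows that
/// survive the filter (`total_rows * selectivity`) are sorted.
fn cost_sort_after_filter(total_rows: f64, selectivity: f64) -> f64 {
    let scan_and_filter = 2.0 * total_rows;
    scan_and_filter + sort_cost(total_rows * selectivity)
}

/// Plan B: Scan (sort in the scan) -> Filter.
/// `scan_sort_factor` scales how expensive the source's own sort is compared
/// to a generic sort: 0.0 models data that is already stored in order,
/// 1.0 models a source that sorts all rows as expensively as a SortExec.
fn cost_sort_in_scan(total_rows: f64, scan_sort_factor: f64) -> f64 {
    let scan_and_filter = 2.0 * total_rows;
    scan_and_filter + scan_sort_factor * sort_cost(total_rows)
}

/// Returns true when pushing the sort into the scan is estimated to be
/// strictly cheaper than sorting after the filter.
fn prefer_sort_in_scan(total_rows: f64, selectivity: f64, scan_sort_factor: f64) -> bool {
    cost_sort_in_scan(total_rows, scan_sort_factor)
        < cost_sort_after_filter(total_rows, selectivity)
}
```

Under these assumptions, a filter that keeps 1% of rows favors sorting after
the filter unless the source can produce sorted output almost for free, which
is exactly the information DataFusion does not have about an arbitrary scan.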
For this use case, I suggest adding a custom optimizer pass that does the
sort pushdown you want and can take advantage of the details of the
underlying source to make these choices.
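As a rough illustration of what such a pass does, here is a minimal sketch of
a sort-pushdown rewrite over a toy plan tree. It deliberately uses none of
DataFusion's types: the `Plan` enum, the `can_sort` flag, and both functions
are hypothetical. In DataFusion itself a pass like this would more likely be
written as a physical optimizer rule over `ExecutionPlan` nodes.

```rust
// Toy plan tree, invented for this sketch; not DataFusion's ExecutionPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// `can_sort` is an assumed flag: the source can emit rows ordered by a column.
    Scan { table: String, sorted_by: Option<String>, can_sort: bool },
    Filter { predicate: String, input: Box<Plan> },
    Sort { column: String, input: Box<Plan> },
}

/// Try to make `plan` produce rows ordered by `column` without a Sort node.
/// Only descends through Filter nodes (a filter keeps its input order) and
/// succeeds only if the chain ends in a Scan that is able to sort.
fn push_sort(plan: &Plan, column: &str) -> Option<Plan> {
    match plan {
        Plan::Scan { table, can_sort: true, .. } => Some(Plan::Scan {
            table: table.clone(),
            sorted_by: Some(column.to_string()),
            can_sort: true,
        }),
        Plan::Filter { predicate, input } => {
            let new_input = push_sort(input, column)?;
            Some(Plan::Filter {
                predicate: predicate.clone(),
                input: Box::new(new_input),
            })
        }
        _ => None,
    }
}

/// The "optimizer pass": rewrite the tree bottom-up, replacing each Sort
/// with a sorted scan where possible and keeping the Sort otherwise.
fn pushdown_sort(plan: Plan) -> Plan {
    match plan {
        Plan::Sort { column, input } => {
            let input = pushdown_sort(*input);
            match push_sort(&input, &column) {
                Some(rewritten) => rewritten,
                None => Plan::Sort { column, input: Box::new(input) },
            }
        }
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(pushdown_sort(*input)),
        },
        scan => scan,
    }
}
```

This sketch always pushes the sort down when the scan allows it. A real pass
would also decide whether the pushdown is worthwhile, using whatever it knows
about the underlying source and the selectivity of the filter.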