alamb commented on issue #7871: URL: https://github.com/apache/arrow-datafusion/issues/7871#issuecomment-1773195075
Thank you for bringing this up @backkem

# Sort Pushdown?

When you say "sort pushdown", does that mean providing a sort order to [`TableProvider::scan`](https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html#tymethod.scan)? Is the idea that the table providers have some faster way to sort than what is built into DataFusion?

Most of the sort-based optimizations are done after `TableProvider::scan` has been called, so I am not sure how useful a pushed-down sort would be.

# Existing ordering

There is a similar idea (maybe it is what you mean by sort order pushdown) where sources can tell DataFusion about any pre-existing sort orders the data may have (e.g. because a parquet file was written with some sort order).

The [TableProvider](https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html#) trait does not have any way to communicate this information directly, as it is not used until physical planning, but the `ExecutionPlan` returned by `TableProvider::scan` does allow setting this via [`ExecutionPlan::output_ordering`](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html#tymethod.output_ordering).

You can see an example of how this is hooked up in the built-in [`ListingTable`](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html) TableProvider (via [ListingTableConfig](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTableConfig.html) via [ListingOptions](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingOptions.html) via [`file_sort_order`](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingOptions.html#structfield.file_sort_order)).
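
To make the "existing ordering" idea concrete, here is a minimal, std-only sketch of a plan node that reports a pre-existing sort order and a planner check that skips a redundant sort. Everything in it (`SortKey`, `PlanNode`, `SortedFileScan`, `UnorderedScan`, `needs_sort`) is a hypothetical stand-in and not DataFusion's API; only the method name `output_ordering` is borrowed from the comment above. The prefix check is an assumption made for illustration, and DataFusion's real analysis is more involved.

```rust
// Toy model only: these types are hypothetical stand-ins, not DataFusion's API.

/// One sort key: a column name plus a direction.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

impl SortKey {
    /// Convenience constructor for an ascending key.
    fn asc(column: &str) -> Self {
        SortKey { column: column.to_string(), descending: false }
    }
}

/// Stand-in for a physical plan node that can report an existing ordering.
trait PlanNode {
    /// `None` means the node makes no promise about row order.
    fn output_ordering(&self) -> Option<&[SortKey]>;
}

/// A scan over files that were written already sorted.
struct SortedFileScan {
    ordering: Vec<SortKey>,
}

impl PlanNode for SortedFileScan {
    fn output_ordering(&self) -> Option<&[SortKey]> {
        Some(&self.ordering)
    }
}

/// A scan that knows nothing about the order of its rows.
struct UnorderedScan;

impl PlanNode for UnorderedScan {
    fn output_ordering(&self) -> Option<&[SortKey]> {
        None
    }
}

/// In this toy model a sort is redundant when the required keys are a
/// prefix of the ordering the input already declares.
fn needs_sort(input: &dyn PlanNode, required: &[SortKey]) -> bool {
    match input.output_ordering() {
        Some(existing) => !existing.starts_with(required),
        None => !required.is_empty(),
    }
}

fn main() {
    let sorted = SortedFileScan {
        ordering: vec![SortKey::asc("ts"), SortKey::asc("id")],
    };
    let by_ts = [SortKey::asc("ts")];
    let by_id = [SortKey::asc("id")];

    println!("sorted scan, ORDER BY ts: sort needed = {}", needs_sort(&sorted, &by_ts));
    println!("sorted scan, ORDER BY id: sort needed = {}", needs_sort(&sorted, &by_id));
    println!("unordered scan, ORDER BY ts: sort needed = {}", needs_sort(&UnorderedScan, &by_ts));
}
```

The point of the sketch is the division of labor: the source only declares what order its data already has, and the decision about whether to sort stays with the planner, after the scan node has been built.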
