alamb commented on issue #7871:
URL: 
https://github.com/apache/arrow-datafusion/issues/7871#issuecomment-1773195075

   Thank you for bringing this up @backkem 
   
   # Sort Pushdown?
   
   When you say "sort pushdown" does that mean providing a sort order to 
[`TableProvider::scan`](https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html#tymethod.scan)?
 
   
   Is the idea that the table providers have some faster way to sort than what 
is built into DataFusion?
   
   Most of the sort-based optimizations are done after `TableProvider::scan` 
has been called, so I am not sure how useful a pushed-down sort would be.
   
   # Existing ordering
   
   There is a similar idea (maybe it is what you mean by sort order pushdown) 
where sources can tell DataFusion about any pre-existing sort orders the data 
may have (e.g. because a Parquet file was written with some sort order).
   
   The 
[TableProvider](https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html#)
 trait does not have any way to communicate this information directly, as it is 
not used until physical planning, but the `ExecutionPlan` returned by 
`TableProvider::scan` does allow setting this via 
[`ExecutionPlan::output_ordering`](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html#tymethod.output_ordering).
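
To illustrate why declaring an existing ordering is useful, here is a minimal, self-contained sketch of the idea. It is a toy model and not the DataFusion API: `Plan`, `FileScan`, `Sort`, `SortKey`, `satisfies`, and `ensure_sorted` are all hypothetical names, and the real optimizer rules are considerably more involved. The assumption is simply that a planner can skip a sort when the required keys are a prefix of the ordering the source reports.

```rust
// Toy model only: every name here is hypothetical, not DataFusion's API.

/// One sort key: a column name plus a direction.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

impl SortKey {
    fn asc(column: &str) -> Self {
        SortKey { column: column.to_string(), descending: false }
    }
}

/// Stand-in for a physical plan node that can report an existing ordering.
trait Plan {
    /// `None` means the node makes no claim about the order of its output.
    fn output_ordering(&self) -> Option<&[SortKey]>;
    fn describe(&self) -> String;
}

/// A scan over a file that may have been written in sorted order.
struct FileScan {
    path: String,
    declared_order: Option<Vec<SortKey>>,
}

impl Plan for FileScan {
    fn output_ordering(&self) -> Option<&[SortKey]> {
        self.declared_order.as_deref()
    }
    fn describe(&self) -> String {
        format!("FileScan({})", self.path)
    }
}

/// An explicit sort placed on top of another node.
struct Sort {
    keys: Vec<SortKey>,
    input: Box<dyn Plan>,
}

impl Plan for Sort {
    fn output_ordering(&self) -> Option<&[SortKey]> {
        Some(self.keys.as_slice())
    }
    fn describe(&self) -> String {
        format!("Sort({})", self.input.describe())
    }
}

/// The requirement is met when it is a prefix of the existing ordering.
fn satisfies(existing: Option<&[SortKey]>, required: &[SortKey]) -> bool {
    match existing {
        Some(keys) => keys.starts_with(required),
        None => required.is_empty(),
    }
}

/// Adds a `Sort` node only when the input does not already provide the order.
fn ensure_sorted(input: Box<dyn Plan>, required: Vec<SortKey>) -> Box<dyn Plan> {
    if satisfies(input.output_ordering(), &required) {
        input
    } else {
        Box::new(Sort { keys: required, input })
    }
}
```

In this sketch, calling `ensure_sorted` on a `FileScan` that declares `ts, id` with a requirement of `ts` returns the scan unchanged, while a scan with no declared order gets wrapped in a `Sort`.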
   
   
   You can see an example of how this is hooked up in the built-in 
[`ListingTable`](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html)
 `TableProvider`, which takes the sort order from 
[`ListingTableConfig`](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTableConfig.html)
 via 
[`ListingOptions`](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingOptions.html)
 and its 
[`file_sort_order`](https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingOptions.html#structfield.file_sort_order)
 field.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
