alamb opened a new issue, #4177:
URL: https://github.com/apache/arrow-datafusion/issues/4177

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Suggested by @crepererum  in 
https://github.com/apache/arrow-datafusion/issues/4169#issuecomment-1311347788
   
   Some systems, such as IOx, store parquet files in a particular sorted order 
and then use the fact that the data is sorted for a variety of sort-related 
optimizations. 
   
   The `BasicEnforcement` rule added in 
https://github.com/apache/arrow-datafusion/pull/4122 by @mingmwang allows 
DataFusion to take advantage of known information about the sort order. 
   
   One contrived example: if your parquet file is sorted by `price` and your 
query is `select * from data order by price limit 10`, DataFusion can avoid 
scanning the entire file.
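
A minimal sketch of why the known sort order helps here, written in plain Python rather than against DataFusion's API. `CountingScan`, `top_k_full_scan`, and `top_k_presorted` are hypothetical names made up for illustration; the point is only that a pre-sorted input lets the `limit 10` stop after 10 rows, while an unsorted input has to be read in full.

```python
import heapq
from itertools import islice


class CountingScan:
    """Iterator over rows that counts how many rows were actually read."""

    def __init__(self, rows):
        self._it = iter(rows)
        self.rows_read = 0

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self._it)  # StopIteration propagates at end of input
        self.rows_read += 1
        return row


def top_k_full_scan(scan, k):
    # Sort order unknown: every row must be inspected to find the k smallest.
    return heapq.nsmallest(k, scan, key=lambda row: row["price"])


def top_k_presorted(scan, k):
    # Input is known to be sorted by price: the first k rows are the answer.
    return list(islice(scan, k))


# Hypothetical table of 1000 rows, already sorted by price.
rows = [{"price": p} for p in range(1000)]

unsorted_scan = CountingScan(rows)
sorted_scan = CountingScan(rows)

full = top_k_full_scan(unsorted_scan, 10)
fast = top_k_presorted(sorted_scan, 10)
```

Both calls return the same 10 rows, but `sorted_scan.rows_read` stays at 10 while `unsorted_scan.rows_read` reaches 1000.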
   
   Another, more interesting example could be using the sorted order to reorder 
pushdown filters or to use a sort-merge join without actually sorting.
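
To illustrate the sort-merge-join case, here is a rough sketch in plain Python (not DataFusion code; `merge_join` is a hypothetical helper). It assumes both inputs are lists of `(key, value)` pairs that are already sorted by key, so the join is a single linear pass with no sort step.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are pre-sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    out.append((lk, lval, rval))
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
joined = merge_join(left, right)
```

If the sort order were not known, both inputs would first have to be sorted on the join key, which is the step the optimization avoids.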
   
   
   **Describe the solution you'd like**
   - [ ] https://github.com/apache/arrow-rs/issues/3090
   - [ ] Detect and use this sorted information when creating a ListingTable 
that reads from parquet files
   
   **Describe alternatives you've considered**
   Don't do it
   
   **Additional context**
   
   Here is a ticket that tracks allowing users of DataFusion to manually 
specify the sort order: https://github.com/apache/arrow-datafusion/issues/4169
   

