[GitHub] [arrow-datafusion] yjshen opened a new issue #944: Table Scan Enhancement Plan

GitBox Wed, 25 Aug 2021 08:47:38 -0700


yjshen opened a new issue #944:
URL: https://github.com/apache/arrow-datafusion/issues/944



   To let DataFusion scan tables more flexibly and more efficiently, we can take several further steps. (I have linked the related issues I am aware of.)
   
   ### Capability: 
   - [ ] Enable listing and reading files on remote storage systems asynchronously #616
   - [ ] Enable processing at file-block granularity (row groups or offset ranges) instead of the current per-file basis.
   - [ ] Enable reading partitioned tables, i.e., tables whose partition column values are encoded in the file path #133
   - [ ] Support table schema evolution; in other words, relax the requirement that all files in a table have exactly the same schema.
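
For the partitioned-table item (#133), here is a minimal sketch of reading partition column values out of a file path. It assumes the common Hive-style `key=value` directory layout (e.g. `/data/sales/year=2021/month=08/part-0.parquet`), which the item above does not prescribe. `parse_partition_values` is a hypothetical helper, not a DataFusion API: it treats the directories directly above the file as the partition columns, in declared order, and keeps the values as raw strings (a real reader would still have to cast them to the partition column types).

```rust
/// Hypothetical helper (not a DataFusion API), assuming a Hive-style
/// `key=value` directory layout: extracts partition values from a file path.
///
/// The directories directly above the file must match `partition_cols`
/// in order; otherwise `None` is returned. Values stay as raw strings.
fn parse_partition_values(path: &str, partition_cols: &[&str]) -> Option<Vec<(String, String)>> {
    // Split into non-empty segments; the last one is the file name itself.
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let (_file_name, dirs) = segments.split_last()?;

    // Only the trailing directories can be partition directories.
    if dirs.len() < partition_cols.len() {
        return None;
    }
    let partition_dirs = &dirs[dirs.len() - partition_cols.len()..];

    // Each directory must be `key=value` with `key` equal to the expected column.
    partition_dirs
        .iter()
        .zip(partition_cols.iter())
        .map(|(dir, col)| {
            let (key, value) = dir.split_once('=')?;
            if key == *col {
                Some((key.to_string(), value.to_string()))
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    let path = "/data/sales/year=2021/month=08/part-0.parquet";
    let values = parse_partition_values(path, &["year", "month"]);
    println!("{values:?}");
}
```

Rejecting paths whose trailing directories do not match the declared columns keeps files that sit outside the partition layout from being silently assigned wrong values.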
   
   ### Performance:
   
   - [ ] Parallel table file listing #896 
   - [ ] Scan Parquet metadata lazily #871
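
For parallel table file listing (#896), one possible shape is to issue the per-directory (or per-prefix) listings concurrently and concatenate the results. The sketch below is an assumption, not the planned implementation: `list_files_parallel` is a hypothetical helper, and the caller-supplied `list_prefix` closure stands in for the real storage listing call. It uses scoped threads from the Rust standard library so it runs without an async runtime; an async version in the spirit of #616 would more likely run the listings as concurrent futures.

```rust
use std::thread;

/// Hypothetical helper (not a DataFusion API): lists every prefix
/// concurrently on scoped threads and concatenates the results in the
/// same order as `prefixes`. `list_prefix` stands in for the real
/// storage listing call (local directory read, object-store LIST, ...).
fn list_files_parallel<F>(prefixes: &[&str], list_prefix: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String> + Sync,
{
    // Share one listing function across all threads by reference.
    let list_prefix = &list_prefix;
    thread::scope(|s| {
        // Spawn every listing first so they run concurrently...
        let mut handles = Vec::with_capacity(prefixes.len());
        for &prefix in prefixes {
            handles.push(s.spawn(move || list_prefix(prefix)));
        }
        // ...then join in spawn order so the output order is deterministic.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("listing thread panicked"))
            .collect::<Vec<String>>()
    })
}

fn main() {
    // Fake lister: pretends each partition directory holds two files.
    let files = list_files_parallel(&["t/year=2020", "t/year=2021"], |prefix| {
        vec![
            format!("{prefix}/part-0.parquet"),
            format!("{prefix}/part-1.parquet"),
        ]
    });
    for f in &files {
        println!("{f}");
    }
}
```

Joining the handles in spawn order keeps the combined listing stable even though the individual listings may finish in any order.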

