yjshen opened a new issue #944:
URL: https://github.com/apache/arrow-datafusion/issues/944


   To make DataFusion scan tables more flexibly and efficiently, we can take 
several further steps. (I have linked the related issues I am aware of.)
   
   ### Capability: 
   - [ ] Enable listing/reading remote storage systems in an async way #616 
   - [ ] Enable processing at file-block granularity (row groups or offset 
ranges) instead of the current per-file basis.
   - [ ] Enable reading partitioned tables, i.e., tables whose partition column 
values are encoded in the file path #133 
   - [ ] Support table schema evolution. In other words, relax the 
requirement that all files in a table have exactly the same schema.
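
To illustrate the partitioned-table item, here is a minimal sketch of recovering partition column values from a file path. It assumes Hive-style `key=value` directory segments, which the issue does not specify, and `parse_partition_values` is a hypothetical helper, not a DataFusion API.

```rust
/// Extract `key=value` partition segments from a file path.
/// Assumption: Hive-style layout such as `table/year=2021/month=08/file.parquet`.
/// Hypothetical helper for illustration only; not part of DataFusion.
fn parse_partition_values(path: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = path.split('/').collect();
    // The last segment is the file name, so it is never a partition directory.
    segments.pop();
    segments
        .into_iter()
        .filter_map(|segment| {
            // Keep only directory names of the form `key=value`.
            let (key, value) = segment.split_once('=')?;
            if key.is_empty() {
                return None;
            }
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}
```

Under this assumption, a path like `data/year=2021/month=08/part-0.parquet` yields the pairs `year` = `2021` and `month` = `08`, which a scan could attach to every row read from that file as extra columns.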
   
   ### Performance:
   
   - [ ] Parallel table file listing #896 
   - [ ] Scan Parquet metadata lazily #871 
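
As a rough sketch of what parallel file listing could look like, the function below lists several directories concurrently with scoped threads from the standard library. This is only an illustration: `list_files_parallel` and the `list_dir` callback are hypothetical names, and an actual implementation might well use async tasks against an object store rather than OS threads.

```rust
use std::thread;

/// List several table directories concurrently, one thread per directory.
/// `list_dir` is a stand-in for whatever call enumerates the files in one
/// directory (local filesystem, remote store, ...). Hypothetical sketch only.
fn list_files_parallel<F>(dirs: &[String], list_dir: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String> + Sync,
{
    let list_dir = &list_dir;
    thread::scope(|scope| {
        // Start one listing per directory before waiting on any of them.
        let handles: Vec<_> = dirs
            .iter()
            .map(|dir| scope.spawn(move || list_dir(dir.as_str())))
            .collect();
        // Join in input order so the combined listing is deterministic.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("listing thread panicked"))
            .collect()
    })
}
```

Joining the handles in input order keeps the result stable across runs, which helps when the file list feeds into partition assignment.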



