yjshen opened a new issue #944: URL: https://github.com/apache/arrow-datafusion/issues/944
To make DataFusion table scans more flexible and more efficient, we can take several further steps. (I have linked the related issues I am aware of.)

### Capability:
- [ ] Enable listing/reading remote storage systems in an async way #616
- [ ] Enable file block granularity processing (row groups or offset ranges instead of the current per-file basis).
- [ ] Enable reading partitioned tables, i.e., partition column values encoded in the file path #133
- [ ] Support table schema evolution. In other words, relax the requirement that all files in a table have a completely consistent schema

### Performance:
- [ ] Parallel table file listing #896
- [ ] Scan parquet metadata lazily #871
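
To illustrate the partitioned-table item, here is a minimal sketch of recovering partition column values from a file path. It assumes a Hive-style `key=value` directory layout, which the issue does not specify; `parse_partition_values` is a hypothetical helper for illustration, not an existing DataFusion API.

```rust
/// Hypothetical helper (not part of DataFusion): extract partition columns
/// from a path, assuming Hive-style `key=value` directory segments.
fn parse_partition_values(path: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = path.split('/').collect();
    // The last segment is the file name; it is not a partition directory,
    // even if it happens to contain '='.
    segments.pop();

    segments
        .into_iter()
        // Keep only segments of the form `key=value`.
        .filter_map(|segment| segment.split_once('='))
        // Ignore malformed segments with an empty key, such as `=value`.
        .filter(|(key, _)| !key.is_empty())
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}
```

For example, calling `parse_partition_values("table/year=2021/month=08/part-0.parquet")` yields the pairs `("year", "2021")` and `("month", "08")`. Values are kept as strings here; a real scan would still need to cast them to the partition columns' declared types.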