rdettai opened a new issue #1009: URL: https://github.com/apache/arrow-datafusion/issues/1009
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Currently the `TableProvider` implementations are split by file format (Parquet, CSV...). An alternative would be to organize `TableProvider`s by table format (file system listing, Iceberg, [Delta](https://github.com/delta-io/delta-rs/blob/main/rust/src/delta_datafusion.rs)).

**Describe the solution you'd like**

- `ExecutionPlan` implementations would remain organized by file format. A `TableProvider` could create different types of execution plans according to its configuration, or by auto-discovering the data file format from the information stored in the table format.
- The current implementations for Parquet, CSV, JSON and Avro would go into a `ListingTable` provider. Implicitly, the table format currently implemented:
  - is given a directory as input
  - discovers the files using the file system "listing" operation
- Schema inference, when required, would be resolved outside the `TableProvider` and would be exposed as a service by Ballista.

**Describe alternatives you've considered**

An alternative is to leave the table providers organized as is and try to solve the table formats at a different point in the planning.

**This is discussed in this [design document](https://docs.google.com/document/d/1Bd4-PLLH-pHj0BquMDsJ6cVr_awnxTuvwNJuWsTHxAQ/edit?usp=sharing).**

**Additional context**

- This will help solve #133
- It helps solve the Ballista issues #349, #868 and #871
- This is related and complementary to #944
- This replaces the `TableDescriptor` abstraction added in #932
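To make the proposed split concrete, here is a minimal, self-contained Rust sketch of the idea: a single listing-style table provider holds a set of discovered files and dispatches on the *file format* when building its scan plan, so the *table format* (listing) and the file formats stay orthogonal. All names here (`ListingTable`, `FileFormat`, `scan_plan`) are illustrative assumptions, not DataFusion's actual API.

```rust
use std::path::Path;

/// File formats the hypothetical listing provider could dispatch on.
#[derive(Debug, PartialEq)]
enum FileFormat {
    Parquet,
    Csv,
    Json,
    Avro,
}

impl FileFormat {
    /// Auto-discover the file format from a path's extension; a real
    /// implementation could also read the table-format metadata instead.
    fn from_path(path: &str) -> Option<FileFormat> {
        match Path::new(path).extension()?.to_str()? {
            "parquet" => Some(FileFormat::Parquet),
            "csv" => Some(FileFormat::Csv),
            "json" => Some(FileFormat::Json),
            "avro" => Some(FileFormat::Avro),
            _ => None,
        }
    }
}

/// A single "listing" table provider: given the files discovered by the
/// file system listing operation, it builds a format-specific scan.
struct ListingTable {
    files: Vec<String>,
}

impl ListingTable {
    /// In the real design this would return an `ExecutionPlan`; here we
    /// just name the plan that would be built for the discovered format.
    fn scan_plan(&self) -> Result<String, String> {
        let first = self.files.first().ok_or("table has no files")?;
        let fmt = FileFormat::from_path(first).ok_or("unknown file format")?;
        Ok(format!("{:?}Exec over {} file(s)", fmt, self.files.len()))
    }
}

fn main() {
    let table = ListingTable {
        files: vec!["part-0.parquet".into(), "part-1.parquet".into()],
    };
    println!("{}", table.scan_plan().unwrap());
}
```

With this shape, adding a new table format (e.g. Iceberg or Delta) would mean adding another provider that produces the same per-file-format plans, rather than duplicating one provider per file format.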
