[GitHub] [arrow-datafusion] rdettai commented on pull request #1010: Reorganize table providers by table format

GitBox Thu, 16 Sep 2021 09:05:30 -0700


rdettai commented on pull request #1010:
URL: https://github.com/apache/arrow-datafusion/pull/1010#issuecomment-921034340



   > I thought, however, we were headed towards a slightly different 
abstraction where we would still have a ParquetReader that didn't use Path / File 
directly, but instead would use the ObjectStore abstraction recently added by 
@yjshen.
   
   Correct, that will be the next step.
   
   > TLDR: I wonder "if DataFusion planning was async would you be able to 
implement the table format as you would like"?
   
   Yes, that would bring a huge amount of flexibility. A funny example: 
I have just added a sketch implementation of the partition pruning algorithm. 
One interesting approach is to load the partitions into a `RecordBatch` so that 
the pushed-down filter can be run on it. DataFusion inside DataFusion! But we 
are stuck because that requires `async`. So many APIs in the Rust ecosystem are 
async that we want to be able to use them during planning 😄 
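
To make the pruning idea concrete, here is a minimal, dependency-free sketch in 
plain Rust. It is not the code from the PR: plain vectors stand in for a 
`RecordBatch`, a closure stands in for the pushed-down filter, and every name 
(`PartitionedFile`, `PartitionColumns`, `to_columns`, `evaluate_filter`, `prune`, 
the `year` / `month` partition columns) is hypothetical. It is also fully 
synchronous, so it sidesteps the `async` problem described above rather than 
solving it.

```rust
// Hypothetical sketch: plain structs stand in for DataFusion / Arrow types.

/// One file together with the partition values parsed from its path.
#[derive(Debug, Clone, PartialEq)]
struct PartitionedFile {
    path: String,
    year: i32,  // hypothetical partition column
    month: i32, // hypothetical partition column
}

/// Columnar view of the partition values, one entry per file.
/// This plays the role the `RecordBatch` would play.
struct PartitionColumns {
    year: Vec<i32>,
    month: Vec<i32>,
}

/// Turn the row-oriented file list into columns.
fn to_columns(files: &[PartitionedFile]) -> PartitionColumns {
    PartitionColumns {
        year: files.iter().map(|f| f.year).collect(),
        month: files.iter().map(|f| f.month).collect(),
    }
}

/// Evaluate the pushed-down filter on the columns and return a boolean mask.
fn evaluate_filter<F>(cols: &PartitionColumns, predicate: F) -> Vec<bool>
where
    F: Fn(i32, i32) -> bool,
{
    cols.year
        .iter()
        .zip(cols.month.iter())
        .map(|(&y, &m)| predicate(y, m))
        .collect()
}

/// Keep only the files whose mask entry is true.
fn prune(files: &[PartitionedFile], mask: &[bool]) -> Vec<PartitionedFile> {
    files
        .iter()
        .zip(mask)
        .filter_map(|(f, &keep)| keep.then(|| f.clone()))
        .collect()
}

fn main() {
    let files = vec![
        PartitionedFile { path: "year=2020/month=12/a.parquet".into(), year: 2020, month: 12 },
        PartitionedFile { path: "year=2021/month=9/b.parquet".into(), year: 2021, month: 9 },
        PartitionedFile { path: "year=2021/month=3/c.parquet".into(), year: 2021, month: 3 },
    ];

    // Stand-in for a pushed-down filter such as `year = 2021 AND month >= 6`.
    let cols = to_columns(&files);
    let mask = evaluate_filter(&cols, |year, month| year == 2021 && month >= 6);

    for file in prune(&files, &mask) {
        println!("scan {}", file.path);
    }
}
```

In the real setting the mask would presumably come from evaluating the filter 
expression against an actual `RecordBatch`, which is where the `async` 
requirement comes in.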
   
   I am going to try to make the `TableProvider.scan()` method async, and if it 
works I'll submit that in a separate PR.
   
   


