alamb commented on pull request #1010:
URL: https://github.com/apache/arrow-datafusion/pull/1010#issuecomment-934768426


   TLDR: this PR (after including the code in 
https://github.com/rdettai/arrow-datafusion/pull/1) extracts the file format 
and the file manipulation parts of the CSV, Avro, Parquet, and JSON providers 
so that they can be reused, and so that alternate, non-file-based table 
providers can be implemented more easily.
   
   However, at the moment the PR is entirely additive (it adds all new code 
and doesn't remove the old table providers).
   
   Given that, I would like to propose:
   1. merging https://github.com/rdettai/arrow-datafusion/pull/1 into this PR
   2. merging this PR into datafusion/master
   3. doing follow-on PR(s) to refactor the old table providers (e.g. 
`ParquetExec` et al.) to use the new code
   
   cc @Dandandan, @houqp, @yjshen, @Jimexist
   
   There is a risk in having both sets of datasource implementations live 
on master for some time, but I think that would be better than an even larger 
PR.
   
   Given @rdettai's track record and the amount of PR review @houqp, @yjshen, 
and I have done already, I think the risk that we end up in a split-brain 
state for an extended period of time is minimal.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.