alamb commented on pull request #1010: URL: https://github.com/apache/arrow-datafusion/pull/1010#issuecomment-934768426
TLDR: this PR (after including the code in https://github.com/rdettai/arrow-datafusion/pull/1) extracts the file format and file manipulation parts of the CSV, Avro, Parquet and JSON providers. This lets them be reused, and makes it easier to implement alternate, non-file-based table providers.

However, at the moment the PR is entirely additive: it adds all new code and doesn't remove the old table providers. I would like to propose:

1. Merge https://github.com/rdettai/arrow-datafusion/pull/1 into this PR.
2. Merge this PR into datafusion/master.
3. Do follow-on PR(s) to refactor the old table providers (e.g. `ParquetExec` et al.) to use the new code.

cc @Dandandan, @houqp, @yjshen, @Jimexist

There is a risk that both sets of datasource implementations will live on master for some time, but I think that is better than an even larger PR. Given @rdettai's track record and the amount of PR review @houqp, @yjshen and I have already done, I think the risk of ending up in a split-brain state for an extended period of time is minimal.
