[GitHub] [arrow-datafusion] houqp commented on pull request #1010: Reorganize table providers by table format

GitBox Thu, 23 Sep 2021 21:20:14 -0700


houqp commented on pull request #1010:
URL: https://github.com/apache/arrow-datafusion/pull/1010#issuecomment-926330270



   To @yjshen's point about supporting mixed file formats: it's a totally valid 
use case and a widely adopted practice. But I agree with @alamb that we don't 
need to make the list table do everything now. We can keep it simple and only 
support partitioned tables with a single file format. The more complex table 
formats that mix file formats typically have their own file organization and 
schema management logic, so it would be hard to come up with one table 
provider that captures them all. For example, Hudi uses avro and parquet, 
DeltaLake uses parquet and json, etc. The better approach in my mind is to 
create a table-format-specific provider for each of these specialized table 
formats as a plugin.
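
   A rough sketch of what "one provider per table format" could look like. 
Everything here is hypothetical and only meant to illustrate the shape of the 
idea: `TableFormatProvider`, `ProviderRegistry`, and the provider structs are 
made-up names, not DataFusion's actual `TableProvider` API, and the file format 
lists simply mirror the examples above.

```rust
use std::collections::HashMap;

// Hypothetical trait for illustration only; this is NOT DataFusion's
// actual `TableProvider` trait.
trait TableFormatProvider {
    /// Name of the table format this provider handles.
    fn format_name(&self) -> &'static str;
    /// File formats the table format uses for its files.
    fn file_formats(&self) -> Vec<&'static str>;
}

/// Simple partitioned table backed by a single file format.
struct ListingProvider {
    file_format: &'static str,
}

impl TableFormatProvider for ListingProvider {
    fn format_name(&self) -> &'static str {
        "listing"
    }
    fn file_formats(&self) -> Vec<&'static str> {
        vec![self.file_format]
    }
}

/// Specialized provider: owns its own file organization and schema logic.
struct HudiProvider;

impl TableFormatProvider for HudiProvider {
    fn format_name(&self) -> &'static str {
        "hudi"
    }
    fn file_formats(&self) -> Vec<&'static str> {
        vec!["avro", "parquet"]
    }
}

/// Specialized provider for another mixed-format table format.
struct DeltaLakeProvider;

impl TableFormatProvider for DeltaLakeProvider {
    fn format_name(&self) -> &'static str {
        "deltalake"
    }
    fn file_formats(&self) -> Vec<&'static str> {
        vec!["parquet", "json"]
    }
}

/// Hypothetical plugin registry keyed by table format name.
#[derive(Default)]
struct ProviderRegistry {
    providers: HashMap<&'static str, Box<dyn TableFormatProvider>>,
}

impl ProviderRegistry {
    /// Registers a provider under its own format name.
    fn register(&mut self, provider: Box<dyn TableFormatProvider>) {
        self.providers.insert(provider.format_name(), provider);
    }

    /// Looks up the file formats used by a registered table format.
    fn file_formats_for(&self, format_name: &str) -> Option<Vec<&'static str>> {
        self.providers.get(format_name).map(|p| p.file_formats())
    }
}

/// Builds a registry with the simple listing provider plus two plugins.
fn default_registry() -> ProviderRegistry {
    let mut registry = ProviderRegistry::default();
    registry.register(Box::new(ListingProvider { file_format: "parquet" }));
    registry.register(Box::new(HudiProvider));
    registry.register(Box::new(DeltaLakeProvider));
    registry
}
```

   The point of the split is that the listing provider stays single-format and 
simple, while each specialized format keeps its own logic behind the same 
small interface.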
   
   I also think there is value in grouping logically coupled changes into a 
single commit so it's easier to do a git diff. I care less about whether git is 
smart enough to keep track of the file renames, but at least being able to do a 
git blame and trace all logically related changes back to this single commit 
helps a lot. I totally understand how painful a large PR is because of the 
constant need to manage merge conflicts, so I am happy to help if there is 
anything I can do on my end to alleviate your pain. Just let me know ;) I think 
it's a good tradeoff to do a bit of extra work now for peace of mind in the 
future.


