houqp commented on pull request #1010: URL: https://github.com/apache/arrow-datafusion/pull/1010#issuecomment-926330270
To @yjshen's point about supporting mixed file formats: it's a totally valid use case and a widely adopted practice. But I agree with @alamb that we don't need to make the list table do everything now. We can keep it simple and only support partitioned tables with a single file format. The more complex mixed-format table formats typically have their own file organization and schema management logic, so it would be hard to come up with one table provider that captures them all. For example, Hudi uses avro and parquet, DeltaLake uses parquet and json, etc. The better approach in my mind is to create table-format-specific providers for each of these specialized table formats as plugins.

I also think there is value in grouping logically coupled changes into a single commit so it's easier to do a git diff. I care less about whether git is smart enough to keep track of the file rename, but at least being able to do a git blame and trace all logically related changes to this single commit helps a lot.

I totally understand how painful it is to do a large PR because of the need to constantly manage merge conflicts, so I am happy to help if there is anything I can do on my end to alleviate your pain. Just let me know ;) I think it's a good tradeoff to do a bit of extra work now for peace of mind in the future.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
