rdblue commented on pull request #1972:
URL: https://github.com/apache/iceberg/pull/1972#issuecomment-762481434


   @Fokko, sorry for the delay getting back to this. I've been trying to get 
everything reviewed for the 0.11.0 release.
   
   I'm excited to see Beam support working, but I don't think this PR has
enough guardrails for general use. Because users create their own Avro files,
it would be easy to create files without field IDs and append them to a table.
Appending files this way also requires an extra read of each Avro file, if I'm
reading `FilenameToDataFile` correctly. We usually want engines to handle the
details of writing data correctly so that users don't need to understand them,
but this PR delegates the problem to users.
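
   For illustration, here is a minimal stdlib-only sketch of one such guardrail. It assumes Iceberg's convention of storing each column ID in a `field-id` property on Avro record fields. `FieldIdCheck` and its simplified `Field` type are hypothetical stand-ins for Avro's schema classes, not Iceberg or Beam APIs, and the sketch checks only top-level fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical guardrail, not an Iceberg or Beam API: reject files whose
// Avro schema fields lack Iceberg field IDs before they are appended.
final class FieldIdCheck {
    // Property name Iceberg uses to store a column ID on an Avro field.
    static final String FIELD_ID_PROP = "field-id";

    // Simplified stand-in for an Avro record field: a name plus custom properties.
    static final class Field {
        final String name;
        final Map<String, Object> props;

        Field(String name, Map<String, Object> props) {
            this.name = name;
            this.props = props;
        }
    }

    // Returns the names of top-level fields that have no integer field ID.
    static List<String> fieldsMissingIds(List<Field> fields) {
        List<String> missing = new ArrayList<>();
        for (Field field : fields) {
            if (!(field.props.get(FIELD_ID_PROP) instanceof Integer)) {
                missing.add(field.name);
            }
        }
        return missing;
    }

    // Fails fast instead of letting a file without IDs reach the table.
    static void requireFieldIds(List<Field> fields) {
        List<String> missing = fieldsMissingIds(fields);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                "Fields missing " + FIELD_ID_PROP + ": " + missing);
        }
    }
}
```

A real check would also need to walk nested structs, lists, and maps. The point is that the engine rejects the file instead of relying on users to get IDs right.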
   
   I think we would need to provide more support so that users don't write bad
data into their tables; that would also fix the performance problem introduced
by `FilenameToDataFile`. Iceberg already supports writing Avro generics to both
Avro and Parquet formats. Could this PR add file writers built on that support,
to ensure that file stats and schema are correct? That would also add Parquet
support with little extra work.
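
   As a rough illustration of why engine-side writers avoid the extra read, here is a stdlib-only sketch. A writer can compute file stats as rows pass through it, so nothing has to reopen the file afterward. `StatsCollectingWriter` is hypothetical and is not Iceberg's writer API. Rows are modeled as `Map<String, Long>`, and a `Consumer` stands in for the underlying file writer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch, not Iceberg's writer API: collect file stats while
// rows are written, so no second read of the finished file is needed.
final class StatsCollectingWriter {
    private final List<String> columns;
    private final Consumer<Map<String, Long>> delegate; // stands in for the file writer
    private final Map<String, Long> nullCounts = new HashMap<>();
    private final Map<String, Long> mins = new HashMap<>();
    private final Map<String, Long> maxes = new HashMap<>();
    private long recordCount = 0;

    StatsCollectingWriter(List<String> columns, Consumer<Map<String, Long>> delegate) {
        this.columns = new ArrayList<>(columns);
        this.delegate = delegate;
    }

    // Updates per-column stats for one row, then hands the row to the delegate.
    void write(Map<String, Long> row) {
        for (String column : columns) {
            Long value = row.get(column);
            if (value == null) {
                nullCounts.merge(column, 1L, Long::sum);
            } else {
                mins.merge(column, value, Long::min);
                maxes.merge(column, value, Long::max);
            }
        }
        delegate.accept(row);
        recordCount++;
    }

    long recordCount() { return recordCount; }

    long nullCount(String column) { return nullCounts.getOrDefault(column, 0L); }

    // Returns null when the column has no non-null values.
    Long min(String column) { return mins.get(column); }

    Long max(String column) { return maxes.get(column); }
}
```

Real writers would need type-aware bounds for every column type, not only `Long`. The design point is that the stats come from the write path itself.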

