rdblue commented on pull request #2182: URL: https://github.com/apache/iceberg/pull/2182#issuecomment-772052442
I also think that we need a concrete plan for how these files are created and updated. A table could easily have a few million partitions, and because these files are immutable, that is a lot of data to write every time a snapshot is produced. Since the files would be maintained by applying the changes from one snapshot to the next (rather than by scanning all of the metadata), each update would cost reading an entire file and writing an entire file. That's extra work I don't think we want to require for every write to a table.

I think we should add a way to track a file along with the snapshot it was produced for, so that we can determine what has changed since that snapshot and update the file asynchronously.
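The incremental approach can be sketched roughly as follows. This is a minimal, hypothetical Python model, not Iceberg's API: `PartitionStatsFile`, `Snapshot`, `DataFileChange`, `snapshots_since`, and `update_stats` are illustrative names, and the per-partition stats are assumed to be just a file count and a record count. The key idea is that each stats file records the snapshot it was produced for, so a later job can walk the snapshot lineage from that point and apply only the changes since.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical model for illustration only; not Iceberg's actual classes.
@dataclass(frozen=True)
class DataFileChange:
    partition: Tuple          # partition tuple, e.g. ("2021-02-01",)
    record_count: int
    added: bool               # True = data file added, False = data file deleted

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    changes: List[DataFileChange]

@dataclass(frozen=True)
class PartitionStatsFile:
    snapshot_id: int                        # snapshot these stats were produced for
    stats: Dict[Tuple, Tuple[int, int]]     # partition -> (file_count, record_count)

def snapshots_since(snapshots: Dict[int, Snapshot], start_id: int, end_id: int) -> List[Snapshot]:
    """Snapshots after start_id up to and including end_id, oldest first."""
    chain = []
    current: Optional[int] = end_id
    while current != start_id:
        if current is None:
            raise ValueError(f"snapshot {start_id} is not an ancestor of {end_id}")
        snap = snapshots[current]
        chain.append(snap)
        current = snap.parent_id
    chain.reverse()
    return chain

def update_stats(old: PartitionStatsFile, snapshots: Dict[int, Snapshot], target_id: int) -> PartitionStatsFile:
    """Produce a new stats file for target_id by applying only the changes since old.snapshot_id."""
    stats = dict(old.stats)  # copy: the existing file is immutable
    for snap in snapshots_since(snapshots, old.snapshot_id, target_id):
        for change in snap.changes:
            files, records = stats.get(change.partition, (0, 0))
            sign = 1 if change.added else -1
            files += sign
            records += sign * change.record_count
            if files == 0:
                stats.pop(change.partition, None)  # partition no longer has data files
            else:
                stats[change.partition] = (files, records)
    return PartitionStatsFile(snapshot_id=target_id, stats=stats)

# Usage: stats were last written for snapshot 1; catch up to snapshot 3 in one pass.
snapshots = {
    1: Snapshot(1, None, [DataFileChange(("a",), 10, True), DataFileChange(("b",), 5, True)]),
    2: Snapshot(2, 1, [DataFileChange(("a",), 7, True)]),
    3: Snapshot(3, 2, [DataFileChange(("b",), 5, False)]),
}
old = PartitionStatsFile(snapshot_id=1, stats={("a",): (1, 10), ("b",): (1, 5)})
new = update_stats(old, snapshots, target_id=3)
print(new.snapshot_id, new.stats)  # 3 {('a',): (2, 17)}
```

Note that each update still reads and rewrites the whole stats file, which is the cost described in the comment. Recording `snapshot_id` is what lets that work be deferred: a background job can catch up over several commits at once instead of every write paying for it.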
