[GitHub] [iceberg] rdblue commented on pull request #2182: Support for PartitionStatsFile in each snapshot

GitBox Tue, 02 Feb 2021 14:27:55 -0800


rdblue commented on pull request #2182:
URL: https://github.com/apache/iceberg/pull/2182#issuecomment-772052442



   I also think that we need to have a concrete plan for how these files are 
created and updated.
   
   The number of partitions in a table could easily be a few million. That's 
quite a bit to write each time a snapshot is produced, since these files are 
immutable and have to be rewritten in full. And because they would be 
maintained by applying the changes from one to the next (rather than scanning 
all of the metadata), the cost of every commit would be to read an entire file 
and write an entire file. That's quite a bit of extra work that I don't think 
we want to require for each write to a table.
   
   I think we would want to add a way to track a stats file together with the 
snapshot for which it was produced, so that we can figure out what has changed 
since that snapshot and update the file asynchronously.
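
   To make the idea concrete, here is a rough sketch of what tracking a stats 
file against its snapshot could look like. It is not Iceberg code: `Snapshot`, 
`PartitionStats`, `partition_deltas`, and `refresh_stats` are hypothetical 
names, and it assumes each snapshot can report a per-partition change in file 
count. A background job would walk the snapshot log from the snapshot recorded 
in the stats file up to the current one and apply only those changes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified snapshot: id, parent, and per-partition
    # change in data file count introduced by this snapshot.
    snapshot_id: int
    parent_id: Optional[int]
    partition_deltas: Dict[str, int]


@dataclass(frozen=True)
class PartitionStats:
    # Stats are tagged with the snapshot they were produced for.
    snapshot_id: int
    file_counts: Dict[str, int]


def snapshots_since(snapshots: Dict[int, Snapshot],
                    from_id: int, to_id: int) -> List[Snapshot]:
    """Return the snapshots after from_id up to to_id, oldest first."""
    chain = []
    current = to_id
    while current != from_id:
        if current is None:
            raise ValueError("stats snapshot is not an ancestor of the target")
        snap = snapshots[current]
        chain.append(snap)
        current = snap.parent_id
    chain.reverse()
    return chain


def refresh_stats(stats: PartitionStats,
                  snapshots: Dict[int, Snapshot],
                  current_id: int) -> PartitionStats:
    """Build new stats for current_id by applying only the changes made
    since the snapshot the old stats were produced for."""
    counts = dict(stats.file_counts)  # old stats stay untouched (immutable file)
    for snap in snapshots_since(snapshots, stats.snapshot_id, current_id):
        for partition, delta in snap.partition_deltas.items():
            counts[partition] = counts.get(partition, 0) + delta
            if counts[partition] == 0:
                del counts[partition]  # partition no longer has any files
    return PartitionStats(current_id, counts)
```

   Because the stats carry their own snapshot id, writers would not need to 
touch them at commit time. Stale stats remain valid for the snapshot they 
name, and the refresh can run whenever it is convenient.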


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
