RussellSpitzer commented on pull request #2782:
URL: https://github.com/apache/iceberg/pull/2782#issuecomment-874948842


   @ayushchauhan0811 
   Rewrite and merge operations produce files containing data that was already
in the data set. Consider a compaction operation that changes no rows but
combines files: all of the old files become invalid and are deleted, and a set
of new files is added. So if you check which files were "added" by this
compaction operation, you would see the entire table as having been "added" in
this snapshot.
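
   A minimal sketch of the compaction case, using a toy model rather than the
Iceberg API: files are plain named lists of rows, and `compact` is a
hypothetical helper invented for this illustration.

```python
# Toy model only: files are named lists of rows, not real Iceberg data files.
def compact(live_files, new_name):
    """Combine all live files into one; returns (deleted, added) file maps."""
    merged_rows = [row for rows in live_files.values() for row in rows]
    deleted = dict(live_files)
    added = {new_name: merged_rows}
    return deleted, added

# Two earlier appends produced two small files.
table = {"file-a": [1, 2], "file-b": [3, 4]}

deleted, added = compact(table, "file-c")

# Reading what the compaction snapshot "added" returns every row in the table,
# even though no row was actually inserted or changed.
rows_in_added_files = sorted(r for rows in added.values() for r in rows)
rows_before = sorted(r for rows in table.values() for r in rows)
print(rows_in_added_files == rows_before)  # True
```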
   
   Merge operations (copy-on-write) have a similar issue. Imagine a merge
operation that updates a single row in a data file. The old file will be deleted
and a new file will be created. The new file will contain all the unchanged rows
from the old file plus the updated row. If you scan this new file, you will get
all the data that was appended in a previous action as well as the new data.
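
   The copy-on-write case can be sketched the same way. This is again a toy
model, not Iceberg's file format or API: a data file is a dict of row id to
value, and `copy_on_write_update` is a hypothetical helper.

```python
# Toy model only: a data file is a dict of row id -> value, not Iceberg's format.
def copy_on_write_update(old_file, row_id, new_value):
    """Rewrite the whole file with one row changed; the old file is untouched."""
    new_file = dict(old_file)
    new_file[row_id] = new_value
    return new_file

old_file = {1: "a", 2: "b", 3: "c"}  # written by an earlier append
new_file = copy_on_write_update(old_file, 2, "B")

# Rows that truly changed in this snapshot.
changed = {k: v for k, v in new_file.items() if old_file.get(k) != v}

# Scanning the added file yields 3 rows, but only 1 of them is new.
print(len(new_file), len(changed))  # 3 1
```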


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



