RussellSpitzer commented on pull request #2782: URL: https://github.com/apache/iceberg/pull/2782#issuecomment-874948842
@ayushchauhan0811 Rewrite and Merge operations write out data that was already in the data set. Consider a compaction operation that changes no actual rows but combines files: all of the old files are no longer valid and are deleted, and a set of new files is added. So if you check which files were "added" by this compaction operation, you would see the entire table as having been "added" in this snapshot. Merge operations (copy-on-write) have a similar issue. Imagine a merge operation that updates a single row in a data file. The old file will be deleted and a new file will be created. The new file will contain all the unchanged rows from the old file plus the one updated row. If you scan this new file, you will get all the data that was appended in a previous action as well as the new data.
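
To make the point concrete, here is a minimal toy model in plain Python. It is not the Iceberg API: `DataFile`, `Snapshot`, `compact`, `copy_on_write_update`, and `rows_in_added_files` are hypothetical names invented for this sketch, and the operation strings are only labels. It assumes a reader that treats every file added by a snapshot as new data.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration only; these are not Iceberg classes.
@dataclass(frozen=True)
class DataFile:
    name: str
    rows: tuple  # tuple of (key, value) pairs


@dataclass(frozen=True)
class Snapshot:
    operation: str
    added: tuple
    deleted: tuple = ()


def rows_in_added_files(snapshot):
    """Rows a naive reader sees if it scans every file the snapshot added."""
    return [row for f in snapshot.added for row in f.rows]


def compact(files, new_name):
    """Combine files into one without changing any row."""
    merged = DataFile(new_name, tuple(r for f in files for r in f.rows))
    return Snapshot("replace", added=(merged,), deleted=tuple(files))


def copy_on_write_update(old_file, key, new_value, new_name):
    """Rewrite the whole file to change the value of a single row."""
    rows = tuple((k, new_value if k == key else v) for k, v in old_file.rows)
    return Snapshot("overwrite", added=(DataFile(new_name, rows),), deleted=(old_file,))


f1 = DataFile("f1.parquet", ((1, "a"), (2, "b")))
f2 = DataFile("f2.parquet", ((3, "c"),))

append = Snapshot("append", added=(f1, f2))
compaction = compact([f1, f2], "f3.parquet")
update = copy_on_write_update(f1, 2, "B", "f4.parquet")

print(rows_in_added_files(append))      # the rows that were genuinely appended
print(rows_in_added_files(compaction))  # the same rows again, though nothing changed
print(rows_in_added_files(update))      # the untouched row plus the one updated row
```

In this model the compaction snapshot "adds" every row in the table, and the copy-on-write update "adds" a row that was appended earlier alongside the one that actually changed, which is why scanning the added files of these snapshots cannot be treated as reading new data.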
