Reo-LEI edited a comment on pull request #3258: URL: https://github.com/apache/iceberg/pull/3258#issuecomment-939651147
> Because position deletes will only reference data files being added to the table, there is no possibility that those files are concurrently deleted.

If a user runs the `deleteOrphanFiles` action, or deletes a referenced data file manually or through a user-implemented automated program before Flink commits, I think this validation can prevent committing files that no longer exist.

> should we look into calling validateFromSnapshot with the last committed snapshot ID?

I have another PR (#3103) to check snapshot history from the last committed snapshot ID, but this PR is trying to speed up the `IcebergFilesCommitter`, not to resolve #2482. As I mentioned in https://github.com/apache/iceberg/pull/3103/files#r718347222, we cannot guarantee that the `lastCommittedSnapshot` we stored in the snapshot summary or somewhere else will always exist when we restore the Flink job.

@rdblue I think the proper way to resolve #2482 is for `MergingSnapshotProducer.validationHistory` to stop traversing when it reaches snapshots that no longer exist because they were deleted by the `expireSnapshots` action.

https://github.com/apache/iceberg/blob/6ca31fc2933e7533cc0f9bad9fcbc81c16c13a57/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L432

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
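
To illustrate the suggestion in the comment above about stopping at expired snapshots, here is a minimal sketch. It is not Iceberg's actual `validationHistory` code: the class `SnapshotHistorySketch`, the method `ancestorsUntil`, and the plain `Map` of snapshot ID to parent ID are all hypothetical stand-ins for table metadata. The assumption is that a snapshot removed by `expireSnapshots` simply has no entry in the map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotHistorySketch {

  /**
   * Walks from currentId back toward startingId and returns the snapshot IDs
   * that would need validation. Hypothetical model: parents maps a snapshot ID
   * to its parent ID (null for the first snapshot); expired snapshots are absent.
   */
  static List<Long> ancestorsUntil(Map<Long, Long> parents, long currentId, Long startingId) {
    List<Long> visited = new ArrayList<>();
    Long id = currentId;
    while (id != null && !id.equals(startingId)) {
      if (!parents.containsKey(id)) {
        // The snapshot was expired: stop here instead of failing,
        // since there is no metadata left to follow.
        break;
      }
      visited.add(id);
      id = parents.get(id);
    }
    return visited;
  }

  public static void main(String[] args) {
    Map<Long, Long> parents = new HashMap<>();
    // Snapshots 1 and 2 were expired, so only 3 and 4 remain in the metadata.
    parents.put(3L, 2L);
    parents.put(4L, 3L);

    // The stored "last committed" snapshot (1) no longer exists after restore.
    System.out.println(ancestorsUntil(parents, 4L, 1L)); // prints [4, 3]
  }
}
```

In this sketch the walk ends cleanly at the first missing ancestor, so a restored job whose stored snapshot ID has been expired would still validate the snapshots that remain. Whether stopping silently is the right behavior for the real table is a design question for the PR discussion.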
