Reo-LEI commented on pull request #3258:
URL: https://github.com/apache/iceberg/pull/3258#issuecomment-939651147


   > Because position deletes will only reference data files being added to the 
table, there is no possibility that those files are concurrently deleted.
   
   If a user runs the `deleteOrphanFiles` action, or deletes a referenced data 
file manually or through their own automated program, before Flink commits, 
I think this validation can prevent committing references to files that no longer exist.
   
   > should we look into calling validateFromSnapshot with the last committed 
snapshot ID?
   
   I have another PR (#3103) that checks the snapshot history starting from the 
last committed snapshot ID, but that PR is meant to speed up the 
`IcebergFilesCommitter`, not to resolve #2482. As I mentioned in 
https://github.com/apache/iceberg/pull/3103/files#r718347222, we cannot 
guarantee that the `lastCommittedSnapshot` we stored in the snapshot summary or 
somewhere else will always exist when we restore the Flink job.
   
   @rdblue I think the proper way to resolve #2482 is for 
`MergingSnapshotProducer.validationHistory` to stop traversing when it reaches 
snapshots that no longer exist because they were deleted by the `expireSnapshots` action.
   
https://github.com/apache/iceberg/blob/6ca31fc2933e7533cc0f9bad9fcbc81c16c13a57/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L456
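To illustrate the idea, here is a minimal, self-contained sketch of a history walk that stops quietly when it reaches an expired ancestor. It is not the Iceberg implementation: the class `SnapshotHistoryWalk`, the method `ancestorsAfter`, and the `parentOf` map (snapshot ID to parent snapshot ID) are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: table metadata is reduced to a map of snapshot ID -> parent ID.
final class SnapshotHistoryWalk {

    /**
     * Walks parent links from currentId back towards startId (exclusive).
     * If an ancestor is missing from the map (for example, because it was
     * expired), the walk stops there instead of failing.
     */
    static List<Long> ancestorsAfter(Map<Long, Long> parentOf, long currentId, Long startId) {
        List<Long> visited = new ArrayList<>();
        Long cursor = currentId;
        while (cursor != null && !cursor.equals(startId)) {
            if (!parentOf.containsKey(cursor)) {
                break; // this ancestor no longer exists in the metadata
            }
            visited.add(cursor);
            cursor = parentOf.get(cursor); // null once the first snapshot is passed
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<Long, Long> parentOf = new HashMap<>();
        parentOf.put(3L, 2L); // snapshot 3 points at snapshot 2, which has been expired
        parentOf.put(4L, 3L);
        // The stored starting snapshot (1) was expired as well, so the walk ends after 3.
        System.out.println(ancestorsAfter(parentOf, 4L, 1L)); // prints [4, 3]
    }
}
```

In this sketch, a restored job whose stored starting snapshot has been expired still gets the surviving part of the history checked, rather than an error about a missing snapshot.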


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


