amogh-jahagirdar opened a new pull request, #8525: URL: https://github.com/apache/iceberg/pull/8525
Currently the DeleteFiles API does not validate that a file to be deleted actually exists before the commit. This can cause unexpected behavior, for example:

1.) Rewrite data files compacts FILE_A plus some delete files.
2.) Concurrently, a DeleteFiles call is made for FILE_A.
3.) The deletion is retried after 1 completes. At the point it retries, FILE_A no longer exists because of the compaction, so the deletion is a no-op.

From a user's perspective, when they query the table after 3 they will still see the data from FILE_A, even though they would not expect to, since they received a successful delete of FILE_A.

For backwards compatibility, this change adds a new validation API rather than changing the default behavior to always do strict validation. I wanted to open this up and discuss with the community whether we want the new API or whether we should just change the behavior, since this API is mainly used by maintenance procedures. I also need to compare what serializable isolation and snapshot isolation should each guarantee in this concurrent delete scenario.
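To make the scenario above concrete, here is a minimal toy model of it. It is only a sketch: `ToyTable`, `rewrite`, `delete`, and the `validateExists` flag are hypothetical names made up for illustration and are not the Iceberg API or the API added in this PR. It assumes a table can be reduced to a set of live file names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names are Iceberg APIs.
final class ToyTable {
    private final Set<String> liveFiles = new HashSet<>();

    ToyTable(List<String> initialFiles) {
        liveFiles.addAll(initialFiles);
    }

    // Models a compaction commit: the input files are replaced by one rewritten file.
    synchronized void rewrite(Set<String> inputs, String output) {
        if (!liveFiles.containsAll(inputs)) {
            throw new IllegalStateException("Rewrite inputs are no longer live: " + inputs);
        }
        liveFiles.removeAll(inputs);
        liveFiles.add(output);
    }

    // Models a delete commit (or its retry).
    // With validateExists == false, deleting a missing file is a silent no-op.
    // With validateExists == true, the commit fails instead of reporting success.
    // Returns true only if the file was live and has been removed.
    synchronized boolean delete(String file, boolean validateExists) {
        if (validateExists && !liveFiles.contains(file)) {
            throw new IllegalStateException("Cannot delete missing file: " + file);
        }
        return liveFiles.remove(file);
    }

    // Returns an immutable snapshot of the currently live files.
    synchronized Set<String> liveFiles() {
        return Set.copyOf(liveFiles);
    }
}
```

In this model, calling `rewrite(Set.of("FILE_A"), "FILE_C")` and then `delete("FILE_A", false)` completes without error while the rows from FILE_A remain readable through FILE_C. The same call with `validateExists` set to `true` throws, so the caller learns that the delete did not take effect.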
