amogh-jahagirdar opened a new pull request, #8525:
URL: https://github.com/apache/iceberg/pull/8525

   Currently the DeleteFiles API does not check that the files to be deleted 
still exist at commit time. This can cause unexpected behavior, for example:
   
   1.) RewriteDataFiles compacts FILE_A along with some delete files.
   2.) Concurrently, a DeleteFiles call is made for FILE_A.
   3.) The deletion is retried after step 1 completes. By then FILE_A no longer 
exists because of the compaction, so the deletion is a no-op.
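   To make the race concrete, here is a toy Java model of steps 1 to 3. It is a 
sketch under simplifying assumptions, not Iceberg code: a table is reduced to 
the set of live data file paths, and the names used (LostDeleteDemo, 
deleteWithoutValidation, runScenario, and FILE_C for the compacted output) are 
hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the race above; NOT Iceberg's API. The "table" is just the
// set of live data file paths in the current snapshot.
public class LostDeleteDemo {

    // Hypothetical non-validating delete: removes the path if present and
    // reports success either way, like a retried delete that became a no-op.
    static boolean deleteWithoutValidation(Set<String> liveFiles, String path) {
        liveFiles.remove(path); // no-op when the path is already gone
        return true;            // the caller always sees success
    }

    // Replays steps 1-3 and returns the live files afterwards.
    static Set<String> runScenario() {
        Set<String> liveFiles = new HashSet<>(Set.of("FILE_A", "FILE_B"));

        // Step 1: compaction commits first, rewriting FILE_A's rows into FILE_C.
        liveFiles.remove("FILE_A");
        liveFiles.add("FILE_C");

        // Steps 2-3: the concurrent delete of FILE_A is retried on top of the
        // compacted snapshot, where FILE_A no longer exists.
        boolean reportedSuccess = deleteWithoutValidation(liveFiles, "FILE_A");
        System.out.println("delete reported success: " + reportedSuccess);
        return liveFiles;
    }

    public static void main(String[] args) {
        Set<String> after = runScenario();
        // FILE_C still holds the rows the user believed were deleted.
        System.out.println("FILE_C still live: " + after.contains("FILE_C"));
    }
}
```

   The delete reports success even though FILE_A's rows still live on in FILE_C.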
   
   From a user's perspective, when they query the table after step 3 they will 
still see the data from FILE_A (now rewritten into the compacted file), even 
though they would not expect to, since they received a successful delete of 
FILE_A.
   
   For backwards compatibility, this change adds a new opt-in validation API 
rather than changing the default behavior to always validate strictly. I wanted 
to open this up for discussion with the community: do we want the new API, or 
should we simply change the default behavior, since this API is mainly used by 
maintenance procedures?
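   A minimal sketch of what an opt-in check could look like, again as a 
self-contained Java toy rather than Iceberg's actual interface: ToyDeleteFiles, 
its validateFilesExist() flag, and the use of IllegalStateException are all 
hypothetical stand-ins. Validation is off by default, so existing callers keep 
today's behavior, while a caller that opts in gets a failed commit instead of a 
silent no-op.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of an opt-in validation flag; NOT Iceberg's API.
public class ToyDeleteFiles {
    private final List<String> toDelete = new ArrayList<>();
    private boolean strict = false; // off by default: existing behavior

    public ToyDeleteFiles deleteFile(String path) {
        toDelete.add(path);
        return this;
    }

    // Opt in to strict validation for this operation only.
    public ToyDeleteFiles validateFilesExist() {
        this.strict = true;
        return this;
    }

    // Applies the delete to the current live files. In strict mode every
    // requested path is checked before anything is removed, so the commit
    // either removes all requested files or fails without changes.
    public void commit(Set<String> liveFiles) {
        if (strict) {
            for (String path : toDelete) {
                if (!liveFiles.contains(path)) {
                    throw new IllegalStateException("Cannot delete missing file: " + path);
                }
            }
        }
        liveFiles.removeAll(toDelete);
    }

    public static void main(String[] args) {
        // Snapshot after compaction: FILE_A was rewritten into FILE_C.
        Set<String> liveFiles = new HashSet<>(Set.of("FILE_B", "FILE_C"));

        // Default behavior: the retried delete "succeeds" but removes nothing.
        new ToyDeleteFiles().deleteFile("FILE_A").commit(liveFiles);
        System.out.println("default commit left " + liveFiles.size() + " files");

        // Opt-in behavior: the conflict surfaces as an error the caller can handle.
        try {
            new ToyDeleteFiles().deleteFile("FILE_A").validateFilesExist().commit(liveFiles);
        } catch (IllegalStateException e) {
            System.out.println("strict commit failed: " + e.getMessage());
        }
    }
}
```

   Checking every requested path before mutating anything keeps the strict 
commit all-or-nothing, leaving the caller free to decide how to proceed.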
   
   I also need to compare what serializable isolation and snapshot isolation 
should each guarantee in this concurrent delete scenario.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.