aokolnychyi commented on pull request #1469:
URL: https://github.com/apache/iceberg/pull/1469#issuecomment-697020400


   > For a DELETE using position delete files, I think that this isn't quite
   > correct: "data files referenced by new deletes must be still present". The
   > logic for "no validation for delete files" applies to this case: if a data file
   > was deleted, then it's okay to delete the row twice. The validation should be
   > "data files referenced by new deletes must still be present or must be deleted;
   > i.e., cannot be rewritten or overwritten."
   
   What about a copy-on-write update that happened concurrently? It could take
   a file that my delete file references, rewrite it into a new file, and keep
   the record I want to remove. If we allow committing a delete file that
   references a file that no longer exists, there is no way to guarantee that
   the record we were about to remove is actually removed.
   
   > For a DELETE using equality delete files, I'm not sure that snapshot
   > isolation is distinct. If a data file is added concurrently that has a row that
   > is now deleted, then either that commit is first and the row is deleted or the
   > commit is later and it is appended. Either way, the operations are independent.
   > There is no need to validate "no new potentially matching data files since we
   > read" because there is not necessarily a read, and the delete applies to the
   > data automatically.
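
The ordering argument can be checked with a toy model of sequence-number scoping. The sketch below assumes the Iceberg v2 rule that an equality delete applies only to data files whose sequence number is strictly lower than the delete's own; the `Table` class and its methods are hypothetical and stand in for real commits. It commits a concurrent append and an equality delete in both orders.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Table:
    """Toy table: every commit receives the next sequence number (hypothetical model)."""
    next_seq: int = 1
    data_files: List[Tuple[int, List[str]]] = field(default_factory=list)  # (seq, rows)
    eq_deletes: List[Tuple[int, Set[str]]] = field(default_factory=list)   # (seq, keys)

    def append(self, rows: List[str]) -> None:
        self.data_files.append((self.next_seq, list(rows)))
        self.next_seq += 1

    def delete_where_key_in(self, keys: Set[str]) -> None:
        # Writing an equality delete does not require reading existing data files.
        self.eq_deletes.append((self.next_seq, set(keys)))
        self.next_seq += 1

    def scan(self) -> List[str]:
        live = []
        for data_seq, rows in self.data_files:
            for row in rows:
                # Only deletes with a strictly higher sequence number apply to this file.
                deleted = any(row in keys for del_seq, keys in self.eq_deletes if del_seq > data_seq)
                if not deleted:
                    live.append(row)
        return live


# Order 1: the concurrent append commits first, so the later delete removes "k1".
t1 = Table()
t1.append(["k0"])
t1.append(["k1"])
t1.delete_where_key_in({"k1"})

# Order 2: the delete commits first, so the later append keeps "k1".
t2 = Table()
t2.append(["k0"])
t2.delete_where_key_in({"k1"})
t2.append(["k1"])

print(t1.scan(), t2.scan())  # ['k0'] ['k0', 'k1']
```

Each result matches a serial execution of the two commits in that order, so the delete never needed to observe the concurrently appended file.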
   
   Agree.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


