rdblue commented on pull request #1469:
URL: https://github.com/apache/iceberg/pull/1469#issuecomment-696912501


   @jacques-n, you may be interested in this discussion.
   
   For a DELETE using position delete files, I think that this isn't quite 
correct: "data files referenced by new deletes must be still present". The 
logic for "no validation for delete files" applies to this case: if a data file 
was deleted, then it's okay to delete the row twice. The validation should be 
"data files referenced by new deletes must still be present or must be deleted; 
i.e., cannot be rewritten or overwritten."
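
   As a rough illustration of that rule (a sketch only, not Iceberg's actual 
validation code), the check could look like the following. `FileStatus` and 
`validate_position_deletes` are hypothetical names, and the status map is 
assumed to describe what happened to each data file since the delete started.

```python
from enum import Enum


class FileStatus(Enum):
    """Hypothetical states of a data file at commit time."""
    PRESENT = "present"          # still live in the table
    DELETED = "deleted"          # removed by a concurrent delete
    REWRITTEN = "rewritten"      # replaced by a rewrite (e.g. compaction)
    OVERWRITTEN = "overwritten"  # replaced by an overwrite


def validate_position_deletes(referenced_files, current_status):
    """Return the referenced data files that make the commit invalid.

    A file that is still present, or that was deleted outright, is fine:
    deleting a row twice is harmless. A rewritten or overwritten file is
    a conflict, because its rows may live on in a different file.
    """
    allowed = {FileStatus.PRESENT, FileStatus.DELETED}
    return [f for f in referenced_files if current_status[f] not in allowed]


# Assumed example state: one file of each interesting kind.
status = {
    "a.parquet": FileStatus.PRESENT,
    "b.parquet": FileStatus.DELETED,
    "c.parquet": FileStatus.REWRITTEN,
}
conflicts = validate_position_deletes(
    ["a.parquet", "b.parquet", "c.parquet"], status
)
```

   Here only `c.parquet` ends up in `conflicts`; the deleted file passes.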
   
   For a DELETE using equality delete files, I'm not sure that snapshot 
isolation is distinct. If a data file is added concurrently that has a row 
matching the delete, then either that commit is first and the row _is_ deleted, 
or that commit is later and the row is appended after the delete and kept. 
Either way, the operations are independent. There is no need to validate "no 
new potentially matching data files since we read" because there is not 
necessarily a read, and the delete applies to the data automatically.
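
   A toy model of the two orderings, as a sketch only: it assumes commit order 
is represented by an integer sequence number and that an equality delete 
applies to data files committed before it. `live_rows` and the `id` key are 
hypothetical names used for illustration.

```python
def live_rows(data_files, equality_deletes):
    """Return the rows that survive the given equality deletes.

    data_files:       list of (sequence_number, rows)
    equality_deletes: list of (sequence_number, deleted_id)
    Assumption: a delete applies only to data files with a lower
    sequence number, i.e. files committed before the delete.
    """
    live = []
    for file_seq, rows in data_files:
        for row in rows:
            deleted = any(
                del_seq > file_seq and row["id"] == key
                for del_seq, key in equality_deletes
            )
            if not deleted:
                live.append(row)
    return live


row = {"id": 7, "value": "x"}

# The append commits first (seq 1), the delete second (seq 2): row is deleted.
append_first = live_rows([(1, [row])], [(2, 7)])

# The delete commits first (seq 1), the append second (seq 2): row is kept.
delete_first = live_rows([(2, [row])], [(1, 7)])
```

   Both results match some serial order of the two commits, so no extra 
validation is needed in this model.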
   
   UPDATE with position delete files looks correct to me.
   
   UPDATE with equality delete files also looks correct, but I think it helps 
to think of that as UPSERT and not as UPDATE. A row that is concurrently 
written will have values from the last UPSERT operation. This is almost 
certainly from an external data source because it makes little sense to read a 
row, update its values, and update it using an equality delete that will delete 
all copies, including those written since the row was read.

