aokolnychyi commented on pull request #1469:
URL: https://github.com/apache/iceberg/pull/1469#issuecomment-695686623


   I think this PR raises a very good point that we haven't considered for 
merge-on-read but already have for copy-on-write.
   
   This PR looks good to me, but I want us to think through which validations we will eventually need. Let's consider the following use cases: DELETE and UPDATE with positional deletes, and DELETE and UPDATE with equality deletes. Each operation may run under a different isolation level: serializable or snapshot isolation (there can be more, but let's skip those for now).
   
   **DELETE with positional deletes**
   
| Isolation | Validation |
| --------- | ---------- |
| serializable | - no new potentially matching data files since we read<br/> - data files referenced by new deletes must still be present<br/> - no validation on delete files as it is ok if the row was deleted concurrently |
| snapshot | - data files referenced by new deletes must still be present<br/> - no validation on new potentially matching data files since we read<br/> - no validation on delete files as it is ok if the row was deleted concurrently |
   
   
   **UPDATE with positional deletes**
   
| Isolation | Validation |
| --------- | ---------- |
| serializable | - no new potentially matching data files since we read<br/> - no new potentially matching delete files as it is NOT ok if the row was deleted concurrently<br/> - data files referenced by new deletes must still be present |
| snapshot | - no new potentially matching delete files as it is NOT ok if the row was deleted concurrently<br/> - data files referenced by new deletes must still be present<br/> - no validation on new potentially matching data files since we read |
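
To make the two positional-delete tables concrete, here is a minimal Java sketch of how a commit could apply those rows. It assumes the caller has already computed, as plain path strings, the data files live at commit time, the data files referenced by the new positional deletes, and the data/delete files added since the read snapshot that may match the condition. `PositionalDeleteCommitCheck` and `violations` are hypothetical names for illustration, not Iceberg APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the positional-delete rules; not an Iceberg API.
class PositionalDeleteCommitCheck {
  enum Operation { DELETE, UPDATE }
  enum Isolation { SERIALIZABLE, SNAPSHOT }

  /**
   * Returns the rules violated by a commit that writes positional deletes.
   *
   * @param currentDataFiles       data files live in the table at commit time (assumed input)
   * @param referencedDataFiles    data files referenced by the new positional deletes
   * @param newMatchingDataFiles   data files added since the read that may match the condition
   * @param newMatchingDeleteFiles delete files added since the read that may match the condition
   */
  static List<String> violations(Operation op, Isolation isolation,
                                 Set<String> currentDataFiles,
                                 Set<String> referencedDataFiles,
                                 Set<String> newMatchingDataFiles,
                                 Set<String> newMatchingDeleteFiles) {
    List<String> problems = new ArrayList<>();

    // Both levels: a positional delete pointing at a removed data file is invalid.
    for (String path : referencedDataFiles) {
      if (!currentDataFiles.contains(path)) {
        problems.add("referenced data file removed: " + path);
      }
    }

    // Serializable only: new data files that may match would have changed what we read.
    if (isolation == Isolation.SERIALIZABLE && !newMatchingDataFiles.isEmpty()) {
      problems.add("new matching data files: " + newMatchingDataFiles);
    }

    // UPDATE only, at both levels: rewriting a row that was deleted concurrently
    // would resurrect it. DELETE skips this because a double delete is harmless.
    if (op == Operation.UPDATE && !newMatchingDeleteFiles.isEmpty()) {
      problems.add("new matching delete files: " + newMatchingDeleteFiles);
    }
    return problems;
  }
}
```

An empty result means the commit may proceed; returning a list rather than throwing keeps every violated rule visible in one pass.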
   
   **DELETE with equality deletes**
   
| Isolation | Validation |
| --------- | ---------- |
| serializable | - no new potentially matching data files since we read<br/> - no validation on delete files as it is ok if the row was deleted concurrently |
| snapshot | - no validation on new potentially matching data files since we read<br/> - no validation on delete files as it is ok if the row was deleted concurrently |
   
   
   **UPDATE with equality deletes**
   
| Isolation | Validation |
| --------- | ---------- |
| serializable | - no validation on new potentially matching data files since we don't have to read the table<br/> - no validation on new potentially matching delete files as we don't have to read the table |
| snapshot | - no validation on new potentially matching data files since we don't have to read the table<br/> - no validation on new potentially matching delete files as we don't have to read the table |
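
The four tables can also be read as a single lookup from (operation, delete type, isolation level) to the checks a commit needs. The Java sketch below is hypothetical: the enum and method names are illustrative, not part of Iceberg, and it simply encodes the rules proposed above so the combinations are easy to compare.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of the proposed validation matrix; not an Iceberg API.
class MergeOnReadValidationMatrix {
  enum Operation { DELETE, UPDATE }
  enum DeleteType { POSITIONAL, EQUALITY }
  enum Isolation { SERIALIZABLE, SNAPSHOT }

  enum Check {
    NO_NEW_MATCHING_DATA_FILES,          // nothing new since the read could match the condition
    NO_NEW_MATCHING_DELETE_FILES,        // no concurrent delete touched the rows we rewrite
    REFERENCED_DATA_FILES_STILL_PRESENT  // positional deletes must point at live data files
  }

  static Set<Check> requiredChecks(Operation op, DeleteType type, Isolation isolation) {
    Set<Check> checks = EnumSet.noneOf(Check.class);

    // UPDATE with equality deletes does not read the table, so there is nothing to validate.
    if (op == Operation.UPDATE && type == DeleteType.EQUALITY) {
      return checks;
    }

    // Serializable: the set of rows we read must not have grown concurrently.
    if (isolation == Isolation.SERIALIZABLE) {
      checks.add(Check.NO_NEW_MATCHING_DATA_FILES);
    }

    if (type == DeleteType.POSITIONAL) {
      // Positional deletes reference concrete files, which must still exist.
      checks.add(Check.REFERENCED_DATA_FILES_STILL_PRESENT);
      // UPDATE must not re-insert a row that was deleted concurrently.
      if (op == Operation.UPDATE) {
        checks.add(Check.NO_NEW_MATCHING_DELETE_FILES);
      }
    }
    return checks;
  }
}
```

For example, `requiredChecks(Operation.UPDATE, DeleteType.POSITIONAL, Isolation.SNAPSHOT)` yields `NO_NEW_MATCHING_DELETE_FILES` and `REFERENCED_DATA_FILES_STILL_PRESENT`, while both equality-delete UPDATE cases yield an empty set.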
   
   Does this seem correct?

