[GitHub] [iceberg] RussellSpitzer commented on pull request #2925: Core: Support serializable isolation for ReplacePartitions

GitBox Thu, 24 Feb 2022 15:12:07 -0800


RussellSpitzer commented on pull request #2925:
URL: https://github.com/apache/iceberg/pull/2925#issuecomment-1050351570



   > @RussellSpitzer @flyrain this is the issue we talked about, if any 
thoughts on what is 'snapshot' isolation once we define 'serializable' 
isolation.
   
   https://jepsen.io/consistency // Jepsen's Doc provides a good overview
   https://jepsen.io/consistency/models/serializable 
   
   
   Repeatable Read is a bit different from Snapshot Isolation
   https://jepsen.io/consistency/models/repeatable-read
   ```Repeatable read is closely related to 
[serializability](https://jepsen.io/consistency/models/serializable), but 
unlike serializable, it allows 
[phantoms](http://pmg.csail.mit.edu/papers/adya-phd.pdf): if a transaction T1 
reads a predicate, like "the set of all people with the name “Dikembe”, then 
another transaction T2 may create or modify a person with the name “Dikembe” 
before T1 commits. Individual objects are stable once read, but the predicate 
itself may not be.```
    
   https://jepsen.io/consistency/models/snapshot-isolation
   ```In a snapshot isolated system, each transaction appears to operate on an 
independent, consistent snapshot of the database. Its changes are visible only 
to that transaction until commit time, when all changes become visible 
atomically. If transaction T1 has modified an object x, and another transaction 
T2 committed a write to x after T1’s snapshot began, and before T1’s commit, 
then T1 must abort.```
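
   To make the abort rule in that definition concrete, here is a minimal 
sketch of "first committer wins" on a toy in-memory store. The `Store`, `Txn`, 
and `ConflictError` names are hypothetical and exist only for this 
illustration; this is not how Iceberg or any particular database implements it.

```python
class ConflictError(Exception):
    """Raised when a commit loses a write-write conflict."""


class Store:
    """Toy versioned key-value store (hypothetical, for illustration only)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 0       # bumped once per successful commit
        self.last_write = {}   # key -> version of the commit that last wrote it

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.start_version = store.version
        self.snapshot = dict(store.data)  # private, consistent view
        self.writes = {}                  # invisible to others until commit

    def read(self, key):
        return self.writes.get(key, self.snapshot.get(key))

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort if any key we wrote was committed by another transaction
        # after our snapshot began.
        for key in self.writes:
            if self.store.last_write.get(key, 0) > self.start_version:
                raise ConflictError(f"{key!r} changed after snapshot began")
        # Otherwise publish all writes together under one new version.
        self.store.version += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.last_write[key] = self.store.version


store = Store({"x": 1, "y": 1})
t1, t2 = store.begin(), store.begin()
t2.write("x", 2)
t2.commit()                 # T2 commits a write to x after T1's snapshot began
seen_by_t1 = t1.read("x")   # T1 still reads from its own snapshot
t1.write("x", 3)
try:
    t1.commit()
    t1_aborted = False
except ConflictError:
    t1_aborted = True       # T1 must abort: it wrote x, which T2 changed
```

   Had T1 written only `y`, the commit check would have passed, because the 
rule only looks at objects that both transactions wrote.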
   
   I'm pretty sure Postgres's definition of Repeatable Read actually fits 
Snapshot Isolation better than Repeatable Read, since [they don't allow Phantom 
Reads](https://www.postgresql.org/docs/13/transaction-iso.html#MVCC-ISOLEVEL-TABLE),
 which are allowed under true Repeatable Read isolation. Snapshot Isolation, in 
my mind, says you cannot modify records that were changed by a previous commit 
(one that landed after your snapshot began), but you may modify records it did 
not touch, while ignoring the changes that commit produced.
   
   Stolen from [Sql-Server 
Blog](https://techcommunity.microsoft.com/t5/sql-server-blog/serializable-vs-snapshot-isolation-level/ba-p/383281)
   
![image](https://user-images.githubusercontent.com/413025/155622462-2a862637-a089-41fd-ac73-2032de7875fd.png)
   
   Here, imagine the two commits as 
   "UPDATE color = white WHERE color = black" 
   "UPDATE color = black WHERE color = white"
   
   Both of these commits are allowed to succeed as if each had been applied to 
the same original commit, because each operation only affected an isolated set 
of marbles.
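
   The marble example can be worked through in a few lines. This is only a 
sketch: the four-marble table, the row ids, and the `update_where` helper are 
made up for illustration, and "conflict" here means nothing more than two 
transactions writing the same row.

```python
def update_where(table, old, new):
    """Rows an UPDATE ... WHERE color = old would write, given the table it reads."""
    return {row_id: new for row_id, color in table.items() if color == old}


marbles = {1: "black", 2: "black", 3: "white", 4: "white"}

# Snapshot isolation: both transactions evaluate against the same original state.
t1_writes = update_where(marbles, "black", "white")
t2_writes = update_where(marbles, "white", "black")
overlap = t1_writes.keys() & t2_writes.keys()   # rows written by both
snapshot_result = {**marbles, **t1_writes, **t2_writes}

# Serializable: the outcome must match one of the two serial orders.
after_t1 = {**marbles, **update_where(marbles, "black", "white")}
t1_then_t2 = {**after_t1, **update_where(after_t1, "white", "black")}
after_t2 = {**marbles, **update_where(marbles, "white", "black")}
t2_then_t1 = {**after_t2, **update_where(after_t2, "black", "white")}
```

   The write sets are disjoint, so `overlap` is empty and both commits go 
through under snapshot isolation, leaving the colors swapped. Either serial 
order instead ends with every marble the same color, which is why the swapped 
result is not serializable.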
   
   So in this case I believe INSERT OVERWRITE would never conflict with a 
concurrent INSERT, but would conflict with an update that changed any row 
within the partition being overwritten. Another INSERT OVERWRITE would be a 
form of update, so I believe @szehon-ho has the right of it here. 
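
   As a rough sketch of that rule (not Iceberg's actual validation code; the 
`Op` type and `conflicts_under_snapshot` function are hypothetical names used 
only here), the check for an INSERT OVERWRITE against one concurrent commit 
could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A committed operation, reduced to its kind and the partitions it touched."""
    kind: str              # assumed to be "insert", "update", or "overwrite"
    partitions: frozenset  # partition values the operation wrote to


def conflicts_under_snapshot(overwrite: Op, concurrent: Op) -> bool:
    """Would `overwrite` (an INSERT OVERWRITE) conflict with `concurrent`?"""
    if concurrent.kind == "insert":
        # Plain inserts only add rows, so they never conflict here.
        return False
    # Updates, and other overwrites (treated as a form of update), conflict
    # when they changed rows in a partition that is being overwritten.
    return bool(overwrite.partitions & concurrent.partitions)


mine = Op("overwrite", frozenset({"2022-02-24"}))
insert_same = Op("insert", frozenset({"2022-02-24"}))
update_same = Op("update", frozenset({"2022-02-24"}))
update_other = Op("update", frozenset({"2022-02-23"}))
overwrite_same = Op("overwrite", frozenset({"2022-02-24"}))
```

   With these inputs, only `update_same` and `overwrite_same` are reported as 
conflicts. Under serializable isolation the insert case would presumably need 
to conflict as well, which is the distinction the PR is trying to pin down.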
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


