jackye1995 commented on PR #6948:
URL: https://github.com/apache/iceberg/pull/6948#issuecomment-1619156336

   Discussed offline with Ryan, Daniel and Eduard. I think we should be good 
here. 
   
   Having all table snapshots be consistent as of the start time of the
transaction is not a hard requirement in the definition of either snapshot
isolation or serializable isolation.
   
   The non-repeatable or phantom read issue that was raised can only happen
when the same table is read more than once in the same transaction. In the
example I gave, table2 is read only once. Whatever state it has at catalog
load time can therefore be considered its valid state, and this is neither a
non-repeatable read nor a phantom read.
   
   There are two cases that could cause a phantom read:
   
   1. a self join query
   2. multiple reads to the same table in a multi-statement transaction
   
   In the first case, although it is not strictly enforced, most query engines
cache the table metadata when loading a table from the catalog, so the second
scan of the table cannot read new data. Ideally engines should enforce this,
but it is definitely not something the catalog should enforce.
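   The self-join case can be illustrated with a toy model. This is a sketch under stated assumptions: `InMemoryCatalog` and `QueryPlanner` are hypothetical stand-ins, not Iceberg or engine APIs. In the model, a planner that resolves the table's snapshot once per query gives both sides of the self join the same data, even when another writer commits between the two scans. Without that cache, the second scan picks up the new commit.

```python
from typing import Dict, List


class InMemoryCatalog:
    """Hypothetical catalog: each table maps to a list of snapshots (row lists)."""

    def __init__(self) -> None:
        self.snapshots: Dict[str, List[List[int]]] = {}

    def commit(self, table: str, rows: List[int]) -> None:
        self.snapshots.setdefault(table, []).append(list(rows))

    def load_table(self, table: str) -> int:
        # Returns the id (list index) of the table's current snapshot.
        return len(self.snapshots[table]) - 1

    def scan(self, table: str, snapshot_id: int) -> List[int]:
        return self.snapshots[table][snapshot_id]


class QueryPlanner:
    """Hypothetical engine that may cache table metadata for one query."""

    def __init__(self, catalog: InMemoryCatalog, cache_metadata: bool) -> None:
        self.catalog = catalog
        self.cache_metadata = cache_metadata
        self._cache: Dict[str, int] = {}

    def resolve(self, table: str) -> int:
        if not self.cache_metadata:
            return self.catalog.load_table(table)  # reload on every reference
        if table not in self._cache:
            self._cache[table] = self.catalog.load_table(table)
        return self._cache[table]


def self_join_sides(catalog, planner, table, concurrent_commit):
    """Scan `table` twice, as a self join would, with a commit in between."""
    left = catalog.scan(table, planner.resolve(table))
    concurrent_commit()  # another writer commits between the two scans
    right = catalog.scan(table, planner.resolve(table))
    return left, right


for cached in (False, True):
    catalog = InMemoryCatalog()
    catalog.commit("t", [1, 2])
    planner = QueryPlanner(catalog, cache_metadata=cached)
    left, right = self_join_sides(
        catalog, planner, "t", lambda: catalog.commit("t", [1, 2, 3]))
    print(cached, left == right)  # False False, then True True
```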
   
   In the second case, a multi-statement transaction must implement the 
proposed transaction API, so we should be good there.
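   For the multi-statement case, the sketch below shows one way a transaction could avoid phantom reads. It is a hypothetical illustration, not the proposed transaction API from this PR. The idea is to pin each table's snapshot the first time the transaction reads that table and to reuse it for later reads. Tables first read after a concurrent commit simply see their state at that load time, which matches the single-read argument about table2. All class and method names here are assumptions.

```python
from typing import Dict, List


class Catalog:
    """Hypothetical catalog: table name -> list of snapshots (each a row list)."""

    def __init__(self) -> None:
        self.snapshots: Dict[str, List[List[int]]] = {}

    def commit(self, table: str, rows: List[int]) -> None:
        self.snapshots.setdefault(table, []).append(list(rows))

    def current_snapshot_id(self, table: str) -> int:
        return len(self.snapshots[table]) - 1

    def scan(self, table: str, snapshot_id: int) -> List[int]:
        return self.snapshots[table][snapshot_id]


class MultiStatementTransaction:
    """Hypothetical transaction that pins each table's snapshot on first read."""

    def __init__(self, catalog: Catalog) -> None:
        self.catalog = catalog
        self.pinned: Dict[str, int] = {}

    def read(self, table: str) -> List[int]:
        # The first read of a table fixes its snapshot for the rest of the
        # transaction; later reads reuse it even if other writers commit.
        if table not in self.pinned:
            self.pinned[table] = self.catalog.current_snapshot_id(table)
        return self.catalog.scan(table, self.pinned[table])


catalog = Catalog()
catalog.commit("table1", [1])
catalog.commit("table2", [10])

txn = MultiStatementTransaction(catalog)
first = txn.read("table1")          # pins table1 at snapshot 0
catalog.commit("table1", [1, 2])    # concurrent commit to table1
catalog.commit("table2", [10, 20])  # concurrent commit to table2
second = txn.read("table1")         # still snapshot 0, so no phantom row 2
other = txn.read("table2")          # first read of table2 sees its latest state
print(first, second, other)         # [1] [1] [10, 20]
```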
   
   I will continue to think about any other edge cases, but at least we are not 
blocked by this.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

