jackye1995 commented on PR #6948: URL: https://github.com/apache/iceberg/pull/6948#issuecomment-1619156336
Discussed offline with Ryan, Daniel and Eduard. I think we should be good here.

Having all table snapshots consistent as of the transaction start time is not a hard requirement in the definition of snapshot or serializable isolation. As for the non-repeatable or phantom read issue that was raised, such an anomaly can only occur when the same table is read more than once within the same transaction. In the example I gave, table2 is read only once, so whatever state it has at catalog load time can be considered valid, and it is not a non-repeatable or phantom read.

There are 2 cases that could cause a phantom read:

1. a self-join query
2. multiple reads of the same table in a multi-statement transaction

In the first case, although it is not strictly enforced, most query engines cache the table metadata when fetching the table from the catalog, so the second scan of the table cannot read new data. Ideally this is something engines should enforce, but it is definitely not something the catalog should enforce.

In the second case, a multi-statement transaction must implement the proposed transaction API, so we should be good there.

I will continue to think about other edge cases, but at least we are not blocked by this.
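To illustrate the self-join case, here is a minimal sketch of why caching table metadata at load time prevents a phantom read. It is a toy model and not Iceberg code: `Catalog`, `QueryContext`, `self_join_snapshots`, and the integer snapshot ids are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy catalog (hypothetical): maps a table name to its current snapshot id."""
    current: dict = field(default_factory=dict)

    def load_table(self, name: str) -> int:
        # Return whatever snapshot is current at load time.
        return self.current[name]

    def commit(self, name: str) -> None:
        # A concurrent writer publishes a new snapshot.
        self.current[name] += 1


class QueryContext:
    """Toy per-query context (hypothetical) that can pin a table to the
    snapshot seen the first time it was loaded from the catalog."""

    def __init__(self, catalog: Catalog, cache_metadata: bool = True):
        self.catalog = catalog
        self.cache_metadata = cache_metadata
        self._pinned = {}

    def scan(self, name: str) -> int:
        if not self.cache_metadata:
            # No caching: every scan goes back to the catalog.
            return self.catalog.load_table(name)
        if name not in self._pinned:
            self._pinned[name] = self.catalog.load_table(name)
        return self._pinned[name]


def self_join_snapshots(cache_metadata: bool) -> tuple:
    """Scan the same table twice, with a concurrent commit in between."""
    catalog = Catalog({"table2": 1})
    query = QueryContext(catalog, cache_metadata)
    left = query.scan("table2")
    catalog.commit("table2")  # concurrent commit lands between the two scans
    right = query.scan("table2")
    return left, right
```

With `cache_metadata=True` both sides of the self join see the same snapshot even though a commit landed in between. With `cache_metadata=False` the second scan picks up the newer snapshot, which is the phantom read scenario described above. Either way the pinning happens in the engine-side context, not in the catalog.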
