rdblue commented on code in PR #6948:
URL: https://github.com/apache/iceberg/pull/6948#discussion_r1251319541


##########
core/src/main/java/org/apache/iceberg/catalog/CatalogTransaction.java:
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.catalog;
+
+public interface CatalogTransaction {
+
+  enum IsolationLevel {
+
+    /**
+     * All reads that are being made will see the last committed values that 
existed when the table

Review Comment:
   @jackye1995, I think the behavior for the situation you're describing is an 
open question for the `snapshot` isolation level. First, though, I want to 
cover some background on how we think about `serializable` isolation to make 
sure we are thinking about "simultaneous" the same way.
   
   For `serializable`, the requirement is that there **_exists_** some ordering of transactions that produces the correct result. Say transaction 1 starts and reads from a table, then transaction 2 commits to that table, and finally transaction 1 attempts to commit. Iceberg will allow the commit as long as the result is not affected by the changes made by transaction 2. That is, Iceberg effectively reorders the transactions when transaction 1's reads would have produced the same result either way. This relies on checking for conflicts in the tables that have been updated, and it means the idea of when a transaction "happens" or starts is flexible in this model.
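
   To make the reordering idea concrete, here's a minimal sketch of that commit-time check. None of these types are Iceberg APIs; the names are made up purely to illustrate the rule that transaction 1 can be ordered after transaction 2 when none of transaction 2's changes would have altered what transaction 1 read:

   ```java
   import java.util.List;
   import java.util.function.Predicate;

   // Illustrative types only (not Iceberg APIs): a "read" is a table name plus a
   // row predicate, and a "change" is a table name plus the rows it touched.
   record Row(long id) {}
   record Read(String table, Predicate<Row> filter) {}
   record Change(String table, List<Row> touchedRows) {}

   class SerializableReorderCheck {
     // Transaction 1 may commit after transaction 2 if there EXISTS a valid
     // ordering, i.e. none of transaction 2's changes intersect transaction 1's reads.
     static boolean canCommit(List<Read> txn1Reads, List<Change> txn2Changes) {
       for (Change change : txn2Changes) {
         for (Read read : txn1Reads) {
           if (read.table().equals(change.table())
               && change.touchedRows().stream().anyMatch(read.filter())) {
             return false; // txn1's read result would differ, so no valid ordering exists
           }
         }
       }
       return true; // txn1 can be treated as if it ran entirely after txn2
     }
   }
   ```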
   
   Another complication is the idea of a point in time in a catalog. There's no guarantee that catalogs have an order of changes that applies across tables. The only requirement Iceberg imposes is a linear history for each individual table; there is no general _happens-before_ relationship between commits across different tables. There is, however, a limited relationship between tables involved in the same transaction.
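
   As a small illustration of that last point, a sketch using only per-table metadata (assuming `Table.history()` here, which covers exactly one table's commits): each table's history is ordered, but timestamps from two different tables' histories don't tell you which commit "happened first" in any catalog-wide sense.

   ```java
   import org.apache.iceberg.HistoryEntry;
   import org.apache.iceberg.Table;

   class PerTableHistory {
     // A table's history is linear: entries are ordered commits of that table only.
     // Nothing in the API orders table a's commits relative to table b's commits.
     static void printHistory(Table table) {
       for (HistoryEntry entry : table.history()) {
         System.out.println(
             table.name() + ": snapshot " + entry.snapshotId() + " at " + entry.timestampMillis());
       }
     }
   }
   ```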
   
   I think there's a good argument that we can't just use that as a reason not 
to do anything. After all, I can have situations where completely ignoring the 
point-in-time concern is clearly wrong:
   
   ```sql
   CREATE TABLE a (id bigint);
   CREATE TABLE b (id bigint);
   INSERT INTO a VALUES (1);
   
   -- Time T1
   
   BEGIN TRANSACTION;
     INSERT INTO b SELECT id FROM a;
     TRUNCATE TABLE a;
   COMMIT;
   
   -- Time T2
   ```
   
   If I were to read from `a` at T1 and read from `b` at T2, then I could see `row(id=1)` twice, even though -- because the transaction was atomic -- there was never a "time" when the row was in both tables. I don't think I'd call this a "dirty read" because that term means _uncommitted_ data was read, but it's still clearly a problem.
   
   I think there are 3 options:
   1. Coordinate table loading with the catalog
   2. Use validations that data read has not changed
   3. Define snapshot isolation differently, or change the name to something 
else
   
   Right now, the only difference between 2 and 3 is whether we add those validation checks, so we don't have to decide right away. That's why it's the way it is today.
   
   ### 1. Coordinate table loading with the catalog
   
   To coordinate table loading, we'd add something like `startTransaction()` to get a transaction state, then pass that state back to the catalog so tables are loaded as of the transaction's start time. You can imagine the catalog handing back basically a transaction ID and loading tables using the state associated with that TID.
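
   For illustration only -- none of this exists and the names are invented -- that kind of API would look roughly like this, with the catalog handing out an opaque transaction state and resolving every table load against that single point in time:

   ```java
   import org.apache.iceberg.Table;
   import org.apache.iceberg.catalog.TableIdentifier;

   // Hypothetical interface, not a proposal for actual names: every table loaded
   // with the same TransactionState would reflect the same catalog-wide point in time.
   interface TransactionalCatalog {
     // Begin a catalog-level transaction and return its state (basically a TID).
     TransactionState startTransaction();

     // Load a table as of the point in time captured when the transaction started.
     Table loadTable(TransactionState state, TableIdentifier identifier);

     interface TransactionState {
       String transactionId();
     }
   }
   ```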
   
   I don't think this is a viable option. It's a lot of work to define the APIs 
and there are still catalogs that can't implement it. The result is that we'd 
end up with **inconsistent behavior across catalogs**, where some catalogs just 
load the latest copy of the table because that's all they can do.
   
   ### 2. Use validations that data read has not changed
   
   This option is similar to how `serializable` is handled. The transaction "start" time floats -- data that was read can still be used as long as the outcome would be the same when the transaction commits.
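
   A minimal sketch of what that validation could look like, assuming the transaction records the snapshot ID it read for each table. The coarse "did the table change at all" check here is just for illustration; the interesting part is making it as targeted as the serializable conflict checks:

   ```java
   import org.apache.iceberg.Snapshot;
   import org.apache.iceberg.Table;

   // Rough sketch of the idea (illustrative, not a proposed implementation):
   // remember which snapshot of each table the transaction read, and at commit
   // time check whether that table changed since. A real check would be finer
   // grained and only fail when the data that was actually read is affected.
   class ReadValidation {
     static void validateUnchanged(Table table, long readSnapshotId) {
       table.refresh();
       Snapshot current = table.currentSnapshot();
       if (current != null && current.snapshotId() != readSnapshotId) {
         throw new IllegalStateException(
             "Table " + table.name() + " changed since it was read by this transaction");
       }
     }
   }
   ```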
   
   The main drawback of this approach is that it isn't clear whether this is 
actually distinct from `serializable`, which basically does the same validation.
   
   ### 3. Define `snapshot` isolation differently or rename it
   
   This option may seem crazy, but there's a good argument for it. If there's 
little difference between `serializable` and `snapshot` isolation with option 
2, then `serializable` should be used for cases like the SQL above. However, we 
clearly want to be able to relax the constraints for certain cases:
   * Simultaneous `DELETE FROM` and `INSERT INTO` -- it makes little sense to 
fail a delete because a matching row was just inserted, when a slightly 
different order (delete commits first) would have been perfectly fine
   * External coordination -- a scheduled `MERGE INTO` that selects from a source table that's constantly updated should generally succeed, even if the source table is modified. Most of the time, consumption will be incremental and coordinated externally. The important thing is knowing what version of the source table was used, not failing if it is updated during the job.
   
   These two cases align with the existing behavior of `snapshot` isolation in 
single-table commits -- but it makes sense there because "snapshot" applies to 
a single table. If the table is simultaneously modified, the original version 
is used and new data is ignored. However, deletions are validated.
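
   For reference, a rough sketch of how that single-table behavior looks with the core API (simplified; the engines set up the validation from the scan that produced the write, and `Expressions.alwaysTrue()` here is just a stand-in for the real conflict detection filter):

   ```java
   import org.apache.iceberg.OverwriteFiles;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.expressions.Expressions;

   // Simplified single-table overwrite under snapshot isolation: concurrent
   // deletes that touch the overwritten data cause a failure, while concurrently
   // added data is ignored because validateNoConflictingData() is not requested.
   class SnapshotIsolationOverwrite {
     static void commitOverwrite(Table table, long readSnapshotId) {
       OverwriteFiles overwrite = table.newOverwrite()
           .validateFromSnapshot(readSnapshotId)              // only check snapshots after the read
           .conflictDetectionFilter(Expressions.alwaysTrue()) // stand-in for the real read filter
           .validateNoConflictingDeletes();                   // deletions are validated

       // data files to add/delete would be attached here before committing
       overwrite.commit();
     }
   }
   ```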
   
   ### Last comment
   
   I think that captures the background for the decision that we need to make 
on this. Right now, I think the best approach is to collect use cases and think 
about how they should be handled, rather than making a decision.


