rdblue commented on pull request #1849:
URL: https://github.com/apache/iceberg/pull/1849#issuecomment-736916300


   My main concern with this API is that it doesn't really help with some of 
the major challenges of resolving conflicts. It may be that we have been 
talking about different parts of the problem.
   
   To me, there are two main approaches:
   
   * **Cherry-pick commits**: A transaction would keep track of different table 
metadata files without updating each table's metadata location. When it is time 
to commit, swap the locations in one atomic operation. If there is a conflict, 
then refresh that table, cherry-pick the commits since the base metadata file, 
and retry with the new metadata location.
   * **Retry pending updates**: A transaction would keep track of all 
`PendingUpdate` instances that were created to modify the tables. These can 
already refresh and retry the individual operation's changes. The initial 
writes would configure and commit each change to a transaction. The transaction 
would perform the atomic swap of multiple locations. If there is a conflict, 
then refresh each table and re-commit the changes to it (just like single-table 
transactions). Then retry the atomic swap.
   
   I was expecting an API that used the second approach. That's what I see as 
the major problem to solve: how to coordinate operations across multiple tables 
in a way that allows the existing `PendingUpdate` retries, like an extension of 
the single-table transaction API.
   
   I think that keeping each configured `PendingUpdate` around is going to be 
an easier way to retry because the logic for retries is already written. Trying 
to cherry-pick changes means needing to know much more information than is 
tracked currently, and would require reproducing all of the changes from 
metadata and applying them.
   
   I think a normal operation example is helpful:
   
   ```sql
   DELETE FROM db.table WHERE id = 5;
   ```
   
   Let's say that only replaces one file: delete `file_a.parquet`, add 
`file_b.parquet`. But it will also validate that no data has been added that 
could match the `WHERE` clause since the operation started, because it needs to 
scan the table to find where the records are located.
   
   For the cherry-pick approach, we can recover the deleted file, the added 
file, and the parent snapshot ID. But we can't recover the filter from the 
`WHERE` clause, nor do we know whether the validation was enabled (due to the 
isolation level chosen for the change). To make the cherry-pick approach work, 
we would need to write that additional information into table metadata and 
build operations that can use it. We did this for dynamic partition overwrite 
because it was just a flag, but I'm concerned that it would be a lot of work to 
do this for all of the operations.
   
   For the `PendingUpdate` approach, each operation is fully configured and 
kept around, so we can simply call `commit` again to re-apply the changes in 
the transaction.
   
   On the other hand, maybe the way I'm thinking about this is wrong because 
I'm assuming that we would want to apply the same validations (i.e., no files 
that might match `id = 5` have been added). Maybe for multi-table transactions 
we only want to validate that since the starting snapshot, no data changes have 
been made to a table for any table that is read. That would ensure correctness, 
and then we could simply check that the file changes can still be applied. This 
would make sense because we need to add the extra validations about other table 
states. As long as those are coarse validations based on only snapshot ID, then 
maybe it is okay that the commit changes are as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to