aiborodin commented on PR #14092:
URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3332157631

   > They will not be applied to DF2 or DF3 as they are added in the same 
commit, and as a result we will have a duplication in our table. Both R1' and 
R1'' will be present after C3
   
   Thank you for the detailed clarification on this, @pvary and @mxm. My change 
_does not_ aggregate WriteResults across checkpoints. Each checkpoint still 
creates a separate snapshot with its own delete and data files, as the 
`DynamicCommitter` code in this change shows:
   ```java
     private void commitDeltaTxn(
         Table table,
         String branch,
         NavigableMap<Long, Committer.CommitRequest<DynamicCommittable>> pendingRequests,
         CommitSummary summary,
         String newFlinkJobId,
         String operatorId) {
       for (Map.Entry<Long, CommitRequest<DynamicCommittable>> e : pendingRequests.entrySet()) {
         // We don't commit the merged result into a single transaction because for the sequential
         // transaction txn1 and txn2, the equality-delete files of txn2 are required to be applied
         // to data files from txn1. Committing the merged one will lead to the incorrect delete
         // semantic.
         WriteResult result = e.getValue().getCommittable().writeResult();
   ```
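
To make the comment in the loop concrete, here is a minimal, self-contained sketch of the rule it relies on: an equality delete only applies to data files with a strictly smaller data sequence number. The classes and names below (`EqualityDeleteScope`, `DataFile`, `EqualityDelete`) are hypothetical stand-ins for illustration, not Iceberg or Flink APIs, and each "file" is simplified to hold a single row key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Iceberg code: one row key per file, one sequence number per commit.
class EqualityDeleteScope {
  record DataFile(String name, long sequenceNumber, String key) {}

  record EqualityDelete(long sequenceNumber, String key) {}

  // An equality delete applies only to data files committed with a strictly
  // smaller sequence number (and, in this simplified model, the same key).
  static boolean applies(EqualityDelete delete, DataFile file) {
    return file.sequenceNumber() < delete.sequenceNumber() && file.key().equals(delete.key());
  }

  // Returns the names of data files that survive all equality deletes.
  static List<String> liveFiles(List<DataFile> dataFiles, List<EqualityDelete> deletes) {
    List<String> live = new ArrayList<>();
    for (DataFile file : dataFiles) {
      boolean deleted = deletes.stream().anyMatch(d -> applies(d, file));
      if (!deleted) {
        live.add(file.name());
      }
    }
    return live;
  }

  // Checkpoint 1 and checkpoint 2 committed as two snapshots (sequence numbers 1 and 2).
  static List<String> separateCommits() {
    return liveFiles(
        List.of(new DataFile("R1", 1L, "id=1"), new DataFile("R1'", 2L, "id=1")),
        List.of(new EqualityDelete(2L, "id=1")));
  }

  // Both checkpoints merged into one snapshot: every file shares sequence number 1.
  static List<String> mergedCommit() {
    return liveFiles(
        List.of(new DataFile("R1", 1L, "id=1"), new DataFile("R1'", 1L, "id=1")),
        List.of(new EqualityDelete(1L, "id=1")));
  }

  public static void main(String[] args) {
    System.out.println("separate commits: " + separateCommits());
    System.out.println("merged commit:    " + mergedCommit());
  }
}
```

With separate commits only `R1'` survives. In the merged case the delete shares a sequence number with `R1`, so it never applies and both rows remain, which is the duplication described in the quote above.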
   The current change _only_ aggregates WriteResults from multiple parallel 
writers per (table, branch, checkpoint) triplet, similar to the current 
non-dynamic `IcebergSink`.
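
As a rough illustration of that scope, the sketch below groups per-writer results by (table, branch, checkpoint) and merges only within a group, so results from different checkpoints never end up together. All names here (`TripletAggregation`, `Triplet`, `WriterResult`, `Aggregated`, and the file names) are hypothetical stand-ins, not the types used in the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-triplet aggregation; not the DynamicCommitter implementation.
class TripletAggregation {
  record Triplet(String table, String branch, long checkpointId) {}

  // What a single parallel writer produced for one triplet.
  record WriterResult(Triplet key, List<String> dataFiles, List<String> deleteFiles) {}

  // Merged result for one triplet, i.e. the content of one snapshot.
  record Aggregated(List<String> dataFiles, List<String> deleteFiles) {}

  // Merges writer results that share a triplet; different checkpoints stay separate.
  static Map<Triplet, Aggregated> aggregate(List<WriterResult> results) {
    Map<Triplet, Aggregated> merged = new LinkedHashMap<>();
    for (WriterResult result : results) {
      Aggregated target =
          merged.computeIfAbsent(
              result.key(), k -> new Aggregated(new ArrayList<>(), new ArrayList<>()));
      target.dataFiles().addAll(result.dataFiles());
      target.deleteFiles().addAll(result.deleteFiles());
    }
    return merged;
  }

  // Two parallel writers for checkpoint 1, one writer for checkpoint 2.
  static Map<Triplet, Aggregated> example() {
    Triplet cp1 = new Triplet("db.t", "main", 1L);
    Triplet cp2 = new Triplet("db.t", "main", 2L);
    return aggregate(
        List.of(
            new WriterResult(cp1, List.of("w1-a.parquet"), List.of()),
            new WriterResult(cp1, List.of("w2-b.parquet"), List.of("w2-eq-del.parquet")),
            new WriterResult(cp2, List.of("w1-c.parquet"), List.of())));
  }

  public static void main(String[] args) {
    example().forEach((key, value) -> System.out.println(key + " -> " + value));
  }
}
```

In this sketch each map entry would correspond to one commit, so the number of snapshots follows the number of checkpoints rather than the number of writers.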
   
   Given the above, @mxm, do you still think we need to commit multiple 
WriteResults separately for each (table, branch, checkpoint) triplet and 
implement the index-based solution to guarantee idempotency, as you mentioned 
here: https://github.com/apache/iceberg/issues/14090#issuecomment-3324732610? 
If so, could you please explain why this solution is necessary?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

