aiborodin commented on PR #14092:
URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3332157631
> They will not be applied to DF2 or DF3 as they are added in the same
> commit, and as a result we will have a duplication in our table. Both R1' and
> R1'' will be present after C3
Thank you for the detailed clarification on this, @pvary and @mxm. My change
_does not_ aggregate WriteResults across checkpoints. Each checkpoint would
still create a separate snapshot with its own delete and data files, as the
`DynamicCommitter` code in this change shows:
```java
private void commitDeltaTxn(
    Table table,
    String branch,
    NavigableMap<Long, Committer.CommitRequest<DynamicCommittable>> pendingRequests,
    CommitSummary summary,
    String newFlinkJobId,
    String operatorId) {
  for (Map.Entry<Long, CommitRequest<DynamicCommittable>> e : pendingRequests.entrySet()) {
    // We don't commit the merged result into a single transaction because for the sequential
    // transaction txn1 and txn2, the equality-delete files of txn2 are required to be applied
    // to data files from txn1. Committing the merged one will lead to the incorrect delete
    // semantic.
    WriteResult result = e.getValue().getCommittable().writeResult();
```
The current change would _only_ aggregate WriteResults across multiple
parallel writers per (table, branch, checkpoint) triplet, similar to the
current non-dynamic `IcebergSink`.
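To make the distinction concrete, here is a minimal, self-contained sketch of the grouping I have in mind. It is not code from this PR: `WriterOutput`, `TripletKey`, `Aggregated`, and `aggregate` are hypothetical names, and file names stand in for the real data and delete files. It only illustrates that results from parallel writers are merged when they share the same (table, branch, checkpoint) triplet, while different checkpoints stay separate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TripletAggregationSketch {
  /** Hypothetical stand-in for the files one parallel writer produced in one checkpoint. */
  record WriterOutput(
      String table,
      String branch,
      long checkpointId,
      List<String> dataFiles,
      List<String> deleteFiles) {}

  /** Hypothetical grouping key: one merged result (and one commit) per key. */
  record TripletKey(String table, String branch, long checkpointId) {}

  /** Hypothetical merged result for a single triplet. */
  record Aggregated(List<String> dataFiles, List<String> deleteFiles) {}

  /** Merges writer outputs that share a triplet; never merges across checkpoints. */
  static Map<TripletKey, Aggregated> aggregate(List<WriterOutput> outputs) {
    // LinkedHashMap keeps the order in which triplets were first seen.
    Map<TripletKey, Aggregated> merged = new LinkedHashMap<>();
    for (WriterOutput out : outputs) {
      TripletKey key = new TripletKey(out.table(), out.branch(), out.checkpointId());
      Aggregated agg =
          merged.computeIfAbsent(key, k -> new Aggregated(new ArrayList<>(), new ArrayList<>()));
      agg.dataFiles().addAll(out.dataFiles());
      agg.deleteFiles().addAll(out.deleteFiles());
    }
    return merged;
  }

  public static void main(String[] args) {
    List<WriterOutput> outputs =
        List.of(
            // Two parallel writers in checkpoint 1: merged into one result.
            new WriterOutput("db.t", "main", 1L, List.of("data-a"), List.of()),
            new WriterOutput("db.t", "main", 1L, List.of("data-b"), List.of("eq-delete-b")),
            // Checkpoint 2 stays separate, so its equality deletes can apply to checkpoint 1 data.
            new WriterOutput("db.t", "main", 2L, List.of("data-c"), List.of("eq-delete-c")));
    aggregate(outputs).forEach((key, agg) -> System.out.println(key + " -> " + agg));
  }
}
```

In this sketch the three writer outputs collapse into two merged results, one per checkpoint, which corresponds to one snapshot per checkpoint rather than one snapshot for everything.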
Given the above, @mxm, do you still think we need to commit multiple
WriteResults separately for each (table, branch, checkpoint) triplet and
implement the index-based solution to guarantee idempotency, as you mentioned in
https://github.com/apache/iceberg/issues/14090#issuecomment-3324732610? If so,
could you please explain why this solution is necessary?