hudi-agent commented on code in PR #19023:
URL: https://github.com/apache/hudi/pull/19023#discussion_r3432549057
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java:
##########
@@ -397,6 +397,20 @@ public void preTxn(WriteOperationType operationType,
HoodieTableMetaClient metaC
tableServiceClient.startAsyncArchiveService(this);
}
+ /**
+ * Initializes the transaction state for recommitting an inflight instant
during recovery.
+ * Sets lastCompletedTxnAndMetadata to the last completed instant whose
requested time
+ * and completion time are both before the given instant's requested time,
ensuring
+ * OCC conflict resolution only checks against genuinely concurrent commits.
+ */
+ public void preTxnForRecommit(WriteOperationType operationType,
HoodieTableMetaClient metaClient, String currentInstantTime) {
+ if (txnManager.isLockRequired() &&
config.needResolveWriteConflict(operationType, metaClient.isMetadataTable(),
config, metaClient.getTableConfig())) {
+ this.lastCompletedTxnAndMetadata =
TransactionUtils.getLastCompletedTxnInstantAndMetadata(metaClient,
currentInstantTime);
Review Comment:
🤖 Could you trace through the multi-pending restart case (e.g.
@HuangZhenQiu's 3-checkpoint-failure scenario where A and B are both inflight
on restart)? In iter 2 for B, A was just recommitted in iter 1, but its
completionTime is reassigned to wall-clock-now (post-restart, so >
B.requestedTime). The `completionTime < currentInstantTime` filter then
excludes A from the baseline, falling back to some earlier C.
SimpleConcurrentFileWritesConflictResolutionStrategy's
`findInstantsAfter(C.requestedTime)` then picks up A as a candidate, and
`hasConflict(B, A)` returns true on the shared file groups typical of streaming
upserts — re-triggering the same `HoodieWriteConflictException` this PR is
fixing. The new `testCommittingMultipleInstantsWithOCC` exercises the regular
`commitInstants` path (which uses the no-arg `preTxn`), so it wouldn't catch
this. @danny0405 thoughts on whether `preTxnForRecommit` should use the no-arg
`getLastCompletedTxnInstantAndMetadata` here so the just-re
committed instant becomes the baseline?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]