hudi-agent commented on code in PR #19023:
URL: https://github.com/apache/hudi/pull/19023#discussion_r3434936243


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteOperatorCoordinator.java:
##########
@@ -652,6 +652,8 @@ private void doCommit(long checkpointId, String instant, List<WriteStatus> dataW
     FlinkValidatorUtils.runValidators(conf, instant, allWriteStatus,
         checkpointCommitMetadata, () -> StreamerUtil.getPreviousCommitMetadata(this.metaClient));
 
+    // refresh the last txn metadata for OCC
+    this.writeClient.preTxn(tableState.operationType, this.metaClient, instant);

Review Comment:
   🤖 Following up on @danny0405's earlier point that `preTxn` should be invoked
before each instant starts: moving it from `startInstant` into `doCommit` goes
the other way and looks like it could regress normal multi-writer OCC. With the
baseline captured microseconds before commit,
`SimpleConcurrentFileWritesConflictResolutionStrategy.getCandidateInstantsV8AndAbove`
calls `findInstantsAfter(lastSuccessful.requestedTime())` (strict `>`), which
excludes the most recently completed instant itself. An external writer that
started and completed during our write window therefore becomes the new
baseline and is silently dropped from the candidate set. It is also not in
`pendingInflightAndRequestedInstants`, so
`getCompletedInstantsDuringCurrentWriteOperation` won't catch it either.
Spark's `BaseHoodieWriteClient.preWrite` sets `lastCompletedTxnAndMetadata` at
write start for exactly this reason. Could the previous shape (`preTxn` at
`startInstant` plus the explicit between-iteration refresh in `commitInstants`)
be restored to preserve the detection window?
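
   To make the concern concrete, here is a toy model of the detection window. It is a sketch only: `OccWindowSketch`, `Instant`, `lastCompletedAsOf`, and `candidatesAfter` are hypothetical names, not Hudi classes or APIs, and it models only the strict `>` filter on requested time described above (pending instants are left out).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of the OCC detection window; hypothetical, not Hudi code. */
public class OccWindowSketch {

  /** Hypothetical stand-in for a completed instant on the timeline. */
  static final class Instant {
    final String writer;
    final long requestedTime;
    final long completionTime;

    Instant(String writer, long requestedTime, long completionTime) {
      this.writer = writer;
      this.requestedTime = requestedTime;
      this.completionTime = completionTime;
    }
  }

  /** Baseline a preTxn-like call would capture if it ran at time `now`. */
  static Instant lastCompletedAsOf(List<Instant> timeline, long now) {
    return timeline.stream()
        .filter(i -> i.completionTime <= now)
        .max(Comparator.comparingLong((Instant i) -> i.completionTime))
        .orElseThrow();
  }

  /** Completed instants whose requested time is strictly after the baseline's. */
  static List<String> candidatesAfter(List<Instant> timeline, Instant baseline, long now) {
    return timeline.stream()
        .filter(i -> i.completionTime <= now)
        .filter(i -> i.requestedTime > baseline.requestedTime) // strict ">"
        .map(i -> i.writer)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Assumed scenario: our write starts at t=100 and commits at t=200.
    List<Instant> timeline = List.of(
        new Instant("older", 10, 20),       // completed before our write started
        new Instant("external", 120, 180)); // started and completed inside our window

    Instant baselineAtStart = lastCompletedAsOf(timeline, 100);
    Instant baselineAtCommit = lastCompletedAsOf(timeline, 200);

    // Baseline captured at write start: the external writer is a candidate.
    System.out.println(candidatesAfter(timeline, baselineAtStart, 200));  // [external]
    // Baseline captured just before commit: the external writer is the baseline
    // itself, so the strict filter excludes it.
    System.out.println(candidatesAfter(timeline, baselineAtCommit, 200)); // []
  }
}
```

   In this model, capturing the baseline at write start keeps the external writer in the candidate set, while capturing it just before commit leaves the set empty, which is the regression being asked about.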
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

