shangxinli commented on code in PR #18405:
URL: https://github.com/apache/hudi/pull/18405#discussion_r3016471716


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##########
@@ -872,6 +873,10 @@ private Pair<Option<String>, JavaRDD<WriteStatus>> writeToSinkAndDoMetaSync(Hood
           totalSuccessfulRecords);
       String commitActionType = CommitUtils.getCommitActionType(cfg.operation, HoodieTableType.valueOf(cfg.tableType));
 
+      // Run pre-commit streaming offset validators (if configured) before commit
+      SparkStreamerValidatorUtils.runValidators(props, instantTime, writeStatusRDD,
+          checkpointCommitMetadata, metaClient);

Review Comment:
   The placement before writeClient.commit() is intentional: offset validation 
is a stronger guard than commitOnErrors. If the offset deviation indicates 
potential data loss, we want to prevent the commit entirely, regardless of 
whether commitOnErrors is true (which only tolerates individual record write 
failures, not systematic data loss). I've added a comment in the code to make 
this explicit.
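
   To illustrate the ordering argument, here is a minimal, self-contained sketch. 
It is not the code in this PR: the class, the exception type, the method names, 
and the tolerance parameter are all hypothetical stand-ins, and the deviation 
check is only one plausible way an offset validator might decide to fail. The 
point it shows is that a validator which throws before the commit decision 
blocks the commit even when commitOnErrors is true.

```java
// Hypothetical sketch; none of these names are Hudi APIs.
class PreCommitOrderingSketch {

  /** Hypothetical stand-in for a validation failure. */
  static class OffsetValidationException extends RuntimeException {
    OffsetValidationException(String message) {
      super(message);
    }
  }

  /**
   * Assumed validation rule: fail when the number of records written deviates
   * from the number implied by the source offsets by more than `tolerance`
   * (a fraction, e.g. 0.1 for 10%).
   */
  static void validateOffsets(long recordsFromOffsets, long recordsWritten, double tolerance) {
    if (recordsFromOffsets == 0) {
      return; // nothing consumed, nothing to compare against
    }
    double deviation =
        Math.abs(recordsFromOffsets - recordsWritten) / (double) recordsFromOffsets;
    if (deviation > tolerance) {
      throw new OffsetValidationException(
          "Offset deviation " + deviation + " exceeds tolerance " + tolerance);
    }
  }

  /** Returns true if the batch would be committed, false if it would be skipped. */
  static boolean writeAndCommit(long recordsFromOffsets, long recordsWritten,
                                double tolerance, boolean hasWriteErrors,
                                boolean commitOnErrors) {
    // Validation runs first and throws regardless of commitOnErrors.
    validateOffsets(recordsFromOffsets, recordsWritten, tolerance);

    // commitOnErrors only decides whether individual record write failures are tolerated.
    if (hasWriteErrors && !commitOnErrors) {
      return false;
    }
    return true; // the real code would call the write client's commit here
  }
}
```

   With this ordering, a batch that has write errors but passes validation is 
still committed when commitOnErrors is true, while a batch that fails 
validation never reaches the commit decision.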



##########
bootstrap_register_only_issue.md:
##########


Review Comment:
   Removed — bootstrap_register_only_issue.md was accidentally staged from a 
scratch session and has been deleted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

