shangxinli commented on code in PR #18405:
URL: https://github.com/apache/hudi/pull/18405#discussion_r3016471716
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##########
@@ -872,6 +873,10 @@ private Pair<Option<String>, JavaRDD<WriteStatus>>
writeToSinkAndDoMetaSync(Hood
totalSuccessfulRecords);
String commitActionType = CommitUtils.getCommitActionType(cfg.operation,
HoodieTableType.valueOf(cfg.tableType));
+ // Run pre-commit streaming offset validators (if configured) before
commit
+ SparkStreamerValidatorUtils.runValidators(props, instantTime,
writeStatusRDD,
+ checkpointCommitMetadata, metaClient);
Review Comment:
The placement before writeClient.commit() is intentional: offset validation
is a stronger guard than commitOnErrors. If the offset deviation indicates
potential data loss, we want to prevent the commit entirely, regardless of
whether commitOnErrors is true (which only tolerates individual record write
failures, not systematic data loss). I've added a comment in the code to make
this explicit.
##########
bootstrap_register_only_issue.md:
##########
Review Comment:
Removed — bootstrap_register_only_issue.md was accidentally staged from a
scratch session and has been deleted.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]