HuangZhenQiu commented on code in PR #18468:
URL: https://github.com/apache/hudi/pull/18468#discussion_r3036160232


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/HoodieRecordEmitter.java:
##########
@@ -18,21 +18,54 @@
 
 package org.apache.hudi.source.reader;
 
+import org.apache.flink.api.common.eventtime.Watermark;
 import org.apache.flink.api.connector.source.SourceOutput;
 import org.apache.flink.connector.base.source.reader.RecordEmitter;
+import org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator;
 import org.apache.hudi.source.split.HoodieSourceSplit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import java.io.Serializable;
+import java.text.ParseException;
 
 /**
  * Default Hoodie record emitter.
- * @param <T>
+ *
+ * <p>In addition to forwarding each record downstream, this emitter advances the Flink
+ * watermark to the split's {@code latestCommit} epoch when the last record of the split
+ * is processed.  Downstream operators can rely on this watermark to trigger time-based
+ * windows or gauge read progress in bounded Hudi pipelines.
+ *
+ * @param <T> record type
  */
 public class HoodieRecordEmitter<T> implements RecordEmitter<HoodieRecordWithPosition<T>, T, HoodieSourceSplit>, Serializable {
+  private static final Logger LOG = LoggerFactory.getLogger(HoodieRecordEmitter.class);
 

Review Comment:
   That's a good question. DefaultHoodieSplitProvider is the default
SplitProvider, and it sorts splits by their latest commit time, so each TM
handles its splits in commit-time order. The event time of every record in a
split should be less than that split's latest commit time.
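
   To make the ordering argument concrete, here is a minimal, self-contained
sketch. It does not use the Hudi or Flink APIs: `Split` is a hypothetical
stand-in for HoodieSourceSplit, and the `yyyyMMddHHmmssSSS` instant format and
the UTC zone are assumptions made for illustration. It only shows that reading
splits in latest-commit order yields non-decreasing per-split watermarks.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SplitOrderSketch {

  // Hypothetical stand-in for HoodieSourceSplit: only the fields the argument needs.
  static final class Split {
    final String id;
    final String latestCommit; // instant time, assumed to be yyyyMMddHHmmssSSS

    Split(String id, String latestCommit) {
      this.id = id;
      this.latestCommit = latestCommit;
    }
  }

  // Assumed instant format: seconds pattern followed by a fixed 3-digit millisecond field.
  private static final DateTimeFormatter INSTANT_FORMAT = new DateTimeFormatterBuilder()
      .appendPattern("yyyyMMddHHmmss")
      .appendValue(ChronoField.MILLI_OF_SECOND, 3)
      .toFormatter();

  // Converts an instant time string to epoch milliseconds (UTC is an assumption of this sketch).
  static long toEpochMillis(String instantTime) {
    return LocalDateTime.parse(instantTime, INSTANT_FORMAT)
        .toInstant(ZoneOffset.UTC)
        .toEpochMilli();
  }

  // Returns a copy of the splits ordered by latest commit time.
  // Fixed-width instant strings sort chronologically when compared as strings.
  static List<Split> sortByLatestCommit(List<Split> splits) {
    List<Split> sorted = new ArrayList<>(splits);
    sorted.sort(Comparator.comparing((Split s) -> s.latestCommit));
    return sorted;
  }

  // The watermark that would be emitted after the last record of each split, in read order.
  static List<Long> watermarksInReadOrder(List<Split> splits) {
    List<Long> watermarks = new ArrayList<>();
    for (Split split : sortByLatestCommit(splits)) {
      watermarks.add(toEpochMillis(split.latestCommit));
    }
    return watermarks;
  }

  public static void main(String[] args) {
    List<Split> splits = Arrays.asList(
        new Split("s2", "20240102000000000"),
        new Split("s1", "20240101000000000"),
        new Split("s3", "20240103000000000"));
    // Prints one epoch-millisecond watermark per split, in ascending order.
    for (long watermark : watermarksInReadOrder(splits)) {
      System.out.println(watermark);
    }
  }
}
```

   The sketch covers a single reader only. Whether watermarks stay ordered
across several TMs depends on how splits are assigned, which it does not model.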


