HuangZhenQiu commented on code in PR #18468:
URL: https://github.com/apache/hudi/pull/18468#discussion_r3036160232
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/HoodieRecordEmitter.java:
##########
@@ -18,21 +18,54 @@
package org.apache.hudi.source.reader;
+import org.apache.flink.api.common.eventtime.Watermark;
import org.apache.flink.api.connector.source.SourceOutput;
import org.apache.flink.connector.base.source.reader.RecordEmitter;
+import org.apache.hudi.common.table.timeline.HoodieInstantTimeGenerator;
import org.apache.hudi.source.split.HoodieSourceSplit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
import java.io.Serializable;
+import java.text.ParseException;
/**
* Default Hoodie record emitter.
- * @param <T>
+ *
+ * <p>In addition to forwarding each record downstream, this emitter advances
the Flink
+ * watermark to the split's {@code latestCommit} epoch when the last record of
the split
+ * is processed. Downstream operators can rely on this watermark to trigger
time-based
+ * windows or gauge read progress in bounded Hudi pipelines.
+ *
+ * @param <T> record type
*/
public class HoodieRecordEmitter<T> implements
RecordEmitter<HoodieRecordWithPosition<T>, T, HoodieSourceSplit>, Serializable {
+ private static final Logger LOG =
LoggerFactory.getLogger(HoodieRecordEmitter.class);
Review Comment:
It is a nice question. DefaultHoodieSplitProvider is the default
SplitProvider in which split sorted by the last commit time. Thus, each of TM
will handle with split in order. All of the data's event time should be less
than the last commit time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]