Re: [PR] fix(flink): enable batch read it for flink source v2 [hudi]

via GitHub Sun, 15 Mar 2026 21:08:24 -0700


HuangZhenQiu commented on code in PR #18325:
URL: https://github.com/apache/hudi/pull/18325#discussion_r2938036948



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/BatchRecords.java:
##########
@@ -117,8 +115,11 @@ public void seek(long startingRecordOffset) {
 
   public static <T> BatchRecords<T> forRecords(
       String splitId, ClosableIterator<T> recordIterator, int fileOffset, long startingRecordOffset) {
-
-    return new BatchRecords<>(
-        splitId, recordIterator, fileOffset, startingRecordOffset, new HashSet<>());
+    // Pre-populate finishedSplits with splitId so that FetchTask calls splitFinishedCallback
+    // immediately after enqueueing the batch. This removes the split from
+    // SplitFetcher.assignedSplits, causing the fetcher to idle and invoke
+    // elementsQueue.notifyAvailable(), which is required to drive the END_OF_INPUT signal

Review Comment:
   I agree. A better approach is to have a fake batch reader that returns an
   empty batch of records containing only the finished split ID.
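
   A minimal sketch of that idea, under stated assumptions: `SplitRecords` is a
   hypothetical, simplified stand-in for Flink's `RecordsWithSplitIds`, and
   `EmptyFinishedBatch` is an illustrative name, not a class in this PR. It only
   shows the shape of a batch that carries no records and reports one split as
   finished; how it would be wired into the reader is left open.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical, simplified stand-in for Flink's RecordsWithSplitIds (assumption).
interface SplitRecords<T> {
    String nextSplit();            // null when the batch has no more splits
    T nextRecordFromSplit();       // null when the current split has no more records
    Set<String> finishedSplits();  // IDs of splits that are finished after this batch
}

// Illustrative "fake" batch: no records, only a finished split ID.
final class EmptyFinishedBatch<T> implements SplitRecords<T> {
    private final Set<String> finishedSplits;

    private EmptyFinishedBatch(String splitId) {
        this.finishedSplits = Collections.singleton(splitId);
    }

    static <T> EmptyFinishedBatch<T> forFinishedSplit(String splitId) {
        return new EmptyFinishedBatch<>(splitId);
    }

    @Override
    public String nextSplit() {
        return null; // nothing to iterate over
    }

    @Override
    public T nextRecordFromSplit() {
        return null; // the batch is empty by construction
    }

    @Override
    public Set<String> finishedSplits() {
        return finishedSplits;
    }
}
```

   With this shape, the finished-split signal would travel in a separate empty
   batch, so `forRecords` would not need to pre-populate `finishedSplits`.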



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
