zhanglistar commented on code in PR #12297:
URL: https://github.com/apache/gluten/pull/12297#discussion_r3471433859


##########
gluten-flink/runtime/src/main/java/org/apache/gluten/table/runtime/operators/GlutenSourceFunction.java:
##########
@@ -109,6 +109,7 @@ public void run(SourceContext<OUT> sourceContext) throws Exception {
           processAvailableElement(sourceContext);
           break;
         case BLOCKED:
+          task.waitFor();

Review Comment:
   This line was added in the Pulsar source support commit. 
   
   Previously the BLOCKED case only logged "Get empty row" and continued the 
loop. That was fine for Nexmark, which generates data in batches and rarely 
returns BLOCKED. For real streaming sources like Pulsar/Kafka, however, 
`advance()` returns BLOCKED whenever no upstream data is available. Without 
`waitFor()`, the run loop would busy-spin on `advance()` and pin the CPU for 
as long as the source stays idle.
   
   `waitFor()` blocks the thread until the native side has data available or 
the task is closed. This is the standard UpIterator usage pattern in velox4j.
   
   Regarding cancellation: Flink calls `cancel()` (which sets 
`isRunning = false`) followed by `close()` (which calls `task.close()`). 
Closing the native task completes the future that `waitFor()` is blocked on, 
which unblocks the thread. The run loop then checks `isRunning` and exits.
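   
   To make the blocking and cancellation behaviour described above concrete, 
here is a minimal, self-contained sketch. It is not the Gluten or velox4j code: 
`FakeTask`, `Source`, and `runDemo()` are hypothetical stand-ins, and the 
"native" future is modelled with a plain `CompletableFuture`. It only assumes 
what this comment states, namely that `waitFor()` blocks until data arrives or 
the task is closed, and that `cancel()` is followed by `close()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class BlockedLoopSketch {
  enum State { AVAILABLE, BLOCKED, FINISHED }

  /** Hypothetical stand-in for the native task; NOT the velox4j API. */
  static final class FakeTask {
    private final CompletableFuture<Void> unblocked = new CompletableFuture<>();
    final AtomicInteger advanceCalls = new AtomicInteger();
    final CountDownLatch firstAdvance = new CountDownLatch(1);

    State advance() {
      advanceCalls.incrementAndGet();
      firstAdvance.countDown();
      return State.BLOCKED; // simulate an idle upstream: no data ever arrives
    }

    void waitFor() {
      unblocked.join(); // parks the caller until close() completes the future
    }

    void close() {
      unblocked.complete(null); // releases any thread parked in waitFor()
    }
  }

  /** Simplified run loop in the shape of the hunk under review. */
  static final class Source {
    private final FakeTask task;
    private volatile boolean isRunning = true;

    Source(FakeTask task) {
      this.task = task;
    }

    void run() {
      while (isRunning) {
        switch (task.advance()) {
          case AVAILABLE:
            // rows would be emitted to the source context here
            break;
          case BLOCKED:
            task.waitFor(); // block instead of re-polling advance()
            break;
          case FINISHED:
            return;
        }
      }
    }

    void cancel() {
      isRunning = false;
    }

    void close() {
      task.close();
    }
  }

  /** Runs the loop on a thread, cancels it, and returns the advance() call count. */
  static int runDemo() throws InterruptedException {
    FakeTask task = new FakeTask();
    Source source = new Source(task);
    Thread runner = new Thread(source::run, "source-runner");
    runner.start();

    task.firstAdvance.await(); // the loop has seen BLOCKED at least once
    source.cancel();           // same order as described: cancel() first ...
    source.close();            // ... then close(), which unblocks waitFor()
    runner.join(5000);
    if (runner.isAlive()) {
      throw new IllegalStateException("run loop did not exit after close()");
    }
    return task.advanceCalls.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("advance() calls while idle: " + runDemo());
  }
}
```

   In this model the idle source costs a single `advance()` call, because the 
thread stays parked in `waitFor()` until `close()` runs. The ordering also 
matters in the sketch: `isRunning` is cleared before the future completes, so 
the loop observes `false` as soon as `waitFor()` returns and does not poll 
again.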



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

