zhanglistar commented on code in PR #12297:
URL: https://github.com/apache/gluten/pull/12297#discussion_r3471433859
##########
gluten-flink/runtime/src/main/java/org/apache/gluten/table/runtime/operators/GlutenSourceFunction.java:
##########
@@ -109,6 +109,7 @@ public void run(SourceContext<OUT> sourceContext) throws Exception {
processAvailableElement(sourceContext);
break;
case BLOCKED:
+ task.waitFor();
Review Comment:
This line was added in the Pulsar source support commit.

Previously the BLOCKED case only logged "Get empty row" and continued the
loop, which was fine for Nexmark since it generates data in batches and rarely
returns BLOCKED. However, for real streaming sources like Pulsar/Kafka, when
there's no upstream data available, `advance()` returns BLOCKED. Without
`waitFor()`, the run loop would spin indefinitely calling `advance()` in a
tight loop and pin the CPU.

`waitFor()` blocks the thread until the native side has data available or
the task is closed. This is the standard UpIterator usage pattern in velox4j.

Regarding cancellation: Flink calls `cancel()` (sets `isRunning = false`)
followed by `close()` (calls `task.close()`). The native task close will
complete the blocking future inside `waitFor()`, which unblocks the thread. The
run loop then checks `isRunning` and exits.
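
To make the blocking and cancellation sequence concrete, here is a minimal
sketch of the loop shape described above. `FakeTask` and `SourceLoopSketch`
are hypothetical stand-ins written for illustration, not the velox4j or Gluten
classes, and the `CompletableFuture` is only an assumption about how a
"blocking future" could be modeled on the Java side. The fake `advance()`
always reports BLOCKED to mimic an idle Pulsar/Kafka topic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the native task; NOT the velox4j API. */
class FakeTask {
  enum State { AVAILABLE, BLOCKED, FINISHED }

  // Assumption: the wait is modeled as a future that close() completes.
  private final CompletableFuture<Void> ready = new CompletableFuture<>();
  final AtomicInteger advanceCalls = new AtomicInteger();

  State advance() {
    advanceCalls.incrementAndGet();
    return State.BLOCKED; // upstream never has data in this sketch
  }

  void waitFor() {
    ready.join(); // parks the calling thread instead of spinning
  }

  void close() {
    ready.complete(null); // wakes any thread parked in waitFor()
  }
}

/** Hypothetical source with the same loop shape as the diff above. */
class SourceLoopSketch {
  private volatile boolean isRunning = true;
  private final FakeTask task;

  SourceLoopSketch(FakeTask task) {
    this.task = task;
  }

  void run() {
    while (isRunning) {
      switch (task.advance()) {
        case AVAILABLE:
          break; // the real operator would emit rows here
        case BLOCKED:
          task.waitFor(); // without this call the loop re-enters advance() immediately
          break;
        case FINISHED:
          return;
      }
    }
  }

  void cancel() {
    isRunning = false;
  }

  void close() {
    task.close();
  }

  public static void main(String[] args) throws InterruptedException {
    FakeTask task = new FakeTask();
    SourceLoopSketch source = new SourceLoopSketch(task);
    Thread runner = new Thread(source::run);
    runner.start();
    Thread.sleep(200);  // the runner is parked in waitFor() during this time
    source.cancel();    // step 1: clear the flag
    source.close();     // step 2: complete the future so the runner wakes up
    runner.join(2000);
    System.out.println("runner alive: " + runner.isAlive()
        + ", advance() calls: " + task.advanceCalls.get());
  }
}
```

In this sketch the runner thread calls `advance()` at most once before it
parks, and it exits once `cancel()` and `close()` have both run. The order
matters here: if the future were completed while `isRunning` was still true,
`waitFor()` would return immediately on every iteration and the loop would spin
again.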