[
https://issues.apache.org/jira/browse/BEAM-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744271#comment-16744271
]
Kenneth Knowles edited comment on BEAM-6451 at 1/16/19 5:25 PM:
----------------------------------------------------------------
I'm just going to brain-dump and not be too prescriptive about this exact case
(but very prescriptive in general :-). The recommended way to use futures is to
_never_ call get(). It exists only to provide the very final bridge to
synchronous code. Instead, asynchronous chaining should be the primary way to
consume the results of futures.
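To make the chaining point concrete, here is a minimal sketch. The names register, process and pipeline are hypothetical stand-ins and are not taken from the Beam code; the only blocking call is the single join() in main, which plays the role of the final bridge to synchronous code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

final class ChainingSketch {
  // Hypothetical async call, standing in for something like a registration request.
  static CompletionStage<String> register(String bundleId) {
    return CompletableFuture.supplyAsync(() -> "registered:" + bundleId);
  }

  // Hypothetical follow-up step that is itself asynchronous.
  static CompletionStage<Integer> process(String registration) {
    return CompletableFuture.supplyAsync(() -> registration.length());
  }

  // The whole pipeline is composed without blocking; callers get a stage back.
  static CompletionStage<String> pipeline(String bundleId) {
    return register(bundleId)
        .thenCompose(ChainingSketch::process)        // chain a dependent async step
        .thenApply(n -> "processed " + n + " chars") // transform the result
        .exceptionally(t -> "failed: " + t);         // handle failure inside the chain
  }

  public static void main(String[] args) {
    // The one blocking call, at the outermost edge of the program.
    System.out.println(pipeline("b1").toCompletableFuture().join());
  }
}
```

Everything above main only ever sees CompletionStage, which is roughly the API surface argued for here.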
The Java futures API missed the mark: all the APIs you _should_ use are on
CompletionStage, while the APIs that you sometimes _must_ use are on Future
(you can get a CompletableFuture for any CompletionStage IIRC). So MoreFutures
is an attempt to present the appropriate API surface. A timeout is probably not
always appropriate, but an asynchronous chain should be guaranteed to
terminate. Where a timeout is needed, it should be applied at the point where a
potentially failing network call is made.
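As a rough illustration of putting the timeout at the call site, here is a sketch that works on Java 8. The helper withTimeout and the method hangingCall are hypothetical and are not part of MoreFutures or the Beam worker; on Java 9 and later, CompletableFuture.orTimeout covers the same need.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {
  // Daemon timer thread so the scheduler never keeps the JVM alive.
  private static final ScheduledExecutorService TIMER =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "timeout-timer");
        t.setDaemon(true);
        return t;
      });

  // Hypothetical helper: the returned stage mirrors `call`, unless the
  // deadline passes first, in which case it fails with TimeoutException.
  static <T> CompletionStage<T> withTimeout(
      CompletionStage<T> call, long timeout, TimeUnit unit) {
    CompletableFuture<T> result = new CompletableFuture<>();
    ScheduledFuture<?> timer = TIMER.schedule(() -> {
      result.completeExceptionally(new TimeoutException("no response in time"));
    }, timeout, unit);
    call.whenComplete((value, error) -> {
      timer.cancel(false); // the call finished; the timer is no longer needed
      if (error != null) {
        result.completeExceptionally(error);
      } else {
        result.complete(value);
      }
    });
    return result;
  }

  // Hypothetical network call that never gets a response.
  static CompletionStage<String> hangingCall() {
    return new CompletableFuture<>();
  }

  // Dependent stages see failures wrapped in CompletionException.
  static Throwable unwrap(Throwable t) {
    return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
  }

  // The timeout is attached right where the call is made; the rest is chaining.
  static String describe(CompletionStage<String> call) {
    return withTimeout(call, 100, TimeUnit.MILLISECONDS)
        .thenApply(v -> "ok: " + v)
        .exceptionally(t -> "failed: " + unwrap(t).getClass().getSimpleName())
        .toCompletableFuture()
        .join(); // final bridge to synchronous code; guaranteed to return
  }

  public static void main(String[] args) {
    System.out.println(describe(hangingCall()));
    System.out.println(describe(CompletableFuture.completedFuture("pong")));
  }
}
```

Because the chain is guaranteed to terminate, the final join() cannot park forever the way the unbounded get() in the stack trace does.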
> Portability Pipeline eventually hangs on bundle registration
> ------------------------------------------------------------
>
> Key: BEAM-6451
> URL: https://issues.apache.org/jira/browse/BEAM-6451
> Project: Beam
> Issue Type: Bug
> Components: java-fn-execution, runner-dataflow, sdk-py-harness
> Reporter: Scott Wegner
> Priority: Minor
> Labels: portability
>
> We've seen jobs using portability start off in a healthy state, but then
> eventually get stuck and hang on bundle registration. We see error logs from
> the worker harness:
> {code}
> Processing stuck in step s01 for at least 06h30m00s without outputting or completing in state finish
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
>   at org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:277)
>   at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
>   at org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:119)
>   at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1226)
>   at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:141)
>   at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:965)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Looking at [the
> code|https://github.com/apache/beam/blob/release-2.8.0/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/RegisterAndProcessBundleOperation.java#L277],
> it looks like there are no timeouts on the Bundle Registration calls over
> the FnApi, so a call that never completes hangs forever instead of failing
> with a clear error.
> This bug report came from a customer running a Python streaming pipeline
> using the new portability framework on Dataflow. Hopefully we can reproduce
> it ourselves so that we can link to the job and logs.