[
https://issues.apache.org/jira/browse/BEAM-9651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072207#comment-17072207
]
Sam Whittle commented on BEAM-9651:
-----------------------------------
This is different from the other issue because we are not making the call from
a grpc thread. The executor for grpc client callbacks is unlimited, so blocked
callbacks do not explain why the rpc never becomes ready.
The rpc may be unready because the channel is either not accepting or has too
much data queued. It is unclear why that happened, but we should time out when
waiting (up to the stream deadline, for example) instead of blocking forever.
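A minimal sketch of what a bounded wait could look like, assuming the wait is
done on a java.util.concurrent.Phaser as the stack trace suggests. The class
and method names (TimedReadyWait, awaitReady) are hypothetical and are not
taken from DirectStreamObserver; only the Phaser call itself is standard JDK.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not the actual DirectStreamObserver code. */
final class TimedReadyWait {
  private TimedReadyWait() {}

  /**
   * Waits for the phaser to advance past {@code phase}, giving up after the timeout.
   *
   * @return true if the phase advanced, false if the wait timed out
   */
  static boolean awaitReady(Phaser phaser, int phase, long timeout, TimeUnit unit)
      throws InterruptedException {
    try {
      // Timed variant of Phaser.awaitAdvanceInterruptibly(int).
      phaser.awaitAdvanceInterruptibly(phase, timeout, unit);
      return true;
    } catch (TimeoutException e) {
      // Caller decides what to do next, e.g. fail the stream so it can be restarted.
      return false;
    }
  }
}
```

A caller would presumably pass the time remaining until the stream deadline as
the timeout and treat a false return as a stream error rather than retrying
the wait indefinitely.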
> StreamingDataflowWorker stuck waiting for
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: Major
>
> Operation ongoing in step <redacted> for at least 28h10m00s without
> outputting or completing in state windmill-read
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>   at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>   at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>   at <redacted>
> Because the stream is started in a StreamPool synchronized block, all other
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear whether the stream never became usable and thus blocked forever,
> or whether there is a race in the use of the Phaser that causes the hang.
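The knock-on blocking described in the issue can be illustrated with a
simplified sketch. BlockingPoolSketch below is a hypothetical stand-in and is
not the actual WindmillServerStub.StreamPool; it only assumes, as the issue
states, that a new stream is created while the pool's monitor is held.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical simplified pool for illustration; not the real StreamPool. */
final class BlockingPoolSketch<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> factory;

  BlockingPoolSketch(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Returns an idle stream, or creates one while still holding the monitor. */
  synchronized T get() {
    T stream = idle.poll();
    // If factory.get() never returns, the monitor is never released.
    return stream != null ? stream : factory.get();
  }

  /** Needs the same monitor, so it waits for as long as get() is stuck. */
  synchronized void release(T stream) {
    idle.push(stream);
  }
}
```

If the factory blocks forever, as onNext appears to in the trace, every later
call to get() or release() waits on the same monitor, which matches the
symptom reported above.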