[
https://issues.apache.org/jira/browse/BEAM-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Whittle updated BEAM-12144:
-------------------------------
Fix Version/s: 2.32.0
(was: 2.31.0)
> Dataflow streaming worker stuck and unable to get work from Streaming Engine
> ----------------------------------------------------------------------------
>
> Key: BEAM-12144
> URL: https://issues.apache.org/jira/browse/BEAM-12144
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 2.26.0
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: P2
> Fix For: 2.32.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Observed in 2.26, but it seems likely to affect later versions as well, since
> the previous issues addressing similar problems all predate 2.26. This seems
> similar to BEAM-9651, but it is not the deadlock observed there.
> The thread getting work has the following stack:
> --- Threads (1): [Thread[DispatchThread,1,main]] State: WAITING stack: ---
> java.base/jdk.internal.misc.Unsafe.park(Native Method)
> java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
> java.base/java.util.concurrent.Phaser$QNode.block(Phaser.java:1127)
> java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128)
> java.base/java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1057)
> java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:747)
> app//org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
> app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:662)
> app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetWorkStream.onNewStream(GrpcWindmillServer.java:868)
> app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:677)
> app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetWorkStream.<init>(GrpcWindmillServer.java:860)
> app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetWorkStream.<init>(GrpcWindmillServer.java:843)
> app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getWorkStream(GrpcWindmillServer.java:543)
> app//org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.streamingDispatchLoop(StreamingDataflowWorker.java:1047)
> app//org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$1.run(StreamingDataflowWorker.java:670)
> java.base/java.lang.Thread.run(Thread.java:834)
> The status page shows:
> GetWorkStream: 0 buffers, 400 inflight messages allowed, 67108864 inflight
> bytes allowed, current stream is 61355396ms old, last send 61355396ms, last
> response -1ms
> This shows that the stream was created 17 hours ago and sent the header message
> but never received a response. From the stack trace, it appears that the header
> was never actually sent, yet the stream also didn't terminate with a
> deadline-exceeded error.
> Not getting an error for the stream looks like a gRPC issue. However, it
> would be safer not to block indefinitely on the Phaser waiting for the send,
> and instead to throw an exception after, for example, 2x the stream deadline.
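
The mitigation suggested above (bounding the Phaser wait rather than parking forever) could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual change made for this issue: the class name `BoundedPhaserWait`, the method `awaitAdvanceOrFail`, the `streamDeadlineMillis` parameter, and the choice of `IllegalStateException` are all hypothetical. The only JDK API relied on is the timed overload `Phaser.awaitAdvanceInterruptibly(int, long, TimeUnit)`, which throws `TimeoutException` when the wait expires.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of Beam.
final class BoundedPhaserWait {
  private BoundedPhaserWait() {}

  /**
   * Waits for the phaser to advance past {@code phase}, but gives up after
   * twice the stream deadline instead of blocking indefinitely.
   */
  static int awaitAdvanceOrFail(Phaser phaser, int phase, long streamDeadlineMillis)
      throws InterruptedException {
    // Assumption: "2x the stream deadline", as floated in the issue description.
    long timeoutMillis = 2 * streamDeadlineMillis;
    try {
      return phaser.awaitAdvanceInterruptibly(phase, timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Surface the stall to the caller so the stream can be torn down and retried.
      throw new IllegalStateException(
          "Stream not ready for send after " + timeoutMillis + "ms", e);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Phaser phaser = new Phaser(1); // one party: stands in for the "ready" callback
    int phase = phaser.getPhase();
    try {
      awaitAdvanceOrFail(phaser, phase, 50); // nobody arrives, so this times out
    } catch (IllegalStateException e) {
      System.out.println("timed out: " + e.getMessage());
    }
    phaser.arrive(); // simulate the transport signalling readiness
    System.out.println("advanced to phase " + awaitAdvanceOrFail(phaser, phase, 50));
  }
}
```

With a wait like this, the dispatch thread in the stack trace above would fail with an exception once the timeout expired rather than staying parked for 17 hours, giving the caller a chance to recreate the stream.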