[ 
https://issues.apache.org/jira/browse/BEAM-12144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Whittle updated BEAM-12144:
-------------------------------
    Fix Version/s: 2.32.0
                       (was: 2.31.0)

> Dataflow streaming worker stuck and unable to get work from Streaming Engine
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-12144
>                 URL: https://issues.apache.org/jira/browse/BEAM-12144
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.26.0
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>             Fix For: 2.32.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Observed in 2.26, but it could affect later versions as well, since the 
> earlier issues addressing similar problems were resolved before 2.26.  This 
> looks similar to BEAM-9651, but it is not the deadlock observed there.
> The thread getting work has the following stack:
> --- Threads (1): [Thread[DispatchThread,1,main]] State: WAITING stack: ---
>   java.base/jdk.internal.misc.Unsafe.park(Native Method)
>   java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
>   java.base/java.util.concurrent.Phaser$QNode.block(Phaser.java:1127)
>   java.base/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3128)
>   java.base/java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1057)
>   java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:747)
>   app//org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:662)
>   app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetWorkStream.onNewStream(GrpcWindmillServer.java:868)
>   app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:677)
>   app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetWorkStream.<init>(GrpcWindmillServer.java:860)
>   app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetWorkStream.<init>(GrpcWindmillServer.java:843)
>   app//org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getWorkStream(GrpcWindmillServer.java:543)
>   app//org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.streamingDispatchLoop(StreamingDataflowWorker.java:1047)
>   app//org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$1.run(StreamingDataflowWorker.java:670)
>   java.base/java.lang.Thread.run(Thread.java:834)
> The status page shows:
> GetWorkStream: 0 buffers, 400 inflight messages allowed, 67108864 inflight 
> bytes allowed, current stream is 61355396ms old, last send 61355396ms, last 
> response -1ms
> This shows that the stream was created 17 hours ago and sent the header 
> message but never received a response.  Together with the stack trace, it 
> appears that the header was never actually sent, yet the stream also did not 
> terminate with a deadline-exceeded error.  Not getting an error for the 
> stream looks like a gRPC issue.  However, it would be safer not to block 
> indefinitely on the Phaser while waiting for the send, and instead to throw 
> an exception after, for example, 2x the stream deadline.
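
The bounded wait suggested above could look roughly like the following. This is a minimal sketch, not the actual Beam fix: the class name BoundedPhaserWait, the helper awaitAdvanceOrFail, and the deadlineMillis parameter are hypothetical names chosen for illustration. Only the timed overload of Phaser.awaitAdvanceInterruptibly is a real JDK API. The sketch assumes the caller knows the stream deadline and the phase it is waiting on.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedPhaserWait {
  /**
   * Waits for the phaser to advance past the given phase, but gives up after
   * 2x the (assumed) stream deadline instead of parking forever.
   * Returns the phase number reported by the phaser once it has advanced.
   */
  static int awaitAdvanceOrFail(Phaser phaser, int phase, long deadlineMillis) {
    long timeoutMillis = 2 * deadlineMillis;
    try {
      // Timed variant of the call seen in the stack trace.
      return phaser.awaitAdvanceInterruptibly(phase, timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Surface the stuck stream as an error so the caller can restart it.
      throw new IllegalStateException(
          "Stream not ready for send after " + timeoutMillis + "ms", e);
    } catch (InterruptedException e) {
      // Preserve the interrupt flag for the calling thread.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting to send", e);
    }
  }

  public static void main(String[] args) {
    Phaser phaser = new Phaser(1); // one registered party stands in for the transport
    int phase = phaser.getPhase();
    try {
      awaitAdvanceOrFail(phaser, phase, 50); // nobody arrives, so this times out
    } catch (IllegalStateException e) {
      System.out.println("timed out: " + e.getMessage());
    }
    phaser.arrive(); // simulates the stream becoming ready
    System.out.println("advanced to phase " + awaitAdvanceOrFail(phaser, phase, 50));
  }
}
```

With a wait like this, a stream that never becomes ready would surface as an exception in the dispatch thread rather than a thread parked for 17 hours. What the caller does with that exception (for example, tearing down and recreating the stream) is outside the scope of this sketch.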



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
