[ https://issues.apache.org/jira/browse/BEAM-9651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072114#comment-17072114 ]

Sam Whittle commented on BEAM-9651:
-----------------------------------

This looks similar to BEAM-4280, which concerns the FnAPI DirectObserver class.
That class now times out if the expected phase is not reached, to avoid deadlock.
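
A minimal Java sketch of the timeout idea, assuming a Phaser with one registered party that the transport's on-ready callback advances via arrive(). This is not Beam's DirectObserver/DirectStreamObserver code. The class TimedReadyWaiter, its isReady supplier and the maxWaits bound are hypothetical. The waiter reads the phase before checking readiness. It then calls awaitAdvanceInterruptibly(phase, timeout, unit) and re-checks readiness after each timeout, so it never parks indefinitely.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Sketch only (hypothetical class): wait for a readiness signal without parking forever. */
final class TimedReadyWaiter {
  // One registered party, so each arrive() advances the phase by one.
  private final Phaser readyPhaser = new Phaser(1);
  // Hypothetical stand-in for the transport's isReady() check.
  private final BooleanSupplier isReady;

  TimedReadyWaiter(BooleanSupplier isReady) {
    this.isReady = isReady;
  }

  /** Assumed to be called from the transport's on-ready callback. */
  void signalReady() {
    readyPhaser.arrive();
  }

  /** Returns true once ready, or false if still not ready after maxWaits timeouts. */
  boolean awaitReady(long timeout, TimeUnit unit, int maxWaits) throws InterruptedException {
    for (int i = 0; i < maxWaits; i++) {
      // Read the phase BEFORE checking readiness so a signal arriving in between is not missed.
      int phase = readyPhaser.getPhase();
      if (isReady.getAsBoolean()) {
        return true;
      }
      try {
        // Returns immediately if the phase has already advanced past `phase`.
        readyPhaser.awaitAdvanceInterruptibly(phase, timeout, unit);
      } catch (TimeoutException e) {
        // No signal within the timeout: loop and re-check instead of blocking forever.
      }
    }
    return isReady.getAsBoolean();
  }

  public static void main(String[] args) throws InterruptedException {
    // A stream that never becomes ready: the waiter gives up after 3 x 50 ms.
    TimedReadyWaiter never = new TimedReadyWaiter(() -> false);
    System.out.println("never-ready stream: " + never.awaitReady(50, TimeUnit.MILLISECONDS, 3));

    // A stream that becomes ready shortly after, signalled from another thread.
    AtomicBoolean ready = new AtomicBoolean(false);
    TimedReadyWaiter waiter = new TimedReadyWaiter(ready::get);
    Thread notifier = new Thread(() -> {
      try {
        Thread.sleep(20);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      ready.set(true);
      waiter.signalReady();
    });
    notifier.start();
    System.out.println("ready stream: " + waiter.awaitReady(1, TimeUnit.SECONDS, 3));
    notifier.join();
  }
}
```

Running main prints "never-ready stream: false" followed by "ready stream: true". What to do after the last timeout (fail, log, or keep waiting) is a policy choice this sketch leaves open.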

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-9651
>                 URL: https://issues.apache.org/jira/browse/BEAM-9651
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: Major
>
> Operation ongoing in step <redacted> for at least 28h10m00s without
> outputting or completing in state windmill-read
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>   at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>   at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>   at <redacted>
>
> Because the stream is started inside a synchronized block in StreamPool, all
> other threads that interact with StreamPool to get or release streams also block.
> It is unclear whether the stream never became usable and so blocked forever,
> or whether a race in the use of the Phaser causes the threads to get stuck.
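
A minimal Java sketch of the lock-scoping issue. It uses a hypothetical ToyStreamPool, not Beam's WindmillServerStub.StreamPool. The lock guards only the idle-stream list, and the possibly blocking stream factory runs outside it. In main, a creator thread is stuck inside the factory, yet another thread can still release a stream and read the pool size without waiting on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.function.Supplier;

/** Hypothetical toy pool: the lock covers bookkeeping only, never stream creation. */
final class ToyStreamPool<S> {
  private final Object lock = new Object();
  private final List<S> idle = new ArrayList<>();
  // May block, e.g. while a newly started stream waits to become ready.
  private final Supplier<S> factory;

  ToyStreamPool(Supplier<S> factory) {
    this.factory = factory;
  }

  S getStream() {
    synchronized (lock) {
      if (!idle.isEmpty()) {
        return idle.remove(idle.size() - 1);
      }
    }
    // Created outside the lock: holding `lock` here would block every other
    // getStream()/releaseStream() caller for as long as the factory blocks.
    return factory.get();
  }

  void releaseStream(S stream) {
    synchronized (lock) {
      idle.add(stream);
    }
  }

  int idleCount() {
    synchronized (lock) {
      return idle.size();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch enteredFactory = new CountDownLatch(1);
    CountDownLatch streamReady = new CountDownLatch(1);
    ToyStreamPool<String> pool = new ToyStreamPool<>(() -> {
      enteredFactory.countDown();
      try {
        streamReady.await(); // simulates a stream start that does not complete on its own
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "new-stream";
    });
    Thread creator = new Thread(pool::getStream);
    creator.start();
    enteredFactory.await(); // the creator is now blocked inside the factory
    pool.releaseStream("old-stream"); // does not wait for the stuck creator
    System.out.println("idle streams: " + pool.idleCount());
    streamReady.countDown(); // unblock the creator so the program can exit
    creator.join();
  }
}
```

Running main prints "idle streams: 1". The trade-off of narrowing the lock is that two threads may create streams concurrently when the pool is empty, which a pool usually tolerates better than a global stall.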



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
