Robin Palotai created BEAM-6724:
-----------------------------------
Summary: Go SDK on Dataflow processing step emits but it doesn't
reach framework
Key: BEAM-6724
URL: https://issues.apache.org/jira/browse/BEAM-6724
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Reporter: Robin Palotai
When submitting a job with a larger (though still modest, ~30MB) input to the
Dataflow runner, I can see in the worker logs that a given step emits
everything, but the framework (I'm not sure which layer, the harness or
something above it) doesn't seem to register that the step reached the finish
state (or maybe the step never reaches the finish state).
For a smaller input (~1MB) the whole pipeline runs fine.
Do you have any pointers on how to debug (say, by adding logging) the cause of
the hang? Maybe some buffers are not flushed, or a state transition is missed?
More generally, which part of the Beam Go codebase is responsible for these
transitions?
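
To make the kind of logging in question concrete, here is a minimal,
stdlib-only sketch. It assumes nothing about Beam's API: `stepTracer`, `Emit`,
`Finish` and `Status` are hypothetical names, not Beam identifiers. It only
shows the idea of counting emits and recording whether the finish hook
returned, so that the two cases (finish never reached vs. finish reached but
not registered) can be told apart in the worker logs.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// stepTracer is a hypothetical helper (not part of Beam) that counts
// emitted elements and records whether the step's finish hook returned.
type stepTracer struct {
	mu       sync.Mutex
	name     string
	emitted  int
	finished bool
}

// Emit records one emitted element.
func (t *stepTracer) Emit() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.emitted++
}

// Finish marks the step as finished and logs the final element count.
func (t *stepTracer) Finish() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.finished = true
	log.Printf("step %s: finish returned after %d elements", t.name, t.emitted)
}

// Status returns a one-line summary suitable for periodic logging.
func (t *stepTracer) Status() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return fmt.Sprintf("step %s: emitted=%d finished=%t", t.name, t.emitted, t.finished)
}

func main() {
	t := &stepTracer{name: "s02"}
	t.Emit()
	t.Emit()
	fmt.Println(t.Status())
	t.Finish()
	fmt.Println(t.Status())
}
```

If the status line still shows finished=false when the "Processing stuck"
message appears, the step itself is hanging; if it shows finished=true, the
problem is more likely in how completion is reported upstream.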
Thank you!
Version: current Go SDK from HEAD + https://github.com/apache/beam/pull/7889
patches to make plan check
The Cloud Dataflow console reports "Apache Beam SDK for Go 0.5.0".
Logs:
16:35:39.788 CET Starting MapTask stage s02
16:35:44.508 CET <first worker progress message>
16:36:34.588 CET <last progress message + done>
16:36:34.963 CET "DataSource: 2 elements in 53507659897 ns"
16:40:45.227 CET <new worker is started in place of one perceived as failed>
Initializing Go harness: /opt/apache/beam/boot --id=1
--logging_endpoint=localhost:12370 --control_endpoint=localhost:12371
--artifact_endpoint=localhost:12372 --provision_endpoint=localhost:12373
--semi_persist_dir=/var/opt/google undefined
16:42:29.027 CET Processing stuck in step s02 for at least 05m00s without
outputting or completing in state finish