Robin Palotai created BEAM-6724:
-----------------------------------

             Summary: Go SDK on Dataflow processing step emits but it doesn't 
reach framework
                 Key: BEAM-6724
                 URL: https://issues.apache.org/jira/browse/BEAM-6724
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
            Reporter: Robin Palotai


When I submit a job with a larger input (not that large, about 30MB) to the 
Dataflow runner, the worker logs show that a given step emits everything, but 
the framework (not sure which layer, the harness or something above it) doesn't 
seem to register that the step reached the finish state (or maybe the step 
never reaches the finish state).

For a smaller input (~1MB) the whole pipeline runs fine.

Do you have any pointers on how to debug (say, by adding logging) the cause of 
the hang? Maybe some buffers are not flushed, or a state is not transitioned? 
Generally, which part of the Beam Go codebase is responsible for these 
transitions?

Thank you!

Version: current Go SDK from HEAD + https://github.com/apache/beam/pull/7889 
patches to make plan check

The Cloud Dataflow console says "Apache Beam SDK for Go 0.5.0".

Logs:

16:35:39.788 CET Starting MapTask stage s02

16:35:44.508 CET <first worker progress message>

16:36:34.588 CET <last progress message + done>

16:36:34.963 CET "DataSource: 2 elements in 53507659897 ns"

16:40:45.227 CET <new worker is started in place of one perceived as failed> 
Initializing Go harness: /opt/apache/beam/boot --id=1 
--logging_endpoint=localhost:12370 --control_endpoint=localhost:12371 
--artifact_endpoint=localhost:12372 --provision_endpoint=localhost:12373 
--semi_persist_dir=/var/opt/google undefined

16:42:29.027 CET Processing stuck in step s02 for at least 05m00s without 
outputting or completing in state finish



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)