Sam Whittle created BEAM-8810:
---------------------------------
Summary: Dataflow runner - Work stuck in state COMMITTING with
streaming commit rpcs
Key: BEAM-8810
URL: https://issues.apache.org/jira/browse/BEAM-8810
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Reporter: Sam Whittle
Assignee: Sam Whittle
In several pipelines using Streaming Engine, and therefore the streaming commit
RPCs, work became stuck in state COMMITTING indefinitely. The stuck work
coincided with repeated streaming RPC failures.
The status page shows that the key has work in state COMMITTING, and has 1
queued work item.
There is a single active commit stream, with 0 pending requests.
The stream can outlive its deadline because the StreamCache only closes an
expired stream when a stream is retrieved, which only happens when there are
other commits. Since the pipeline is stuck, there are no other commits.
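To illustrate the lazy expiry described above, here is a minimal sketch of a
cache that only checks the deadline on retrieval. It is not the actual
StreamCache code: the class LazyStreamCache, its methods, and the injected
clock are all hypothetical names chosen for this example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical illustration only; this is not Beam's StreamCache. */
final class LazyStreamCache<T> {
  private final long ttlMillis;
  private final LongSupplier clock; // injected so the example is deterministic
  private final Supplier<T> factory;
  private T stream;
  private long createdAtMillis;
  private int closedCount = 0;

  LazyStreamCache(long ttlMillis, LongSupplier clock, Supplier<T> factory) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
    this.factory = factory;
  }

  /** The deadline is enforced only here, i.e. only when a caller asks for a stream. */
  synchronized T get() {
    long now = clock.getAsLong();
    if (stream != null && now - createdAtMillis >= ttlMillis) {
      stream = null; // stand-in for closing the expired stream
      closedCount++;
    }
    if (stream == null) {
      stream = factory.get();
      createdAtMillis = now;
    }
    return stream;
  }

  /** True if a stream is still held even though its deadline has passed. */
  synchronized boolean holdsExpiredStream() {
    return stream != null && clock.getAsLong() - createdAtMillis >= ttlMillis;
  }

  synchronized int closedCount() {
    return closedCount;
  }
}
```

In this sketch, if nothing calls get() after the deadline passes,
holdsExpiredStream() stays true indefinitely, which matches the observed
single active commit stream on a pipeline that issues no further commits.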
It therefore seems that one of the following is happening: a race on the
commitStream between onNewStream and commitWork that prevents work from being
retried, an exception thrown between when the pending request is removed and
when the callback is called, or some corruption of the activeWork data
structure.
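The second hypothesis (an exception between removing the pending request and
invoking the callback) can be sketched as follows. This is a simplified model
under stated assumptions, not the actual commit stream code: PendingCommits,
completeFragile, completeSafely, and the betweenSteps hook are hypothetical
names used only to show how 0 pending requests can coexist with work that is
never completed or retried.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical model of pending-request bookkeeping; this is not Beam code. */
final class PendingCommits {
  enum Status { OK, FAILED }

  private final Map<Long, Consumer<Status>> pending = new ConcurrentHashMap<>();

  void add(long requestId, Consumer<Status> onDone) {
    pending.put(requestId, onDone);
  }

  int pendingCount() {
    return pending.size();
  }

  /** Fragile: if betweenSteps throws, the entry is gone and the callback never runs. */
  void completeFragile(long requestId, Runnable betweenSteps) {
    Consumer<Status> onDone = pending.remove(requestId);
    if (onDone == null) {
      return;
    }
    betweenSteps.run(); // stand-in for any work done between removal and callback
    onDone.accept(Status.OK);
  }

  /** Safer: once the entry is removed, the callback is always invoked. */
  void completeSafely(long requestId, Runnable betweenSteps) {
    Consumer<Status> onDone = pending.remove(requestId);
    if (onDone == null) {
      return;
    }
    Status status = Status.FAILED;
    try {
      betweenSteps.run();
      status = Status.OK;
    } finally {
      onDone.accept(status); // runs even if betweenSteps threw
    }
  }
}
```

With completeFragile, a failure in betweenSteps leaves pendingCount() at 0
while the caller is never notified, so in this model the work would remain in
its committing state. Whether the real code has such a window is exactly what
this issue needs to establish.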