[ https://issues.apache.org/jira/browse/BEAM-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Whittle resolved BEAM-8810.
-------------------------------
Fix Version/s: 2.19.0
Resolution: Fixed
> Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs
> ---------------------------------------------------------------------------
>
> Key: BEAM-8810
> URL: https://issues.apache.org/jira/browse/BEAM-8810
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: Minor
> Fix For: 2.19.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> In several pipelines using Streaming Engine, and therefore the streaming
> commit RPCs, work became stuck in state COMMITTING indefinitely. These stalls
> coincided with repeated streaming RPC failures.
> The status page shows that the key has work in state COMMITTING, and has 1
> queued work item.
> There is a single active commit stream, with 0 pending requests.
> The stream could exist past the stream deadline because the StreamCache only
> closes a stream due to the deadline when a stream is retrieved, which only
> occurs if there are other commits. Since the pipeline is stuck due to this
> event, there are no other commits.
> It therefore seems that there is either a race on the commitStream between
> onNewStream and commitWork that prevents work from being retried, an
> exception thrown between when the pending request is removed and when the
> callback is called, or some corruption of the activeWork data structure.
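
To illustrate why a stream can outlive its deadline when nothing else commits, here is a minimal sketch of a cache that checks the deadline only on retrieval. It is not the actual StreamCache code: LazyDeadlineCache, Stream, FakeStream, and the injected millisecond clock are all hypothetical names chosen for this example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical cache that enforces a stream deadline only when get() is called. */
public class LazyDeadlineCache<S extends LazyDeadlineCache.Stream> {
  /** Hypothetical stand-in for a commit stream; only close() matters here. */
  public interface Stream {
    void close();
  }

  /** Test double that records whether it has been closed. */
  public static class FakeStream implements Stream {
    public boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  private final Supplier<S> factory;
  private final LongSupplier clockMillis; // injected so the example needs no sleeping
  private final long deadlineMillis;
  private S current;
  private long createdAtMillis;

  public LazyDeadlineCache(Supplier<S> factory, LongSupplier clockMillis, long deadlineMillis) {
    this.factory = factory;
    this.clockMillis = clockMillis;
    this.deadlineMillis = deadlineMillis;
  }

  /** Returns the cached stream, replacing it first if it is past its deadline. */
  public synchronized S get() {
    long now = clockMillis.getAsLong();
    // The deadline is evaluated only here. With no callers, an expired
    // stream is never closed and stays open indefinitely.
    if (current != null && now - createdAtMillis >= deadlineMillis) {
      current.close();
      current = null;
    }
    if (current == null) {
      current = factory.get();
      createdAtMillis = now;
    }
    return current;
  }

  public static void main(String[] args) {
    long[] now = {0};
    LazyDeadlineCache<FakeStream> cache =
        new LazyDeadlineCache<>(FakeStream::new, () -> now[0], 100);
    FakeStream first = cache.get();
    now[0] = 1_000; // far past the 100 ms deadline, but nobody calls get()
    System.out.println("closed while idle: " + first.closed);
    cache.get(); // the next retrieval finally closes the expired stream
    System.out.println("closed after next get: " + first.closed);
  }
}
```

In this sketch the expired stream is closed only by the second get() call, which matches the observation above: if the stuck work item blocks all further commits, no retrieval occurs and the stale stream persists.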