[ 
https://issues.apache.org/jira/browse/BEAM-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Whittle resolved BEAM-8810.
-------------------------------
    Fix Version/s: 2.19.0
       Resolution: Fixed

> Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-8810
>                 URL: https://issues.apache.org/jira/browse/BEAM-8810
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: Minor
>             Fix For: 2.19.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> In several pipelines using streaming engine, and therefore the streaming 
> commit rpcs, work became stuck in state COMMITTING indefinitely.  These 
> stalls coincided with repeated streaming rpc failures.
> The status page shows that the key has work in state COMMITTING, and has 1 
> queued work item.
> There is a single active commit stream, with 0 pending requests.
> The stream can persist past the stream deadline because the StreamCache only 
> closes a stream due to the deadline when a stream is retrieved, which only 
> occurs if there are other commits.  Since the pipeline is stuck due to this 
> event, there are no other commits.
> It therefore seems that the cause is one of the following: a race on the 
> commitStream between onNewStream and commitWork that prevents work from 
> being retried; an exception thrown between the removal of the pending 
> request and the invocation of its callback; or corruption of the activeWork 
> data structure.
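
The deadline behaviour described above can be illustrated with a minimal sketch. This is not the Dataflow worker's StreamCache; the class LazyStreamCache, the FakeStream stand-in, and the injected clock are all hypothetical names chosen for illustration. It only shows the general pattern in which a deadline is checked on retrieval, so a stream that nobody retrieves is never closed.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of a cache that enforces a stream deadline only on retrieval. */
public class LazyStreamCache {

  /** Stand-in for a commit stream; it only records whether it was closed. */
  public static final class FakeStream {
    private boolean closed;

    public void close() {
      closed = true;
    }

    public boolean isClosed() {
      return closed;
    }
  }

  private final LongSupplier clockMillis; // injected clock (assumption, eases testing)
  private final long deadlineMillis;      // maximum intended stream lifetime
  private FakeStream current;
  private long createdAtMillis;

  public LazyStreamCache(LongSupplier clockMillis, long deadlineMillis) {
    this.clockMillis = clockMillis;
    this.deadlineMillis = deadlineMillis;
  }

  /** Returns the cached stream, replacing it first if its deadline has passed. */
  public synchronized FakeStream getStream() {
    long now = clockMillis.getAsLong();
    // The deadline is checked here and nowhere else. If no caller asks for a
    // stream (for example because no further commits are issued), an expired
    // stream stays open indefinitely.
    if (current != null && now - createdAtMillis >= deadlineMillis) {
      current.close();
      current = null;
    }
    if (current == null) {
      current = new FakeStream();
      createdAtMillis = now;
    }
    return current;
  }

  /** True if the cached stream has outlived its deadline but is still held open. */
  public synchronized boolean isPastDeadline() {
    return current != null
        && !current.isClosed()
        && clockMillis.getAsLong() - createdAtMillis >= deadlineMillis;
  }
}
```

In this sketch a stream created at time 0 with a 1000 ms deadline is still open at time 5000 unless getStream() is called again. One possible way to avoid that, which is not necessarily the fix applied for this issue, would be to check the deadline from a periodic timer instead of only on retrieval.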



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
