[ 
https://issues.apache.org/jira/browse/BEAM-8810?focusedWorklogId=359699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359699
 ]

ASF GitHub Bot logged work on BEAM-8810:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Dec/19 23:22
            Start Date: 13/Dec/19 23:22
    Worklog Time Spent: 10m 
      Work Description: dpmills commented on pull request #10311: [BEAM-8810] 
Detect stuck commits in StreamingDataflowWorker
URL: https://github.com/apache/beam/pull/10311#discussion_r357868037
 
 

 ##########
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##########
 @@ -2110,20 +2152,22 @@ public MapTask getMapTask() {
     /** Mark the given key and work as active. */
     public boolean activateWork(ByteString key, Work work) {
       synchronized (activeWork) {
-        Queue<Work> queue = activeWork.get(key);
-        if (queue == null) {
-          queue = new ArrayDeque<>();
-          activeWork.put(key, queue);
-          queue.add(work);
-          // Fall through to execute without the lock held.
-        } else {
-          if (queue.peek().getWorkItem().getWorkToken() != 
work.getWorkItem().getWorkToken()) {
+        Deque<Work> queue = activeWork.get(key);
+        if (queue != null) {
+          Preconditions.checkState(!queue.isEmpty());
+          if (queue.peekLast().getWorkItem().getWorkToken() == 
work.getWorkItem().getWorkToken()) {
 
 Review comment:
   Check against everything in queue
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 359699)
    Time Spent: 1h 20m  (was: 1h 10m)

> Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-8810
>                 URL: https://issues.apache.org/jira/browse/BEAM-8810
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: Minor
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In several pipelines using streaming engine and thus the streaming commit 
> rpcs, work became stuck in state COMMITTING indefinitely.  Such stuckness 
> coincided with repeated streaming rpc failures.
> The status page shows that the key has work in state COMMITTING, and has 1 
> queued work item.
> There is a single active commit stream, with 0 pending requests.
> The stream could exist past the stream deadline because the StreamCache only 
> closes stream due to the deadline when a stream is retrieved, which only 
> occurs if there are other commits.  Since the pipeline is stuck due to this 
> event, there are no other commits.
> It seems therefore there is some race on the commitStream between onNewStream 
> and commitWork that either prevents work from being retried, an exception 
> that triggers between when the pending request is removed and the callback is 
> called, or some potential corruption of the activeWork data structure. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to