scwhittle commented on code in PR #29082:
URL: https://github.com/apache/beam/pull/29082#discussion_r1484180613


##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/streaming/ActiveWorkState.java:
##########
@@ -129,13 +135,29 @@ synchronized ActivateWorkResult activateWorkForKey(ShardedKey shardedKey, Work w
       return ActivateWorkResult.EXECUTE;
     }
 
-    // Ensure we don't already have this work token queued.
+    // Check to see if we have this work token queued.
+    // This set is for adding remove-able WorkItems if they exist in the workQueue. We add them to
+    // this set since a ConcurrentModificationException will be thrown if we modify the workQueue
+    // and then resume iteration.
+    Set<WorkId> queuedWorkToRemove = new HashSet<>();
     for (Work queuedWork : workQueue) {
-      if (queuedWork.getWorkItem().getWorkToken() == work.getWorkItem().getWorkToken()) {
+      if (queuedWork.id().equals(work.id())) {
         return ActivateWorkResult.DUPLICATE;
       }
+      if (queuedWork.id().cacheToken() == work.id().cacheToken()) {
+        if (work.id().workToken() > queuedWork.id().workToken()) {
+          queuedWorkToRemove.add(queuedWork.id());
+          // Continue here to possibly remove more non-active stale work that is queued.
+        } else {
+          return ActivateWorkResult.STALE;
+        }
+      }
     }
 
+    workQueue.removeIf(

Review Comment:
   This isn't updating the active work budget for the removals.
   See (or share) the removal logic used when failing work for heartbeats.
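
   A minimal sketch of what sharing the removal logic could look like, so that every removal path also adjusts the budget. This is not the actual ActiveWorkState code: `BudgetedWorkQueue`, `Entry`, `removeMatching`, and the two budget counters are hypothetical names chosen for illustration, and the real budget representation may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Predicate;

// Hypothetical stand-in for a per-key work queue that tracks an active work budget.
final class BudgetedWorkQueue {
  // Hypothetical work entry: a token plus the bytes it counts against the budget.
  static final class Entry {
    final long workToken;
    final long bytes;

    Entry(long workToken, long bytes) {
      this.workToken = workToken;
      this.bytes = bytes;
    }
  }

  private final Deque<Entry> queue = new ArrayDeque<>();
  private long budgetItems = 0;
  private long budgetBytes = 0;

  synchronized void add(Entry entry) {
    queue.addLast(entry);
    budgetItems++;
    budgetBytes += entry.bytes;
  }

  // The only place entries are removed, so the counters cannot drift from the
  // queue contents. Both the stale-work path and the heartbeat-failure path
  // would call this with their own predicate.
  synchronized int removeMatching(Predicate<Entry> shouldRemove) {
    int removed = 0;
    for (Iterator<Entry> it = queue.iterator(); it.hasNext(); ) {
      Entry entry = it.next();
      if (shouldRemove.test(entry)) {
        it.remove();
        budgetItems--;
        budgetBytes -= entry.bytes;
        removed++;
      }
    }
    return removed;
  }

  synchronized long budgetItems() {
    return budgetItems;
  }

  synchronized long budgetBytes() {
    return budgetBytes;
  }
}
```

   With a single removal helper, the stale-work cleanup and the heartbeat-failure cleanup differ only in the predicate they pass, and neither can forget the budget update.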



##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/streaming/ActiveWorkState.java:
##########
@@ -129,13 +135,29 @@ synchronized ActivateWorkResult activateWorkForKey(ShardedKey shardedKey, Work w
       return ActivateWorkResult.EXECUTE;
     }
 
-    // Ensure we don't already have this work token queued.
+    // Check to see if we have this work token queued.
+    // This set is for adding remove-able WorkItems if they exist in the workQueue. We add them to
+    // this set since a ConcurrentModificationException will be thrown if we modify the workQueue
+    // and then resume iteration.
+    Set<WorkId> queuedWorkToRemove = new HashSet<>();
     for (Work queuedWork : workQueue) {
-      if (queuedWork.getWorkItem().getWorkToken() == work.getWorkItem().getWorkToken()) {
+      if (queuedWork.id().equals(work.id())) {
         return ActivateWorkResult.DUPLICATE;
       }
+      if (queuedWork.id().cacheToken() == work.id().cacheToken()) {
+        if (work.id().workToken() > queuedWork.id().workToken()) {
+          queuedWorkToRemove.add(queuedWork.id());
+          // Continue here to possibly remove more non-active stale work that is queued.
+        } else {
+          return ActivateWorkResult.STALE;
+        }
+      }
     }
 
+    workQueue.removeIf(
+        queuedWork ->
+            queuedWorkToRemove.contains(queuedWork.id()) && !queuedWork.equals(workQueue.peek()));
+

Review Comment:
   I can't add a comment on the right line because GitHub is stupid, but can
you replace FailedTokens below with the new WorkId class?



##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/streaming/ActiveWorkState.java:
##########
@@ -129,13 +135,29 @@ synchronized ActivateWorkResult activateWorkForKey(ShardedKey shardedKey, Work w
       return ActivateWorkResult.EXECUTE;
     }
 
-    // Ensure we don't already have this work token queued.
+    // Check to see if we have this work token queued.
+    // This set is for adding remove-able WorkItems if they exist in the workQueue. We add them to
+    // this set since a ConcurrentModificationException will be thrown if we modify the workQueue
+    // and then resume iteration.
+    Set<WorkId> queuedWorkToRemove = new HashSet<>();

Review Comment:
   Instead of a set and then removeIf, how about doing this in a single pass
with an iterator and Iterator.remove()?
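
   A rough sketch of the single-pass shape, assuming the queue is an ArrayDeque whose head is the currently active work. `SinglePassDedup`, `Id`, and `Result` are hypothetical stand-ins for the PR's Work, WorkId, and ActivateWorkResult types, not the real classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

final class SinglePassDedup {
  enum Result { QUEUED, DUPLICATE, STALE }

  // Hypothetical stand-in for WorkId: only the two tokens matter here.
  static final class Id {
    final long cacheToken;
    final long workToken;

    Id(long cacheToken, long workToken) {
      this.cacheToken = cacheToken;
      this.workToken = workToken;
    }

    boolean sameAs(Id other) {
      return cacheToken == other.cacheToken && workToken == other.workToken;
    }
  }

  static Result activate(Deque<Id> queue, Id incoming) {
    boolean atHead = true; // assumption: the head of the queue is the active work
    for (Iterator<Id> it = queue.iterator(); it.hasNext(); ) {
      Id queued = it.next();
      boolean isActive = atHead;
      atHead = false;
      if (queued.sameAs(incoming)) {
        return Result.DUPLICATE;
      }
      if (queued.cacheToken == incoming.cacheToken) {
        if (incoming.workToken > queued.workToken) {
          // Iterator.remove() is safe during iteration; the active head is kept.
          if (!isActive) {
            it.remove();
          }
        } else {
          return Result.STALE;
        }
      }
    }
    queue.addLast(incoming);
    return Result.QUEUED;
  }
}
```

   One difference from the set-plus-removeIf version: here stale entries are removed as they are encountered, so a later DUPLICATE or STALE return leaves the earlier removals in place. That may be acceptable, since those entries are older than the incoming work, but it is worth checking against the intended behavior.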



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
