Re: [PR] Allow for task limit on kill tasks spawned by auto kill coordinator duty (druid)

via GitHub Mon, 07 Aug 2023 13:31:32 -0700


jasonk000 commented on code in PR #14769:
URL: https://github.com/apache/druid/pull/14769#discussion_r1286347213



##########
server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java:
##########
@@ -149,7 +173,7 @@ private void killUnusedSegments(Collection<String> 
dataSourcesToKill)
       }
     }
 
-    log.debug("Submitted kill tasks for [%d] datasources.", submittedTasks);
+    log.debug("Submitted [%d] kill tasks for [%d] datasources.", 
submittedTasks, dataSourcesToKill.size());

Review Comment:
   if any datasources were skipped due to not enough slots, we could consider 
logging them here?



##########
server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java:
##########
@@ -174,4 +198,70 @@ private Interval findIntervalForKill(String dataSource)
     }
   }
 
+  private int getAvailableKillTaskSlots(double killTaskSlotRatio, int 
maxKillTaskSlots)
+  {
+    return Math.max(0, getKillTaskCapacity(killTaskSlotRatio, 
maxKillTaskSlots) - getNumActiveKillTaskSlots());
+  }
+
+  private int getNumActiveKillTaskSlots()
+  {
+    final CloseableIterator<TaskStatusPlus> activeTasks =
+        FutureUtils.getUnchecked(overlordClient.taskStatuses(null, null, 0), 
true);
+    // Fetch currently running kill tasks
+    int numActiveKillTasks = 0;
+
+    try (final Closer closer = Closer.create()) {
+      closer.register(activeTasks);
+      while (activeTasks.hasNext()) {
+        final TaskStatusPlus status = activeTasks.next();
+
+        // taskType can be null if middleManagers are running with an older 
version. Here, we consevatively regard

Review Comment:
   I think this is important enough to capture as a method-level javadoc hint. 
ie: this might overestimate in some circumstances.



##########
server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java:
##########
@@ -169,9 +178,17 @@ public void testDurationToRetain()
         yearOldSegment.getInterval().getStart(),
         dayOldSegment.getInterval().getEnd()
     );
+    mockAvailableTaskSlotsForKill(1.0, Integer.MAX_VALUE);
     runAndVerifyKillInterval(expectedKillInterval);
   }
 
+  @Test
+  public void testNoAvailableTaskCapacityForKill()
+  {
+    mockNoAvailableTaskSlotsForKill();
+    runAndVerifyNoKill();

Review Comment:
   :+1: 



##########
server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java:
##########
@@ -174,4 +198,70 @@ private Interval findIntervalForKill(String dataSource)
     }
   }
 
+  private int getAvailableKillTaskSlots(double killTaskSlotRatio, int 
maxKillTaskSlots)
+  {
+    return Math.max(0, getKillTaskCapacity(killTaskSlotRatio, 
maxKillTaskSlots) - getNumActiveKillTaskSlots());
+  }
+
+  private int getNumActiveKillTaskSlots()
+  {
+    final CloseableIterator<TaskStatusPlus> activeTasks =
+        FutureUtils.getUnchecked(overlordClient.taskStatuses(null, null, 0), 
true);
+    // Fetch currently running kill tasks
+    int numActiveKillTasks = 0;
+
+    try (final Closer closer = Closer.create()) {
+      closer.register(activeTasks);
+      while (activeTasks.hasNext()) {
+        final TaskStatusPlus status = activeTasks.next();
+
+        // taskType can be null if middleManagers are running with an older 
version. Here, we consevatively regard
+        // the tasks of the unknown taskType as the killTask. This is because 
it's important to not run
+        // killTasks more than the configured limit at any time which might 
impact to the ingestion
+        // performance.
+        if (status.getType() == null
+            || (KILL_TASK_TYPE.equals(status.getType()) && 
status.getId().startsWith(TASK_ID_PREFIX))) {
+          numActiveKillTasks++;
+        }
+      }
+    }
+    catch (IOException e) {
+      throw new RuntimeException(e);
+    }
+
+    return numActiveKillTasks;
+  }
+
+  private int getKillTaskCapacity(double killTaskSlotRatio, int 
maxKillTaskSlots)

Review Comment:
   Personally, I'd split this to a `getTotalWorkerCapacity`, to keep the kill 
calculation separate. I think testing would be cleaner this way too
   
   It would be great to see a unit test that explicitly could test 
`getKillTaskCapacity` based only on numeric inputs, so that you can test 0s, 
infinities, etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Allow for task limit on kill tasks spawned by auto kill coordinator duty (druid)

Reply via email to