zachjsh commented on code in PR #16719:
URL: https://github.com/apache/druid/pull/16719#discussion_r1672828810


##########
server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java:
##########
@@ -163,30 +190,44 @@ private DruidCoordinatorRuntimeParams runInternal(final DruidCoordinatorRuntimeP
   * Spawn kill tasks for each datasource in {@code dataSourcesToKill} up to {@code availableKillTaskSlots}.
    */
   private void killUnusedSegments(
-      @Nullable final Collection<String> dataSourcesToKill,
+      final Set<String> dataSourcesToKill,
       final int availableKillTaskSlots,
       final CoordinatorRunStats stats
   )
   {
-    if (CollectionUtils.isNullOrEmpty(dataSourcesToKill) || availableKillTaskSlots <= 0) {
+    if (CollectionUtils.isNullOrEmpty(dataSourcesToKill)) {
+      log.info("Skipping KillUnusedSegments because there are no datasources to kill.");
       stats.add(Stats.Kill.SUBMITTED_TASKS, 0);
       return;
     }
+    final Iterator<String> dataSourcesToKillIterator = this.datasourceIterator.getIterator();
+    final Set<String> remainingDatasourcesToKill = new HashSet<>(dataSourcesToKill);
+    final Set<String> datasourcesKilled = new HashSet<>();
 
-    final Collection<String> remainingDatasourcesToKill = new ArrayList<>(dataSourcesToKill);
     int submittedTasks = 0;
-    for (String dataSource : dataSourcesToKill) {
+    while (dataSourcesToKillIterator.hasNext()) {
+      if (remainingDatasourcesToKill.size() == 0) {
+        log.info(
+            "Submitted [%d] kill tasks for [%d] datasources. No more datasources to kill in this cycle.",
+            submittedTasks, datasourcesKilled.size()
+        );
+        break;
+      }
+
       if (submittedTasks >= availableKillTaskSlots) {
         log.info(
-            "Submitted [%d] kill tasks and reached kill task slot limit [%d].",
-            submittedTasks, availableKillTaskSlots
+            "Submitted [%d] kill tasks for [%d] datasources and reached kill task slot limit [%d].",
+            submittedTasks, datasourcesKilled.size(), availableKillTaskSlots
         );
         break;
       }
+
+      final String dataSource = dataSourcesToKillIterator.next();
       final DateTime maxUsedStatusLastUpdatedTime = DateTimes.nowUtc().minus(bufferPeriod);
       final Interval intervalToKill = findIntervalForKill(dataSource, maxUsedStatusLastUpdatedTime, stats);
       if (intervalToKill == null) {
         datasourceToLastKillIntervalEnd.remove(dataSource);
+        remainingDatasourcesToKill.remove(dataSource);

Review Comment:
   Idea for an improvement in this area (I believe the issue predates this PR): if only one datasource has segments to kill, and it has more unused segments than the per-task limit, and more than one task slot is allotted to auto-kill, we still issue only one kill task per run cycle even though we could fully use the available capacity. Not something to fix in this PR, since it's an existing issue; just an idea for improvement.
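   One way the underutilization could be addressed is to round-robin over datasources that still have kill-eligible intervals until the slots run out, rather than submitting at most one task per datasource per cycle. A minimal sketch of that idea follows; it is hypothetical and does not use Druid's actual APIs (`submitKillTasks` and the pending-interval map are invented for illustration):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch, not Druid's actual API: round-robin over datasources
// that still have kill-eligible intervals, so one datasource with many unused
// segments can consume several task slots in a single coordinator cycle.
public class KillSlotSketch
{
  // pendingIntervalsPerDatasource: datasource -> number of kill-eligible
  // intervals remaining (each submitted task consumes one interval here).
  static int submitKillTasks(Map<String, Integer> pendingIntervalsPerDatasource, int availableSlots)
  {
    Queue<String> queue = new ArrayDeque<>(pendingIntervalsPerDatasource.keySet());
    int submitted = 0;
    while (submitted < availableSlots && !queue.isEmpty()) {
      String dataSource = queue.poll();
      int pending = pendingIntervalsPerDatasource.get(dataSource);
      if (pending <= 0) {
        continue; // nothing left to kill for this datasource; drop it
      }
      submitted++; // "submit" one kill task (up to the per-task segment limit)
      pendingIntervalsPerDatasource.put(dataSource, pending - 1);
      queue.add(dataSource); // re-enqueue so leftovers can use further slots
    }
    return submitted;
  }

  public static void main(String[] args)
  {
    // One datasource with 3 batches of unused segments and 2 free slots:
    // both slots get used instead of just one task per cycle.
    Map<String, Integer> pending = new HashMap<>(Map.of("wiki", 3));
    System.out.println(submitKillTasks(pending, 2)); // prints 2
  }
}
```

   In the real duty this would also have to respect `maxSegmentsToKill` per task and the buffer period, but the queue-with-re-enqueue shape is the core of the idea.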



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

