[GitHub] [druid] jihoonson commented on a change in pull request #10479: Fix compaction task slot computation in auto compaction

GitBox Mon, 05 Oct 2020 15:29:13 -0700


jihoonson commented on a change in pull request #10479:
URL: https://github.com/apache/druid/pull/10479#discussion_r499906319




##########
File path: 
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -277,7 +300,8 @@ private CoordinatorStats doRun(
         );
         LOG.infoSegments(segmentsToCompact, "Compacting segments");
         // Count the compaction task itself + its sub tasks
-        numSubmittedTasks += 
findNumMaxConcurrentSubTasks(config.getTuningConfig()) + 1;
+        numSubmittedTasks++;
+        numCompactionTasksAndSubtasks += 
findMaxNumTaskSlotsUsedByOneCompactionTask(config.getTuningConfig()) + 1;

Review comment:
       Oops, good catch. It was always counting an extra slot regardless of the 
mode of the compaction task. Added tests.

##########
File path: 
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -277,7 +300,8 @@ private CoordinatorStats doRun(
         );
         LOG.infoSegments(segmentsToCompact, "Compacting segments");
         // Count the compaction task itself + its sub tasks
-        numSubmittedTasks += 
findNumMaxConcurrentSubTasks(config.getTuningConfig()) + 1;
+        numSubmittedTasks++;

Review comment:
       I think the previous one was wrong. It should count only the number of 
compaction tasks, not including its subtasks. I think this change would be fine 
since it's not documented yet. But yes, we should document it and all missing 
metrics.

##########
File path: 
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -180,24 +183,43 @@ public DruidCoordinatorRuntimeParams 
run(DruidCoordinatorRuntimeParams params)
   }
 
   /**
-   * Each compaction task can run a parallel indexing task. When we count the 
number of current running
-   * compaction tasks, we should count the sub tasks of the parallel indexing 
task as well. However, we currently
-   * don't have a good way to get the number of current running sub tasks 
except poking each supervisor task,
-   * which is complex to handle all kinds of failures. Here, we simply return 
{@code maxNumConcurrentSubTasks} instead
-   * to estimate the number of sub tasks conservatively. This should be ok 
since it won't affect to the performance of
-   * other ingestion types.
+   * Returns the maximum number of task slots used by one compaction task at 
any time when the task is issued with
+   * the given tuningConfig.
    */
-  private int findNumMaxConcurrentSubTasks(@Nullable 
ClientCompactionTaskQueryTuningConfig tuningConfig)
+  @VisibleForTesting
+  static int findMaxNumTaskSlotsUsedByOneCompactionTask(@Nullable 
ClientCompactionTaskQueryTuningConfig tuningConfig)
   {
-    if (tuningConfig != null && tuningConfig.getMaxNumConcurrentSubTasks() != 
null) {
-      // The actual number of subtasks might be smaller than the configured 
max.
-      // However, we use the max to simplify the estimation here.
-      return tuningConfig.getMaxNumConcurrentSubTasks();
+    if (isParallelMode(tuningConfig)) {
+      @Nullable Integer maxNumConcurrentSubTasks = 
tuningConfig.getMaxNumConcurrentSubTasks();
+      // Max number of task slots used in parallel mode = 
maxNumConcurrentSubTasks + 1 (supervisor task)
+      return (maxNumConcurrentSubTasks == null ? 1 : maxNumConcurrentSubTasks) 
+ 1;
     } else {
-      return 0;
+      return 1;
     }
   }
 
+  /**
+   * Returns true if the compaction task can run in the parallel mode with the 
given tuningConfig.
+   * This method should be synchronized with 
ParallelIndexSupervisorTask.isParallelMode(InputSource, 
ParallelIndexTuningConfig).
+   */
+  @VisibleForTesting
+  static boolean isParallelMode(@Nullable 
ClientCompactionTaskQueryTuningConfig tuningConfig)

Review comment:
       They can't be merged for now since they are in different modules. There 
will be a cyclic dependency if I add a dependency for `indexing-service` in 
`server` since `indexing-service` already has one for `server`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] jihoonson commented on a change in pull request #10479: Fix compaction task slot computation in auto compaction

Reply via email to