jihoonson commented on a change in pull request #10379:
URL: https://github.com/apache/druid/pull/10379#discussion_r493935973



##########
File path: 
server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsMonitor.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.inject.Inject;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+public class TaskSlotCountStatsMonitor extends AbstractMonitor
+{
+  private final TaskSlotCountStatsProvider statsProvider;
+
+  @Inject
+  public TaskSlotCountStatsMonitor(
+          TaskSlotCountStatsProvider statsProvider
+  )
+  {
+    this.statsProvider = statsProvider;
+  }
+
+  @Override
+  public boolean doMonitor(ServiceEmitter emitter)
+  {
+    emit(emitter, "taskSlot/total/count", 
statsProvider.getTotalTaskSlotCount());

Review comment:
       I think this monitor should emit metrics only when `TaskMaster` is a 
leader. Check out what `TaskCountStatsMonitor` does. Probably new interfaces in 
`TaskSlotCountStatsProvider` should be able to return null when `TaskMaster` is 
not a leader.

##########
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/overlord/hrtr/HttpRemoteTaskRunner.java
##########
@@ -1342,6 +1343,13 @@ public TaskLocation getTaskLocation(String taskId)
     ).collect(Collectors.toList());
   }
 
+  public Collection<ImmutableWorkerInfo> getBlackListedWorkers()
+  {
+    synchronized (blackListedWorkers) {

Review comment:
       It seems OK without this synchronization.

##########
File path: 
server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsProvider.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+public interface TaskSlotCountStatsProvider
+{
+  /**
+   * Return the number of total task slots during emission period.
+   */
+  long getTotalTaskSlotCount();
+
+  /**
+   * Return the number of idle task slots during emission period.
+   */
+  long getIdleTaskSlotCount();
+
+  /**
+   * Return the number of used task slots during emission period.
+   */
+  long getUsedTaskSlotCount();
+
+  /**
+   * Return the number of lazy task slots during emission period.
+   */
+  long getLazyTaskSlotCount();
+
+  /**
+   * Return the number of blacklisted task slots during emission period.
+   */
+  long getBlacklistedTaskSlotCount();

Review comment:
       Please clarify the Javadoc of these methods too. It should be the number 
of task slots of lazy/blacklisted middleManagers and indexers.

##########
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
##########
@@ -867,6 +857,18 @@ private boolean tryAssignTask(final Task task, final 
RemoteTaskRunnerWorkItem ta
     }
   }
 
+  Map<String, ImmutableWorkerInfo> getWorkersEligibleToRunTasks()

Review comment:
       Even though this method was always called under a lock on 
`workerWithUnacknowledgedTask` before, this change seems OK because it will 
still be called under the same lock in `tryAssignTask`, and the new caller 
`getIdleTaskSlotCount` doesn't seem to require locking (as it doesn't update 
the `workerWithUnacknowledgedTask` map.

##########
File path: docs/operations/metrics.md
##########
@@ -186,6 +186,11 @@ Note: If the JVM does not support CPU time measurement for 
the current thread, i
 |`task/running/count`|Number of current running tasks. This metric is only 
available if the TaskCountStatsMonitor module is included.|dataSource.|Varies.|
 |`task/pending/count`|Number of current pending tasks. This metric is only 
available if the TaskCountStatsMonitor module is included.|dataSource.|Varies.|
 |`task/waiting/count`|Number of current waiting tasks. This metric is only 
available if the TaskCountStatsMonitor module is included.|dataSource.|Varies.|
+|`taskSlot/total/count`|Number of total task slots per emission period. This 
metric is only available if the TaskSlotCountStatsMonitor module is included.| 
|Varies.|
+|`taskSlot/idle/count`|Number of idle task slots per emission period. This 
metric is only available if the TaskSlotCountStatsMonitor module is included.| 
|Varies.|
+|`taskSlot/used/count`|Number of busy task slots per emission period. This 
metric is only available if the TaskSlotCountStatsMonitor module is included.| 
|Varies.|
+|`taskSlot/lazy/count`|Number of lazy task slots per emission period. This 
metric is only available if the TaskSlotCountStatsMonitor module is included.| 
|Varies.|
+|`taskSlot/blacklisted/count`|Number of blacklisted task slots per emission 
period. This metric is only available if the TaskSlotCountStatsMonitor module 
is included.| |Varies.|

Review comment:
       I left a same comment in 
https://github.com/apache/druid/issues/10378#issuecomment-698005593. Please 
clarify the description on these metrics.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to