kfaraz commented on code in PR #18735:
URL: https://github.com/apache/druid/pull/18735#discussion_r2512883878


##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java:
##########
@@ -454,6 +455,14 @@ private void startPendingTaskOnRunner(TaskEntry entry, 
ListenableFuture<TaskStat
           if (taskIsReady) {
             log.info("Asking taskRunner to run task[%s]", task.getId());
             runnerTaskFuture = taskRunner.run(task);
+
+            // Emit the waiting time for the task
+            final ServiceMetricEvent.Builder metricBuilder = new 
ServiceMetricEvent.Builder();
+            IndexTaskUtils.setTaskDimensions(metricBuilder, task);
+            emitter.emit(metricBuilder.setMetric(
+                "task/waiting/time",
+                new Duration(entry.getCreatedTime(), 
DateTimes.nowUtc()).getMillis())
+            );

Review Comment:
   Nit: Might be more readable with:
   
   
   ```suggestion
               final long waitDurationMillis = new 
Duration(entry.getCreatedTime(), DateTimes.nowUtc()).getMillis();
               emitter.emit(metricBuilder.setMetric("task/waiting/time", 
waitDurationMillis));
   ```



##########
docs/operations/metrics.md:
##########
@@ -302,7 +297,8 @@ If the JVM does not support CPU time measurement for the 
current thread, `ingest
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
 |`task/run/time`|Milliseconds taken to run a task.| `dataSource`, `taskId`, 
`taskType`, `groupId`, `taskStatus`, `description`, `tags`|Varies|
-|`task/pending/time`|Milliseconds taken for a task to wait for running.| 
`dataSource`, `taskId`, `taskType`, `groupId`, `tags`|Varies|
+|`task/pending/time`|Milliseconds taken for a task to be scheduled and start 
running.| `dataSource`, `taskId`, `taskType`, `groupId`, `tags`|Varies|

Review Comment:
   ```suggestion
   |`task/pending/time`|Milliseconds taken for a task to start running after 
being scheduled by the Overlord.| `dataSource`, `taskId`, `taskType`, 
`groupId`, `tags`|Varies|
   ```



##########
docs/operations/metrics.md:
##########
@@ -302,7 +297,8 @@ If the JVM does not support CPU time measurement for the 
current thread, `ingest
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
 |`task/run/time`|Milliseconds taken to run a task.| `dataSource`, `taskId`, 
`taskType`, `groupId`, `taskStatus`, `description`, `tags`|Varies|
-|`task/pending/time`|Milliseconds taken for a task to wait for running.| 
`dataSource`, `taskId`, `taskType`, `groupId`, `tags`|Varies|
+|`task/pending/time`|Milliseconds taken for a task to be scheduled and start 
running.| `dataSource`, `taskId`, `taskType`, `groupId`, `tags`|Varies|
+|`task/waiting/time`|Milliseconds taken for a task to wait (acquire locks, 
etc.) before being scheduled to run.| `dataSource`, `taskId`, `taskType`, 
`groupId`, `tags`|Varies|

Review Comment:
   ```suggestion
   |`task/waiting/time`|Milliseconds taken for a task to be scheduled to run 
after being submitted to the Overlord.| `dataSource`, `taskId`, `taskType`, 
`groupId`, `tags`|Varies|
   ```



##########
indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskQueueTest.java:
##########
@@ -690,6 +690,52 @@ public TaskStatus runTask(TaskToolbox toolbox)
     Assert.assertEquals(failedStatus, 
getTaskStorage().getStatus(task.getId()).get());
   }
 
+  @Test
+  public void testTaskWaitingTimeMetricNotEmittedWhenTaskNotReady() throws 
Exception
+  {
+    // task1 acquires a lock that will block task2
+    final TestTask task1 = new TestTask("t1", Intervals.of("2021-01/P1M"));
+    prepareTaskForLocking(task1);
+    Assert.assertTrue(task1.isReady(actionClientFactory.create(task1)));
+
+    // task2 will not be ready because of task1's lock
+    final TestTask task2 = new TestTask("t2", Intervals.of("2021-01-31/P1M"));
+    taskQueue.add(task2);
+    taskQueue.manageQueuedTasks();
+
+    Thread.sleep(100);

Review Comment:
   (not required in this PR) I think we should start using the 
`LatchableEmitter` for these testing needs so that we can remove the sleeps 
from the unit tests.



##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java:
##########
@@ -1163,13 +1172,15 @@ static class TaskEntry
   {
     private TaskInfo taskInfo;
 
+    private final DateTime createdTime;
     private DateTime lastUpdatedTime;
     private ListenableFuture<TaskStatus> future = null;
     private boolean isComplete = false;
 
-    TaskEntry(TaskInfo taskInfo)
+    TaskEntry(TaskInfo taskInfo, DateTime createdTime)
     {
       this.taskInfo = taskInfo;
+      this.createdTime = createdTime;
       this.lastUpdatedTime = DateTimes.nowUtc();

Review Comment:
   Q: Should the `lastUpdatedTime` also be assigned the `createdTime` in the 
constructor?
   
   I guess the difference between `createdTime` and `DateTimes.nowUtc()` is 
that the former is the time when the entry was _supposed_ to be updated/created 
and the latter is the time when it was _actually_ created.
   
   The `lastUpdatedTime` is used to determine which of two competing updates 
should win in a race condition. Assigning it to `createdTime` in the 
constructor might ensure that the operation which actually came later is 
honored.
   
   What do you think, @jtuglu1 ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to