kfaraz commented on code in PR #18735:
URL: https://github.com/apache/druid/pull/18735#discussion_r2512883878
##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java:
##########
@@ -454,6 +455,14 @@ private void startPendingTaskOnRunner(TaskEntry entry,
ListenableFuture<TaskStat
if (taskIsReady) {
log.info("Asking taskRunner to run task[%s]", task.getId());
runnerTaskFuture = taskRunner.run(task);
+
+ // Emit the waiting time for the task
+ final ServiceMetricEvent.Builder metricBuilder = new
ServiceMetricEvent.Builder();
+ IndexTaskUtils.setTaskDimensions(metricBuilder, task);
+ emitter.emit(metricBuilder.setMetric(
+ "task/waiting/time",
+ new Duration(entry.getCreatedTime(),
DateTimes.nowUtc()).getMillis())
+ );
Review Comment:
Nit: Might be more readable with:
```suggestion
final long waitDurationMillis = new
Duration(entry.getCreatedTime(), DateTimes.nowUtc()).getMillis();
emitter.emit(metricBuilder.setMetric("task/waiting/time",
waitDurationMillis));
```
##########
docs/operations/metrics.md:
##########
@@ -302,7 +297,8 @@ If the JVM does not support CPU time measurement for the
current thread, `ingest
|Metric|Description|Dimensions|Normal value|
|------|-----------|----------|------------|
|`task/run/time`|Milliseconds taken to run a task.| `dataSource`, `taskId`,
`taskType`, `groupId`, `taskStatus`, `description`, `tags`|Varies|
-|`task/pending/time`|Milliseconds taken for a task to wait for running.|
`dataSource`, `taskId`, `taskType`, `groupId`, `tags`|Varies|
+|`task/pending/time`|Milliseconds taken for a task to be scheduled and start
running.| `dataSource`, `taskId`, `taskType`, `groupId`, `tags`|Varies|
Review Comment:
```suggestion
|`task/pending/time`|Milliseconds taken for a task to start running after
being scheduled by the Overlord.| `dataSource`, `taskId`, `taskType`,
`groupId`, `tags`|Varies|
```
##########
docs/operations/metrics.md:
##########
@@ -302,7 +297,8 @@ If the JVM does not support CPU time measurement for the
current thread, `ingest
|Metric|Description|Dimensions|Normal value|
|------|-----------|----------|------------|
|`task/run/time`|Milliseconds taken to run a task.| `dataSource`, `taskId`,
`taskType`, `groupId`, `taskStatus`, `description`, `tags`|Varies|
-|`task/pending/time`|Milliseconds taken for a task to wait for running.|
`dataSource`, `taskId`, `taskType`, `groupId`, `tags`|Varies|
+|`task/pending/time`|Milliseconds taken for a task to be scheduled and start
running.| `dataSource`, `taskId`, `taskType`, `groupId`, `tags`|Varies|
+|`task/waiting/time`|Milliseconds taken for a task to wait (acquire locks,
etc.) before being scheduled to run.| `dataSource`, `taskId`, `taskType`,
`groupId`, `tags`|Varies|
Review Comment:
```suggestion
|`task/waiting/time`|Milliseconds taken for a task to be scheduled to run
after being submitted to the Overlord.| `dataSource`, `taskId`, `taskType`,
`groupId`, `tags`|Varies|
```
##########
indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskQueueTest.java:
##########
@@ -690,6 +690,52 @@ public TaskStatus runTask(TaskToolbox toolbox)
Assert.assertEquals(failedStatus,
getTaskStorage().getStatus(task.getId()).get());
}
+ @Test
+ public void testTaskWaitingTimeMetricNotEmittedWhenTaskNotReady() throws
Exception
+ {
+ // task1 acquires a lock that will block task2
+ final TestTask task1 = new TestTask("t1", Intervals.of("2021-01/P1M"));
+ prepareTaskForLocking(task1);
+ Assert.assertTrue(task1.isReady(actionClientFactory.create(task1)));
+
+ // task2 will not be ready because of task1's lock
+ final TestTask task2 = new TestTask("t2", Intervals.of("2021-01-31/P1M"));
+ taskQueue.add(task2);
+ taskQueue.manageQueuedTasks();
+
+ Thread.sleep(100);
Review Comment:
(not required in this PR) I think we should start using the
`LatchableEmitter` for these testing needs so that we can remove the sleeps
from the unit tests.
##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java:
##########
@@ -1163,13 +1172,15 @@ static class TaskEntry
{
private TaskInfo taskInfo;
+ private final DateTime createdTime;
private DateTime lastUpdatedTime;
private ListenableFuture<TaskStatus> future = null;
private boolean isComplete = false;
- TaskEntry(TaskInfo taskInfo)
+ TaskEntry(TaskInfo taskInfo, DateTime createdTime)
{
this.taskInfo = taskInfo;
+ this.createdTime = createdTime;
this.lastUpdatedTime = DateTimes.nowUtc();
Review Comment:
Q: Should the `lastUpdatedTime` also be assigned the `createdTime` in the
constructor?
I guess the difference between `createdTime` and `DateTimes.nowUtc()` is
that the former is the time when the entry was _supposed_ to be updated/created
and the latter is the time when it was _actually_ created.
The `lastUpdatedTime` is used to determine which of two competing updates
should win in a race condition. Assigning it to `createdTime` in the
constructor might ensure that the operation which actually came later is
honored.
What do you think, @jtuglu1 ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]