gianm commented on code in PR #18121:
URL: https://github.com/apache/druid/pull/18121#discussion_r2152959549


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/dart/controller/DartControllerContext.java:
##########
@@ -108,8 +113,12 @@ public DartControllerContext(
     this.serverView = serverView;
     this.memoryIntrospector = memoryIntrospector;
     this.context = context;
-    this.metricBuilder = new ServiceMetricEvent.Builder();
     this.emitter = emitter;
+
+    // Set up metric dimensions
+    this.metricBuilder = new ServiceMetricEvent.Builder();
+    MSQMetricUtils.setDartQueryIdDimensions(this.metricBuilder, context);
+    this.metricBuilder.setDimension("engine", DartSqlEngine.NAME);

Review Comment:
   This line is redundant, I think, since it's already being done in 
`MSQMetricUtils.setDartQueryIdDimensions`.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -553,11 +555,17 @@ private MSQTaskReportPayload runInternal(final 
QueryListener queryListener, fina
         countersSnapshot,
         null
     );
+    emitQueryMetrics(stopwatch);
     // Emit summary metrics
     emitSummaryMetrics(msqTaskReportPayload, querySpec);
     return msqTaskReportPayload;
   }
 
+  private void emitQueryMetrics(Stopwatch stopwatch)
+  {
+    context.emitMetric("query/time", stopwatch.millisElapsed());

Review Comment:
   A note on this: there is a class `QueryMetrics` that native queries use to 
report metrics such as `query/time`. It has standard handling for each of these 
fields. There are also some query-type-specific fields added by subclasses such 
as `DefaultGroupByQueryMetrics`.
   
   At this time I think we should *not* use `QueryMetrics` here to emit 
metrics, because it's very tied to the native execution stack: it uses `Query` 
objects directly and it assumes data servers are only handling a single query 
type at a time. And I don't think now is the right time to be making big 
changes to `QueryMetrics`. But there may be some opportunities to share code, 
such as by breaking out the implementation of computation of specific 
dimensions (such as `intervals` and `duration`, for example) into helper 
functions that both can use. We may also revisit harmonizing this more in the 
future.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java:
##########
@@ -349,6 +361,8 @@ && handleResultsReady(kernelHolder, controllerClient)) {
       }
     }
 
+    context.emitMetric("query/time", stopwatch.millisElapsed());

Review Comment:
   This one should also have the same metric dimensions as the controller 
`query/time`.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -553,11 +555,17 @@ private MSQTaskReportPayload runInternal(final 
QueryListener queryListener, fina
         countersSnapshot,
         null
     );
+    emitQueryMetrics(stopwatch);
     // Emit summary metrics
     emitSummaryMetrics(msqTaskReportPayload, querySpec);
     return msqTaskReportPayload;
   }
 
+  private void emitQueryMetrics(Stopwatch stopwatch)
+  {
+    context.emitMetric("query/time", stopwatch.millisElapsed());

Review Comment:
   This should include the major query metric dimensions. Thoughts about what 
we should do for each of the current ones, in this patch:
   
   - **`dataSource`** - Take all the datasource names from all `TableInputSpec` 
involved in the query, and report them in the same style as the native engine 
would report them. Let's ignore other `InputSpec` types for now
   - **`type`** - No analog of this in MSQ, so let's just have this say `msq`
   - **`interval`** - Take all the intervals from all `TableInputSpec` involved 
in the query. Let's ignore other `InputSpec` types for now
   - **`hasFilters`** - Do not include
   - **`duration`** - Compute this the same way that native metrics would work: 
add up the duration of all the `intervals` in the manner of 
`BaseQuery#getDuration`
   - **`queryId`** - Looks like this is already included
   - **`subQueryId`** - Do not include
   - **`sqlQueryId`** - Looks like this is already included
   - **`context`** - Do not include
   - **`success`** - `true` for successful queries, `false` for failures



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java:
##########
@@ -220,9 +221,19 @@ public void run()
     }
     finally {
       runFuture.set(null);
+      reportCpuMetrics();
     }
   }
 
+  private void reportCpuMetrics()
+  {
+    long cpuTimeNs = 0L;
+    for (final CounterTracker tracker : stageCounters.values()) {
+      cpuTimeNs += tracker.totalCpu();
+    }
+    context.emitMetric("query/cpu/time", 
TimeUnit.NANOSECONDS.toMicros(cpuTimeNs));

Review Comment:
   This one should also have all the same metric dimensions that `query/time` 
has.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/dart/worker/DartWorkerContext.java:
##########
@@ -120,6 +127,12 @@ public class DartWorkerContext implements WorkerContext
     this.outbox = outbox;
     this.tempDir = tempDir;
     this.queryContext = Preconditions.checkNotNull(queryContext, 
"queryContext");
+    this.emitter = emitter;
+
+    // Set up metric dimensions
+    this.metricBuilder = new ServiceMetricEvent.Builder();
+    MSQMetricUtils.setDartQueryIdDimensions(this.metricBuilder, queryContext);
+    this.metricBuilder.setDimension("engine", DartSqlEngine.NAME);

Review Comment:
   Redundant, I think.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -376,6 +377,7 @@ private MSQTaskReportPayload runInternal(final 
QueryListener queryListener, fina
 
     final TaskState taskStateForReport;
     final MSQErrorReport errorForReport;
+    final Stopwatch stopwatch = Stopwatch.createStarted();

Review Comment:
   This should start counting from the moment the query is accepted, which 
would align with how the native query metric works. You should use the same 
start time that is used for calculating timeouts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to