Re: [PR] Improve Task Monitoring and Analysis (druid)

via GitHub Mon, 08 Apr 2024 15:32:18 -0700


suneet-s commented on code in PR #16041:
URL: https://github.com/apache/druid/pull/16041#discussion_r1556459555



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/IngestionStatsAndErrors.java:
##########
@@ -48,7 +49,8 @@ public IngestionStatsAndErrors(
       @JsonProperty("segmentAvailabilityWaitTimeMs") long 
segmentAvailabilityWaitTimeMs,
       @JsonProperty("recordsProcessed") Map<String, Long> recordsProcessed,
       @Nullable @JsonProperty("segmentsRead") Long segmentsRead,
-      @Nullable @JsonProperty("segmentsPublished") Long segmentsPublished
+      @Nullable @JsonProperty("segmentsPublished") Long segmentsPublished,
+      @Nullable @JsonProperty("taskContext") Map<String, Object> taskContext

Review Comment:
   Curious about this comment - Recently some new fields were added like 
segmentsPublished / segmentsRead, and previously 
segmentAvailabilityConfirmed/WaitTimeMs.
   
   What is the impact of adding new nullable fields to this class vs creating a 
new report with nullable fields for backwards compatibility?



##########
docs/operations/metrics.md:
##########
@@ -196,8 +196,9 @@ task's `IOConfig` as follows:
 |`false`|`false`|`REPLACE_LEGACY`. The default for JSON-based batch ingestion. 
|
 |`false`|`true`|`REPLACE`|
 
-The `tags` dimension is reported only for metrics emitted from ingestion tasks 
whose ingest spec specifies the `tags`
-field in the `context` field of the ingestion spec. `tags` is expected to be a 
map of string to object.
+The `tags` dimension enhances metrics, task reports, and the Peon 
service/heartbeat metric associated with ingestion tasks. These tags are 
derived from both the ingestion specification—when the `tags` field is 
specified within the task's context—and system-generated sources. The tags 
field within a task's `context` should be a map with string keys and object 
values, facilitating the inclusion of custom metadata. However, tags can also 
be automatically generated and added by the system, providing a comprehensive 
set of metadata for monitoring and analysis purposes.
+
+To further customize and enrich task metadata, developers can implement the 
`TaskContextEnricher` interface. By implementing custom logic within this 
interface, additional context fields can be introduced. This capability allows 
for enhanced observability and management of tasks, as these additional context 
fields can offer deeper insights into task execution, performance, and outcomes.

Review Comment:
   TaskContextEnricher is just an interface, and not an extension point, so the 
changes to this doc are not needed. We do not want to encourage users to use 
the TaskContextEnricher until it is marked as an ExtensionPoint.
   
   



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -697,6 +702,11 @@ private QueryDefinition initializeQueryDefAndState(final 
Closer closer)
         MSQControllerTask.isReplaceInputDataSourceTask(task)
     );
 
+    // propagate the controller's context and tags to the worker task
+    taskContextOverridesBuilder.put(MultiStageQueryContext.CTX_OF_CONTROLLER, 
task.getContext());
+    // specifically assign the 'tags' field for enhanced worker task metrics 
reporting
+    taskContextOverridesBuilder.put(DruidMetrics.TAGS, 
task.getContextValue(DruidMetrics.TAGS, new HashMap()));

Review Comment:
   This should only set the tsakContext for tags if tags already exists in the 
context maps, as the places where metrics are emitted use 
`metricBuilder.setDimensionIfNotNull(...` see IndexTaskUtils#setTaskDimensions



##########
docs/operations/metrics.md:
##########
@@ -196,8 +196,9 @@ task's `IOConfig` as follows:
 |`false`|`false`|`REPLACE_LEGACY`. The default for JSON-based batch ingestion. 
|
 |`false`|`true`|`REPLACE`|
 
-The `tags` dimension is reported only for metrics emitted from ingestion tasks 
whose ingest spec specifies the `tags`
-field in the `context` field of the ingestion spec. `tags` is expected to be a 
map of string to object.
+The `tags` dimension enhances metrics, task reports, and the Peon 
service/heartbeat metric associated with ingestion tasks. These tags are 
derived from both the ingestion specification—when the `tags` field is 
specified within the task's context—and system-generated sources. The tags 
field within a task's `context` should be a map with string keys and object 
values, facilitating the inclusion of custom metadata. However, tags can also 
be automatically generated and added by the system, providing a comprehensive 
set of metadata for monitoring and analysis purposes.
+
+To further customize and enrich task metadata, developers can implement the 
`TaskContextEnricher` interface. By implementing custom logic within this 
interface, additional context fields can be introduced. This capability allows 
for enhanced observability and management of tasks, as these additional context 
fields can offer deeper insights into task execution, performance, and outcomes.

Review Comment:
   I don't think the changes are needed to this part of the doc. Lower down in 
the doc, the dimensions that are expected with each metric are called out 
explicitly - since this patch just updates the service/heartbeat metric, can 
you update that [part of the 
doc](https://github.com/apache/druid/blob/master/docs/operations/metrics.md#service-health)
 with this new dimension.
   ```suggestion
   The `tags` dimension is reported only for metrics emitted from ingestion 
tasks whose ingest spec specifies the `tags`
   field in the `context` field of the ingestion spec. `tags` is expected to be 
a map of string to object.
   ```



##########
integration-tests/src/test/java/org/apache/druid/tests/indexer/AbstractITBatchIndexTest.java:
##########
@@ -370,7 +370,12 @@ protected void submitTaskAndWait(
     if (segmentAvailabilityConfirmationPair.lhs != null && 
segmentAvailabilityConfirmationPair.lhs) {
       TaskReport reportRaw = 
indexer.getTaskReport(taskID).get("ingestionStatsAndErrors");
       IngestionStatsAndErrorsTaskReport report = 
(IngestionStatsAndErrorsTaskReport) reportRaw;
-      IngestionStatsAndErrors reportData = (IngestionStatsAndErrors) 
report.getPayload();
+      IngestionStatsAndErrors reportData = report.getPayload();
+
+      Assert.assertTrue(
+          reportData.getTaskContext() != null && 
!reportData.getTaskContext().isEmpty(),
+          "Report data does not contain task context. Ensure that 
TaskContextEnricher is correctly bound."
+      );

Review Comment:
   Instead of just checking that the context is non empty - can we validate 
that certain fields we expect are there and contain the expected values?
   
   Similar changes to existing integration tests for streaming, compaction and 
MSQ tasks please. I also noticed there are 2 `AbstractITBatchIndexTest` 
classes, this one, and another one in the `druid-it-cases` sub module. Please 
make a similar change there so that we get good integration test coverage that 
the task context is being included in the report for all the different test 
cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Improve Task Monitoring and Analysis (druid)

Reply via email to