This is an automated email from the ASF dual-hosted git repository.

tuglu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
     new a9106f165e5 Handle QueryException edge case in status code metric emission (#18633)
a9106f165e5 is described below

commit a9106f165e51d8805247011b93f3fba4ae9501bf
Author: jtuglu1 <[email protected]>
AuthorDate: Tue Oct 14 23:40:49 2025 -0700

    Handle QueryException edge case in status code metric emission (#18633)
    
    Builds on
    https://github.com/apache/druid/commit/4b624b2cb47071d20dc4d0fd03bc027551f56b92
    to cover edge cases in Broker processing where no `DruidException` wrapper
    is used and a `QueryException` is thrown directly. This allows proper
    status code classification for these cases instead of falling back to the
    default 500. There are currently no other custom Druid exception types
    that map cleanly to status codes, so handling these two classes should
    cover all cases.
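
The classification described above can be sketched in isolation as follows.
This is a minimal illustration, not the Druid implementation: the
`StatusCodeSketch`, `SketchDruidException`, `SketchQueryException`, and
`FailType` names are hypothetical stand-ins for the real classes, and the
status values are assumed from the expectations asserted in the test added
by this commit.

```java
public class StatusCodeSketch
{
  // Hypothetical stand-in for a failure type that carries an expected HTTP status.
  // The numeric values mirror the assertions in the test added by this commit.
  enum FailType
  {
    USER_ERROR(400),
    UNAUTHORIZED(401),
    CAPACITY_EXCEEDED(429),
    RUNTIME_FAILURE(500),
    UNSUPPORTED(501),
    TIMEOUT(504);

    private final int expectedStatus;

    FailType(int expectedStatus)
    {
      this.expectedStatus = expectedStatus;
    }

    int getExpectedStatus()
    {
      return expectedStatus;
    }
  }

  // Hypothetical stand-in for an exception that is thrown directly, without a wrapper.
  static class SketchQueryException extends RuntimeException
  {
    private final FailType failType;

    SketchQueryException(FailType failType)
    {
      super(failType.name());
      this.failType = failType;
    }

    FailType getFailType()
    {
      return failType;
    }
  }

  // Hypothetical stand-in for the wrapper exception that already knows its status.
  static class SketchDruidException extends RuntimeException
  {
    private final int expectedStatus;

    SketchDruidException(int expectedStatus)
    {
      super("status " + expectedStatus);
      this.expectedStatus = expectedStatus;
    }

    int getExpectedStatus()
    {
      return expectedStatus;
    }
  }

  // Null means success; the two known exception types report their own status;
  // anything else is treated as an unclassified error.
  static int computeStatusCode(Throwable error)
  {
    if (error == null) {
      return 200;
    }
    if (error instanceof SketchDruidException) {
      return ((SketchDruidException) error).getExpectedStatus();
    } else if (error instanceof SketchQueryException) {
      return ((SketchQueryException) error).getFailType().getExpectedStatus();
    }
    return 500;
  }

  public static void main(String[] args)
  {
    System.out.println(computeStatusCode(null));
    System.out.println(computeStatusCode(new SketchQueryException(FailType.TIMEOUT)));
    System.out.println(computeStatusCode(new IllegalStateException("unclassified")));
  }
}
```

Checking the wrapper type first and the directly thrown type second keeps
the fallback to 500 limited to errors that neither type can classify.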
---
 docs/operations/metrics.md                         | 12 ++--
 .../java/org/apache/druid/query/DruidMetrics.java  |  6 +-
 .../org/apache/druid/query/DruidMetricsTest.java   | 65 ++++++++++++++++++++++
 3 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/docs/operations/metrics.md b/docs/operations/metrics.md
index f5fdcecccd3..ce488676a0f 100644
--- a/docs/operations/metrics.md
+++ b/docs/operations/metrics.md
@@ -45,13 +45,13 @@ Most metric values reset each emission period, as specified 
in `druid.monitoring
 
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`query/time`|Milliseconds taken to complete a query.|Native Query: 
`dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, 
`remoteAddress`, `id`, `code`.|< 1s|
+|`query/time`|Milliseconds taken to complete a query.|Native Query: 
`dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, 
`remoteAddress`, `id`, `statusCode`.|< 1s|
 
 ### Broker
 
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, 
`type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, 
`id`.</p><p>Aggregation Queries: `numMetrics`, 
`numComplexMetrics`.</p><p>GroupBy: `numDimensions`.</p><p> TopN: `threshold`, 
`dimension`.</p>|< 1s|
+|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, 
`type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, 
`statusCode`.</p><p>Aggregation Queries: `numMetrics`, 
`numComplexMetrics`.</p><p>GroupBy: `numDimensions`.</p><p> TopN: `threshold`, 
`dimension`.</p>|< 1s|
 |`query/bytes`|The total number of bytes returned to the requesting client in 
the query response from the broker. Other services report the total bytes for 
their portion of the query. |<p>Common: `dataSource`, `type`, `interval`, 
`hasFilters`, `duration`, `context`, `remoteAddress`, `id`.</p><p> Aggregation 
Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: 
`numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>| |
 |`query/node/time`|Milliseconds taken to query individual historical/realtime 
processes.|`id`, `status`, `server`|< 1s|
 |`query/resultCache/hit`|Whether the query hit the result cache (1) or not 
(0). Emission of the metric indicates the result-level cache was 
polled.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, 
`context`, `remoteAddress`, `id`.</p>|Varies|
@@ -64,7 +64,7 @@ Most metric values reset each emission period, as specified 
in `druid.monitoring
 |`query/timeout/count`|Number of timed out queries.|This metric is only 
available if the `QueryCountStatsMonitor` module is included.| |
 |`query/segments/count`|This metric is not enabled by default. See the 
`QueryMetrics` Interface for reference regarding enabling this metric. Number 
of segments that will be touched by the query. In the broker, it makes a plan 
to distribute the query to realtime tasks and historicals based on a snapshot 
of segment distribution state. If there are some segments moved after this 
snapshot is created, certain historicals and realtime tasks can report those 
segments as missing to the broker.  [...]
 |`query/priority`|Assigned lane and priority, only if Laning strategy is 
enabled. Refer to [Laning 
strategies](../configuration/index.md#laning-strategies)|`lane`, `dataSource`, 
`type`|0|
-|`sqlQuery/time`|Milliseconds taken to complete a SQL query.|`id`, 
`nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`, `code`|< 
1s|
+|`sqlQuery/time`|Milliseconds taken to complete a SQL query.|`id`, 
`nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`, 
`statusCode`|< 1s|
 |`sqlQuery/planningTimeMs`|Milliseconds taken to plan a SQL to native 
query.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, 
`engine`| |
 |`sqlQuery/bytes`|Number of bytes returned in the SQL query response.|`id`, 
`nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`| |
 |`serverview/init/time`|Time taken to initialize the broker server view. 
Useful to detect if brokers are taking too long to start.||Depends on the 
number of segments.|
@@ -97,7 +97,7 @@ Most metric values reset each emission period, as specified 
in `druid.monitoring
 
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, 
`type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, 
`id`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> 
GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
+|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, 
`type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, 
`statusCode`.</p><p> Aggregation Queries: `numMetrics`, 
`numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, 
`dimension`.</p>|< 1s|
 |`query/segment/time`|Milliseconds taken to query individual segment. Includes 
time to page in the segment from disk.|`id`, `status`, `segment`, 
`vectorized`.|several hundred milliseconds|
 |`query/wait/time`|Milliseconds spent waiting for a segment to be 
scanned.|`id`, `segment`|< several hundred milliseconds|
 |`segment/scan/pending`|Number of segments in queue waiting to be 
scanned.||Close to 0|
@@ -121,7 +121,7 @@ Most metric values reset each emission period, as specified 
in `druid.monitoring
 
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, 
`type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, 
`id`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> 
GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
+|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, 
`type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, 
`statusCode`.</p><p> Aggregation Queries: `numMetrics`, 
`numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, 
`dimension`.</p>|< 1s|
 |`query/wait/time`|Milliseconds spent waiting for a segment to be 
scanned.|`id`, `segment`|several hundred milliseconds|
 |`segment/scan/pending`|Number of segments in queue waiting to be 
scanned.||Close to 0|
 |`segment/scan/active`|Number of segments currently scanned. This metric also 
indicates how many threads from `druid.processing.numThreads` are currently 
being used.||Close to `druid.processing.numThreads`|
@@ -186,7 +186,7 @@ If SQL is enabled, the Broker will emit the following 
metrics for SQL.
 
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`sqlQuery/time`|Milliseconds taken to complete a SQL.|`id`, `nativeQueryIds`, 
`dataSource`, `remoteAddress`, `success`, `engine`|< 1s|
+|`sqlQuery/time`|Milliseconds taken to complete a SQL.|`id`, `nativeQueryIds`, 
`dataSource`, `remoteAddress`, `success`, `engine`, `statusCode`|< 1s|
 |`sqlQuery/planningTimeMs`|Milliseconds taken to plan a SQL to native 
query.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, 
`engine`| |
 |`sqlQuery/bytes`|number of bytes returned in SQL response.|`id`, 
`nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`| |
 
diff --git a/processing/src/main/java/org/apache/druid/query/DruidMetrics.java 
b/processing/src/main/java/org/apache/druid/query/DruidMetrics.java
index 61c4733a64e..08a8a6e9396 100644
--- a/processing/src/main/java/org/apache/druid/query/DruidMetrics.java
+++ b/processing/src/main/java/org/apache/druid/query/DruidMetrics.java
@@ -37,7 +37,7 @@ public class DruidMetrics
   public static final String INTERVAL = "interval";
   public static final String ID = "id";
   public static final String SUBQUERY_ID = "subQueryId";
-  public static final String CODE = "code";
+  public static final String CODE = "statusCode";
   public static final String STATUS = "status";
   public static final String ENGINE = "engine";
   public static final String DURATION = "duration";
@@ -95,7 +95,7 @@ public class DruidMetrics
    * Computes the HTTP status code based on the query error (if any) for 
tagged metric emission.
    * <ul>
    *   <li>If error is null: returns 200 (success)</li>
-   *   <li>If error is a DruidException: returns the category's expected HTTP 
status</li>
+   *   <li>If error is a {@link DruidException} or {@link QueryException}: 
returns the corresponding status code</li>
    *   <li>Otherwise (unclassified error): returns 500 (internal server 
error)</li>
    * </ul>
    *
@@ -109,6 +109,8 @@ public class DruidMetrics
     }
     if (error instanceof DruidException) {
       return ((DruidException) error).getCategory().getExpectedStatus();
+    } else if (error instanceof QueryException) {
+      return ((QueryException) error).getFailType().getExpectedStatus();
     }
     // Unclassified errors default to 500 (defensive)
     return DruidException.Category.DEFENSIVE.getExpectedStatus();
diff --git 
a/processing/src/test/java/org/apache/druid/query/DruidMetricsTest.java 
b/processing/src/test/java/org/apache/druid/query/DruidMetricsTest.java
index 0ee2ec9beb6..5ab96f86351 100644
--- a/processing/src/test/java/org/apache/druid/query/DruidMetricsTest.java
+++ b/processing/src/test/java/org/apache/druid/query/DruidMetricsTest.java
@@ -44,5 +44,70 @@ public class DruidMetricsTest
       );
     }
   }
+
+  @Test
+  public void testComputeStatusCode_queryExceptionCategories()
+  {
+    Assert.assertEquals(
+        500,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.QUERY_CANCELED_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        504,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.QUERY_TIMEOUT_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        429,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.QUERY_CAPACITY_EXCEEDED_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        401,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.UNAUTHORIZED_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        400,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.BAD_QUERY_CONTEXT_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        501,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.QUERY_UNSUPPORTED_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+  }
 }
 

