This is an automated email from the ASF dual-hosted git repository.
tuglu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new a9106f165e5 Handle QueryException edge case in status code metric emission (#18633)
a9106f165e5 is described below
commit a9106f165e51d8805247011b93f3fba4ae9501bf
Author: jtuglu1 <[email protected]>
AuthorDate: Tue Oct 14 23:40:49 2025 -0700
Handle QueryException edge case in status code metric emission (#18633)
Builds on
https://github.com/apache/druid/commit/4b624b2cb47071d20dc4d0fd03bc027551f56b92
to cover edge cases in Broker query processing where a `QueryException` is
thrown directly instead of being wrapped in a `DruidException`. Those errors are
now classified with their proper status code rather than the default 500. No
other custom Druid exception types currently map cleanly to status codes, so
handling these two classes should cover all cases. The metric dimension that
carries this value is also renamed from `code` to `statusCode`.
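The resulting dispatch can be sketched as below. This is a self-contained illustration, not Druid code: `DruidException`, `QueryException`, and their `Category`/`FailType` enums are hypothetical, simplified stand-ins. The `FailType` status values are copied from the expectations in this commit's `DruidMetricsTest`, and the `INVALID_INPUT(400)` category is an illustrative assumption. Only the `instanceof` chain mirrors the patched `DruidMetrics.computeStatusCode`.

```java
// Hypothetical, simplified stand-ins; these are NOT Druid's real exception classes.
public class StatusCodeSketch
{
  static class DruidException extends RuntimeException
  {
    // Illustrative subset of categories only.
    enum Category
    {
      INVALID_INPUT(400),
      DEFENSIVE(500);

      private final int expectedStatus;
      Category(int expectedStatus) { this.expectedStatus = expectedStatus; }
      int getExpectedStatus() { return expectedStatus; }
    }

    private final Category category;
    DruidException(Category category) { this.category = category; }
    Category getCategory() { return category; }
  }

  static class QueryException extends RuntimeException
  {
    // Status values chosen to match the assertions in this commit's DruidMetricsTest.
    enum FailType
    {
      USER_ERROR(400),
      UNAUTHORIZED(401),
      CAPACITY_EXCEEDED(429),
      CANCELED(500),
      UNSUPPORTED(501),
      TIMEOUT(504);

      private final int expectedStatus;
      FailType(int expectedStatus) { this.expectedStatus = expectedStatus; }
      int getExpectedStatus() { return expectedStatus; }
    }

    private final FailType failType;
    QueryException(FailType failType) { this.failType = failType; }
    FailType getFailType() { return failType; }
  }

  // Same branch structure as the patched DruidMetrics.computeStatusCode.
  static int computeStatusCode(Throwable error)
  {
    if (error == null) {
      return 200;
    }
    if (error instanceof DruidException) {
      return ((DruidException) error).getCategory().getExpectedStatus();
    } else if (error instanceof QueryException) {
      return ((QueryException) error).getFailType().getExpectedStatus();
    }
    // Any other throwable falls back to 500, as before this change.
    return DruidException.Category.DEFENSIVE.getExpectedStatus();
  }

  public static void main(String[] args)
  {
    System.out.println(computeStatusCode(null)); // 200
    System.out.println(computeStatusCode(new QueryException(QueryException.FailType.TIMEOUT))); // 504
    System.out.println(computeStatusCode(new IllegalStateException("unclassified"))); // 500
  }
}
```

Errors of any other type still fall through to the 500 default, matching the behavior before this change.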
---
docs/operations/metrics.md | 12 ++--
.../java/org/apache/druid/query/DruidMetrics.java | 6 +-
.../org/apache/druid/query/DruidMetricsTest.java | 65 ++++++++++++++++++++++
3 files changed, 75 insertions(+), 8 deletions(-)
diff --git a/docs/operations/metrics.md b/docs/operations/metrics.md
index f5fdcecccd3..ce488676a0f 100644
--- a/docs/operations/metrics.md
+++ b/docs/operations/metrics.md
@@ -45,13 +45,13 @@ Most metric values reset each emission period, as specified in `druid.monitoring
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`query/time`|Milliseconds taken to complete a query.|Native Query: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, `code`.|< 1s|
+|`query/time`|Milliseconds taken to complete a query.|Native Query: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, `statusCode`.|< 1s|
 ### Broker
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`.</p><p>Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p>GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
+|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, `statusCode`.</p><p>Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p>GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
 |`query/bytes`|The total number of bytes returned to the requesting client in the query response from the broker. Other services report the total bytes for their portion of the query. |<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>| |
 |`query/node/time`|Milliseconds taken to query individual historical/realtime processes.|`id`, `status`, `server`|< 1s|
 |`query/resultCache/hit`|Whether the query hit the result cache (1) or not (0). Emission of the metric indicates the result-level cache was polled.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`.</p>|Varies|
@@ -64,7 +64,7 @@ Most metric values reset each emission period, as specified in `druid.monitoring
 |`query/timeout/count`|Number of timed out queries.|This metric is only available if the `QueryCountStatsMonitor` module is included.| |
 |`query/segments/count`|This metric is not enabled by default. See the `QueryMetrics` Interface for reference regarding enabling this metric. Number of segments that will be touched by the query. In the broker, it makes a plan to distribute the query to realtime tasks and historicals based on a snapshot of segment distribution state. If there are some segments moved after this snapshot is created, certain historicals and realtime tasks can report those segments as missing to the broker. [...]
 |`query/priority`|Assigned lane and priority, only if Laning strategy is enabled. Refer to [Laning strategies](../configuration/index.md#laning-strategies)|`lane`, `dataSource`, `type`|0|
-|`sqlQuery/time`|Milliseconds taken to complete a SQL query.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`, `code`|< 1s|
+|`sqlQuery/time`|Milliseconds taken to complete a SQL query.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`, `statusCode`|< 1s|
 |`sqlQuery/planningTimeMs`|Milliseconds taken to plan a SQL to native query.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`| |
 |`sqlQuery/bytes`|Number of bytes returned in the SQL query response.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`| |
 |`serverview/init/time`|Time taken to initialize the broker server view. Useful to detect if brokers are taking too long to start.||Depends on the number of segments.|
@@ -97,7 +97,7 @@ Most metric values reset each emission period, as specified in `druid.monitoring
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
+|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, `statusCode`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
 |`query/segment/time`|Milliseconds taken to query individual segment. Includes time to page in the segment from disk.|`id`, `status`, `segment`, `vectorized`.|several hundred milliseconds|
 |`query/wait/time`|Milliseconds spent waiting for a segment to be scanned.|`id`, `segment`|< several hundred milliseconds|
 |`segment/scan/pending`|Number of segments in queue waiting to be scanned.||Close to 0|
@@ -121,7 +121,7 @@ Most metric values reset each emission period, as specified in `druid.monitoring
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
+|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, `statusCode`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
 |`query/wait/time`|Milliseconds spent waiting for a segment to be scanned.|`id`, `segment`|several hundred milliseconds|
 |`segment/scan/pending`|Number of segments in queue waiting to be scanned.||Close to 0|
 |`segment/scan/active`|Number of segments currently scanned. This metric also indicates how many threads from `druid.processing.numThreads` are currently being used.||Close to `druid.processing.numThreads`|
@@ -186,7 +186,7 @@ If SQL is enabled, the Broker will emit the following metrics for SQL.
 |Metric|Description|Dimensions|Normal value|
 |------|-----------|----------|------------|
-|`sqlQuery/time`|Milliseconds taken to complete a SQL.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`|< 1s|
+|`sqlQuery/time`|Milliseconds taken to complete a SQL.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`, `statusCode`|< 1s|
 |`sqlQuery/planningTimeMs`|Milliseconds taken to plan a SQL to native query.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`| |
 |`sqlQuery/bytes`|number of bytes returned in SQL response.|`id`, `nativeQueryIds`, `dataSource`, `remoteAddress`, `success`, `engine`| |
diff --git a/processing/src/main/java/org/apache/druid/query/DruidMetrics.java b/processing/src/main/java/org/apache/druid/query/DruidMetrics.java
index 61c4733a64e..08a8a6e9396 100644
--- a/processing/src/main/java/org/apache/druid/query/DruidMetrics.java
+++ b/processing/src/main/java/org/apache/druid/query/DruidMetrics.java
@@ -37,7 +37,7 @@ public class DruidMetrics
   public static final String INTERVAL = "interval";
   public static final String ID = "id";
   public static final String SUBQUERY_ID = "subQueryId";
-  public static final String CODE = "code";
+  public static final String CODE = "statusCode";
   public static final String STATUS = "status";
   public static final String ENGINE = "engine";
   public static final String DURATION = "duration";
@@ -95,7 +95,7 @@ public class DruidMetrics
    * Computes the HTTP status code based on the query error (if any) for tagged metric emission.
    * <ul>
    * <li>If error is null: returns 200 (success)</li>
-   * <li>If error is a DruidException: returns the category's expected HTTP status</li>
+   * <li>If error is a {@link DruidException} or {@link QueryException}: returns the corresponding status code</li>
    * <li>Otherwise (unclassified error): returns 500 (internal server error)</li>
    * </ul>
    *
@@ -109,6 +109,8 @@ public class DruidMetrics
     }
     if (error instanceof DruidException) {
       return ((DruidException) error).getCategory().getExpectedStatus();
+    } else if (error instanceof QueryException) {
+      return ((QueryException) error).getFailType().getExpectedStatus();
     }
     // Unclassified errors default to 500 (defensive)
     return DruidException.Category.DEFENSIVE.getExpectedStatus();
diff --git a/processing/src/test/java/org/apache/druid/query/DruidMetricsTest.java b/processing/src/test/java/org/apache/druid/query/DruidMetricsTest.java
index 0ee2ec9beb6..5ab96f86351 100644
--- a/processing/src/test/java/org/apache/druid/query/DruidMetricsTest.java
+++ b/processing/src/test/java/org/apache/druid/query/DruidMetricsTest.java
@@ -44,5 +44,70 @@ public class DruidMetricsTest
);
}
}
+
+  @Test
+  public void testComputeStatusCode_queryExceptionCategories()
+  {
+    Assert.assertEquals(
+        500,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.QUERY_CANCELED_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        504,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.QUERY_TIMEOUT_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        429,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.QUERY_CAPACITY_EXCEEDED_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        401,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.UNAUTHORIZED_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        400,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.BAD_QUERY_CONTEXT_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+    Assert.assertEquals(
+        501,
+        DruidMetrics.computeStatusCode(new QueryException(
+            null,
+            QueryException.QUERY_UNSUPPORTED_ERROR_CODE,
+            null,
+            null,
+            null
+        ))
+    );
+  }
}
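With `DruidMetrics.CODE` now resolving to `statusCode` instead of `code`, and metrics.md listing `statusCode` for `query/time` and `sqlQuery/time`, anything that reads these events by dimension name needs to switch keys. The following is a minimal consumer-side sketch. It assumes the emitted dimensions are available as a `Map<String, Object>` with the status code stored as a number, which is an assumption because the actual event shape depends on the emitter in use. The `StatusCodeBuckets` class and its `statusClass` helper are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer-side helper; not part of Druid.
public class StatusCodeBuckets
{
  // Dimension key after this change; lookups on the old "code" key no longer match.
  static final String STATUS_CODE = "statusCode";

  // Maps an event's status code dimension to a coarse class for dashboards or alerts.
  static String statusClass(Map<String, Object> dimensions)
  {
    Object raw = dimensions.get(STATUS_CODE);
    if (!(raw instanceof Number)) {
      return "missing";
    }
    int code = ((Number) raw).intValue();
    if (code >= 200 && code < 300) {
      return "success";
    } else if (code >= 400 && code < 500) {
      return "client_error";
    } else if (code >= 500 && code < 600) {
      return "server_error";
    }
    return "other";
  }

  public static void main(String[] args)
  {
    Map<String, Object> dims = new HashMap<>();
    dims.put("dataSource", "example_datasource");
    dims.put(STATUS_CODE, 429);
    System.out.println(statusClass(dims)); // client_error

    // An event tagged only with the old key is reported as missing.
    Map<String, Object> legacy = new HashMap<>();
    legacy.put("code", 500);
    System.out.println(statusClass(legacy)); // missing
  }
}
```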