GWphua opened a new issue, #18781:
URL: https://github.com/apache/druid/issues/18781

   Fixes #17944
   
   Motivation
   Currently, we have GroupByStatsMonitor, which provide metrics for GroupBy 
queries by:
   
   Aggregating results of all GroupBy metrics in a certain time interval.
   Emit them regularly through ServiceEmitter.
   There are still some limitations of the GroupByStatsMonitor:
   
   1. Which query (SQL / Native) is emitting the stats?
   2. Which datasource is being queried?
   
   These limitations lead to more and more metrics being introduced, which does 
not really answer the questions asked above. (See #17945, #18731)
   
   After migrating to GroupByQueryMetrics, we can use the dimensions (e.g. ID, 
DATASOURCE, and more) to help us answer the above question.
   
   Proposed changes
   
   - Replace the PerQueryStats accumulator and GroupByStatsProvider’s 
resource‑ID map with enhanced GroupByQueryMetrics instances. These metrics now 
track merge‑buffer acquisition time, spill bytes, dictionary size, and can be 
reported per query using existing dimensions (id, datasource, sqlQueryId, etc.).
   - Collate the GroupByQueryMetrics in the QueryRunner level (Maybe wrap 
`GroupByMergingQueryRunner` in a `MetricsEmittingQueryRunner`).
   - GroupByStatsProvider no longer exposes PerQueryStats. Internal 
extensions/tests that constructed those objects need to switch to 
`GroupByQueryMetrics` or `QueryPlus<ResultRow>` objects, but this does not 
affect public interfaces. 
   
   Rationale
   - Using `GroupByQueryMetrics` leverages an existing, dimension‑rich metrics 
pipeline instead of growing the ad‑hoc `GroupByStatsMonitor`. 
   - Providing these metrics by default, operators will not need to dive into 
documentation to retrieve the qualified class name of `GroupByStatsMonitor` to 
append to `druid.monitoring.monitors`.
   - Encourage future metrics to be added to `GroupByQueryMetrics` in favour of 
`GroupByStatsMonitor`, following after the style of the entire repository.
   
   Operational impact
   - No user‑visible API changes: SQL/native queries, JSON specs, and metric 
names remain the same. Existing dashboards based on GroupByStatsMonitor 
continue working.
   - Cluster operators gain richer per‑query telemetry automatically; no 
configuration changes are required beyond upgrading.
   
   Test plan (optional)
   - Unit tests ensuring GroupByStatsProvider aggregates metrics correctly from 
GroupByQueryMetrics.
   - Integration tests for query metrics.
   
   Future work (optional)
   - Evaluate whether GroupByStatsMonitor is still needed once operators rely 
on `GroupByQueryMetrics`. We may need to find a new home for metrics that show 
the number of used / pending merge buffers in the pool.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to