GWphua opened a new issue, #18781: URL: https://github.com/apache/druid/issues/18781
Fixes #17944 Motivation Currently, we have GroupByStatsMonitor, which provide metrics for GroupBy queries by: Aggregating results of all GroupBy metrics in a certain time interval. Emit them regularly through ServiceEmitter. There are still some limitations of the GroupByStatsMonitor: 1. Which query (SQL / Native) is emitting the stats? 2. Which datasource is being queried? These limitations lead to more and more metrics being introduced, which does not really answer the questions asked above. (See #17945, #18731) After migrating to GroupByQueryMetrics, we can use the dimensions (e.g. ID, DATASOURCE, and more) to help us answer the above question. Proposed changes - Replace the PerQueryStats accumulator and GroupByStatsProvider’s resource‑ID map with enhanced GroupByQueryMetrics instances. These metrics now track merge‑buffer acquisition time, spill bytes, dictionary size, and can be reported per query using existing dimensions (id, datasource, sqlQueryId, etc.). - Collate the GroupByQueryMetrics in the QueryRunner level (Maybe wrap `GroupByMergingQueryRunner` in a `MetricsEmittingQueryRunner`). - GroupByStatsProvider no longer exposes PerQueryStats. Internal extensions/tests that constructed those objects need to switch to `GroupByQueryMetrics` or `QueryPlus<ResultRow>` objects, but this does not affect public interfaces. Rationale - Using `GroupByQueryMetrics` leverages an existing, dimension‑rich metrics pipeline instead of growing the ad‑hoc `GroupByStatsMonitor`. - Providing these metrics by default, operators will not need to dive into documentation to retrieve the qualified class name of `GroupByStatsMonitor` to append to `druid.monitoring.monitors`. - Encourage future metrics to be added to `GroupByQueryMetrics` in favour of `GroupByStatsMonitor`, following after the style of the entire repository. Operational impact - No user‑visible API changes: SQL/native queries, JSON specs, and metric names remain the same. Existing dashboards based on GroupByStatsMonitor continue working. - Cluster operators gain richer per‑query telemetry automatically; no configuration changes are required beyond upgrading. Test plan (optional) - Unit tests ensuring GroupByStatsProvider aggregates metrics correctly from GroupByQueryMetrics. - Integration tests for query metrics. Future work (optional) - Evaluate whether GroupByStatsMonitor is still needed once operators rely on `GroupByQueryMetrics`. We may need to find a new home for metrics that show the number of used / pending merge buffers in the pool. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
