[ https://issues.apache.org/jira/browse/KUDU-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840542#comment-17840542 ]
ASF subversion and git services commented on KUDU-3566:
-------------------------------------------------------
Commit b236d534abeb60520e4568bb4a1452d6674bb597 in kudu's branch
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=b236d534a ]
KUDU-3566 fix summary metrics in Prometheus format
This patch corrects the output of various Kudu metrics backed by HDR
histograms. From the Prometheus perspective, those metrics are output
as summaries [1], not histograms [2]. It's necessary to mark them
accordingly to avoid misinterpretation of the collected statistics.
I updated corresponding unit tests and verified that the updated output
was properly parsed and interpreted by a Prometheus 2.50.0 instance
running on my macOS laptop.
[1] https://prometheus.io/docs/concepts/metric_types/#summary
[2] https://prometheus.io/docs/concepts/metric_types/#histogram
Change-Id: I1375ddf1b0ecd730327cd44b4955813b80107f7b
Reviewed-on: http://gerrit.cloudera.org:8080/21338
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
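
The message above does not show the corrected output. As a rough sketch of
what a summary looks like in the Prometheus text exposition format, the
snippet below renders the ListMasters values quoted in the issue with
"# TYPE ... summary" and "quantile" labels in place of "_bucket" series with
"le" labels. The helper name render_summary and the label layout (keeping
unit_type next to quantile) are assumptions for illustration, not the
patch's actual code or output.

```python
def render_summary(name, help_text, quantiles, total_sum, total_count, labels=None):
    """Render one metric as a summary in Prometheus text exposition format.

    Hypothetical helper for illustration; not part of Kudu.
    `quantiles` is a list of (quantile_as_string, value) pairs.
    """
    labels = dict(labels or {})

    def fmt(extra=None):
        # Merge the common labels with any per-sample labels.
        merged = {**labels, **(extra or {})}
        if not merged:
            return ""
        return "{" + ",".join(f'{k}="{v}"' for k, v in merged.items()) + "}"

    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} summary"]
    for q, value in quantiles:
        # A summary exposes each quantile as a sample on the base metric name.
        lines.append(f"{name}{fmt({'quantile': q})} {value}")
    lines.append(f"{name}_sum{fmt()} {total_sum}")
    lines.append(f"{name}_count{fmt()} {total_count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Values taken from the JSON snippet quoted in the issue description.
    print(render_summary(
        "kudu_master_handler_latency_kudu_master_MasterService_ListMasters",
        "Microseconds spent handling kudu.master.MasterService.ListMasters RPC requests",
        [("0.75", 324), ("0.95", 468), ("0.99", 844), ("0.999", 844), ("0.9999", 844)],
        total_sum=7833,
        total_count=26,
        labels={"unit_type": "microseconds"},
    ))
```

In a summary the sample value is the observed quantile itself (here,
microseconds), which matches what the HDR histogram percentiles provide.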
> Incorrect semantics for Prometheus-style histogram metrics
> ----------------------------------------------------------
>
> Key: KUDU-3566
> URL: https://issues.apache.org/jira/browse/KUDU-3566
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.17.0
> Reporter: Alexey Serbin
> Priority: Major
> Labels: metrics, observability
>
> The original KUDU-3375 implementation incorrectly exposes [summary-type
> Prometheus metrics|https://prometheus.io/docs/concepts/metric_types/#summary]
> as [histogram-type
> ones|https://prometheus.io/docs/concepts/metric_types/#histogram] for data
> collected by corresponding HDR histograms. For example, below are snippets
> from {{/metrics}} and {{/metrics_prometheus}} for statistics on the
> ListMasters RPC.
> The data exposed as Prometheus-style histogram metrics should have been
> reported as summary metrics instead.
> JSON-style:
> {noformat}
> {
> "name": "handler_latency_kudu_master_MasterService_ListMasters",
> "total_count": 26,
> "min": 152,
> "mean": 301.2692307692308,
> "percentile_75": 324,
> "percentile_95": 468,
> "percentile_99": 844,
> "percentile_99_9": 844,
> "percentile_99_99": 844,
> "max": 844,
> "total_sum": 7833
> }
> {noformat}
> Prometheus-style counterpart:
> {noformat}
> # HELP kudu_master_handler_latency_kudu_master_MasterService_ListMasters Microseconds spent handling kudu.master.MasterService.ListMasters RPC requests
> # TYPE kudu_master_handler_latency_kudu_master_MasterService_ListMasters histogram
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.75"} 324
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.95"} 468
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.99"} 844
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.999"} 844
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="0.9999"} 844
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_bucket{unit_type="microseconds", le="+Inf"} 26
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_sum{unit_type="microseconds"} 7833
> kudu_master_handler_latency_kudu_master_MasterService_ListMasters_count{unit_type="microseconds"} 26
> {noformat}
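
One way to see why the output above cannot be a valid Prometheus histogram:
histogram buckets hold cumulative observation counts keyed by an upper bound
"le", so the counts must never decrease and the "+Inf" bucket must equal
"_count". The sketch below applies those two checks to the quoted values.
The function name bucket_problems is hypothetical, and the checks cover only
these two properties, not the full exposition format.

```python
import math


def bucket_problems(buckets, total_count):
    """Return a list of violations of basic Prometheus histogram invariants.

    `buckets` is a list of (le, cumulative_count) pairs in exposition order.
    Hypothetical helper for illustration only.
    """
    problems = []
    prev = None
    for le, count in buckets:
        # Cumulative counts must be non-decreasing as `le` grows.
        if prev is not None and count < prev:
            problems.append(f"bucket le={le} has count {count} < previous count {prev}")
        prev = count
    inf_counts = [count for le, count in buckets if math.isinf(le)]
    if not inf_counts:
        problems.append("missing +Inf bucket")
    elif inf_counts[-1] != total_count:
        problems.append(f"+Inf bucket {inf_counts[-1]} != _count {total_count}")
    return problems


if __name__ == "__main__":
    # Values from the /metrics_prometheus snippet quoted in the issue.
    reported = [(0.75, 324), (0.95, 468), (0.99, 844),
                (0.999, 844), (0.9999, 844), (math.inf, 26)]
    for problem in bucket_problems(reported, total_count=26):
        print(problem)
```

With the quoted data the "+Inf" bucket (26) is smaller than the preceding
bucket (844), because the "bucket" values are latencies in microseconds at
the given percentiles rather than observation counts.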