[ https://issues.apache.org/jira/browse/HDDS-1811?focusedWorklogId=278765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278765 ]
ASF GitHub Bot logged work on HDDS-1811:
----------------------------------------

Author: ASF GitHub Bot
Created on: 18/Jul/19 06:40
Start Date: 18/Jul/19 06:40
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #1118: HDDS-1811. Prometheus metrics are broken
URL: https://github.com/apache/hadoop/pull/1118

## What changes were proposed in this pull request?

Fix invalid metric type errors:

```
target=http://192.168.69.76:9882/prom err="invalid metric type \"apache.hadoop.ozone.container.common.transport.server.ratis._csm_metrics_delete_container_avg_time gauge\""
```

and

```
target=http://scm:9876/prom err="invalid metric type \"_rati_s-_thre_e-d7116831-ac55-4bf2-a259-d85cfba0572d counter\""
```

1. datanode: avoid `.` in record name by using simple class name
2. SCM: replace `-` with `_`. Also properly convert `ALL_CAPS` names, eg. `RATIS_THREE` to `ratis_three` instead of `_rati_s-_thre_e`.

https://issues.apache.org/jira/browse/HDDS-1811

## How was this patch tested?

Updated unit test. Checked metrics in `ozoneperf` pseudo-cluster.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 278765)
Time Spent: 10m
Remaining Estimate: 0h

> Prometheus metrics are broken for datanodes due to an invalid metric
> --------------------------------------------------------------------
>
> Key: HDDS-1811
> URL: https://issues.apache.org/jira/browse/HDDS-1811
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Elek, Marton
> Assignee: Doroszlai, Attila
> Priority: Blocker
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Datanodes can't be monitored with prometheus any more:
> {code}
> level=warn ts=2019-07-16T16:29:55.876Z caller=scrape.go:937 component="scrape manager" scrape_pool=pods target=http://192.168.69.76:9882/prom msg="append failed" err="invalid metric type \"apache.hadoop.ozone.container.common.transport.server.ratis._csm_metrics_delete_container_avg_time gauge\""
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
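
Illustrative note on the naming fix described in the pull request: Prometheus metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`, which is why names containing `.` or `-` were rejected. The sketch below shows one way the two listed changes (simple class name for the record, `-` replaced with `_`, and `ALL_CAPS` handled as a single word) could look. It is not the code from the patch; the class `MetricNameSketch` and all of its methods are hypothetical names chosen for this example.

```java
import java.util.Locale;

// Hypothetical helper, NOT the actual Ozone/Hadoop implementation.
final class MetricNameSketch {

    // Character set Prometheus accepts for metric names.
    static boolean isValid(String name) {
        return name.matches("[a-zA-Z_:][a-zA-Z0-9_:]*");
    }

    // Keeps only the part after the last '.', e.g. a simple class name.
    static String simpleName(String qualified) {
        return qualified.substring(qualified.lastIndexOf('.') + 1);
    }

    // Converts one name part to lower snake_case.
    static String normalize(String name) {
        // Anything outside [a-zA-Z0-9_] (such as '.' or '-') becomes '_'.
        String cleaned = name.replaceAll("[^a-zA-Z0-9_]", "_");
        // Insert '_' only where a lower-case letter or digit is followed by
        // an upper-case letter, so camelCase is split but ALL_CAPS is not.
        String split = cleaned.replaceAll("([a-z0-9])([A-Z])", "$1_$2");
        return split.toLowerCase(Locale.ROOT);
    }

    // Joins record and metric name; the '_' separator is an assumption.
    static String metricName(String recordName, String metric) {
        return normalize(simpleName(recordName)) + "_" + normalize(metric);
    }

    public static void main(String[] args) {
        System.out.println(normalize("RATIS_THREE"));
        System.out.println(normalize("RATIS-THREE-d7116831-ac55-4bf2-a259-d85cfba0572d"));
        System.out.println(metricName("some.pkg.CSMMetrics", "DeleteContainerAvgTime"));
    }
}
```

The split rule deliberately looks only at lower-to-upper boundaries. Inserting `_` before every upper-case letter would break `RATIS_THREE` apart letter by letter, which is the kind of result the pull request describes as wrong.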