Kimahriman opened a new pull request #3369:
URL: https://github.com/apache/hadoop/pull/3369


   <!--
     Thanks for sending a pull request!
       1. If this is your first time, please read our contributor guidelines: 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
       2. Make sure your PR title starts with JIRA issue id, e.g., 
'HADOOP-17799. Your PR title ...'.
   -->
   
   ### Description of PR
   Fixes a bug in the Prometheus metrics sink where metrics were deduplicated by 
name alone, without taking tag values into account. Prometheus metrics are 
uniquely identified by their name and labels, so several metrics were simply being 
dropped. For example, RPC metrics only included one of the servers/ports per 
metric type, and YARN queue metrics only included metrics for a single queue. (See 
the sketch below for what name-plus-tags deduplication looks like.)
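
   For illustration only (not the actual patch), a deduplication key that includes 
tag values as well as the metric name could be built roughly like this; the plain 
`String`/`Map` parameters are simplified stand-ins for the Hadoop metrics2 record 
and tag types:

   ```java
   import java.util.Map;
   import java.util.TreeMap;

   public class MetricKeyExample {

       /**
        * Build a deduplication key from the metric name plus its tag values,
        * mirroring how Prometheus identifies a series by name and labels.
        */
       static String uniqueKey(String metricName, Map<String, String> tags) {
           // TreeMap gives a stable ordering, so the same tag set always
           // produces the same key regardless of insertion order.
           StringBuilder key = new StringBuilder(metricName);
           for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
               key.append('{').append(tag.getKey())
                  .append('=').append(tag.getValue()).append('}');
           }
           return key.toString();
       }

       public static void main(String[] args) {
           // Two RPC metrics that differ only by port are no longer collapsed
           // into a single entry, because the port tag is part of the key.
           System.out.println(uniqueKey("rpc_queue_time_avg_time",
               Map.of("servername", "NameNodeActivity", "port", "8020")));
           System.out.println(uniqueKey("rpc_queue_time_avg_time",
               Map.of("servername", "NameNodeActivity", "port", "9000")));
       }
   }
   ```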
   
   Additionally, because of the "push" nature of Hadoop metrics, this would end up 
creating a lot of extra series for metrics whose tags change over time but still 
mean the same thing. For example, the `hastate` of NameNode metrics can change, 
but you really only want the most recent one. To address this, the sink now only 
exposes metrics after a `flush` call, and it starts fresh after each `flush`. This 
prevents stale metrics from hanging around and constantly being exposed until the 
service is restarted.
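
   A minimal sketch of the flush-then-swap idea (again illustrative, not the actual 
sink code): pushed metrics accumulate in a "next" buffer and only become visible to 
scrapes when `flush` swaps it in, so series that stop being reported disappear 
after the next flush:

   ```java
   import java.util.HashMap;
   import java.util.Map;

   /** Illustrative only: collects pushed metrics and exposes them per flush. */
   public class FlushingSinkSketch {
       // Buffer being filled by the current round of metric pushes.
       private Map<String, Double> nextCache = new HashMap<>();
       // Snapshot served to scrapers, replaced wholesale on each flush.
       private volatile Map<String, Double> publishedCache = new HashMap<>();

       /** Called as records are pushed; synchronized like the real publish path. */
       public synchronized void putMetric(String key, double value) {
           nextCache.put(key, value);
       }

       /** Swap in the freshly collected metrics and start a new, empty buffer. */
       public synchronized void flush() {
           publishedCache = nextCache;
           nextCache = new HashMap<>();
       }

       /** What a scrape would see: only metrics reported since the last flush. */
       public Map<String, Double> getPublished() {
           return publishedCache;
       }
   }
   ```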
   
   There are still some "bad" tags exposed that can lead to multiple Prometheus 
series being created when they really represent the same thing. However, these can 
be dealt with on the Prometheus side, by ignoring certain labels, rather than 
trying to hard-code all the bad tags on the Hadoop side.
   
   I don't _think_ there should be any threading/race conditions when publishing 
metrics, since the metrics-publishing methods are synchronized.
   
   This change also adds the `# HELP` line to the output.
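
   For reference, the Prometheus text exposition format puts a `# HELP` comment 
before the samples; the metric name, description, and value below are placeholders, 
not actual output from this patch:

   ```
   # HELP rpc_queue_time_avg_time Example description of the metric
   # TYPE rpc_queue_time_avg_time gauge
   rpc_queue_time_avg_time{servername="NameNodeActivity",port="8020"} 1.25
   ```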
   
   ### How was this patch tested?
   New unit tests.
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] ~~Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?~~
   - [ ] ~~If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?~~
   - [ ] ~~If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?~~
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


