[
https://issues.apache.org/jira/browse/HADOOP-17804?focusedWorklogId=645280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-645280
]
ASF GitHub Bot logged work on HADOOP-17804:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 01/Sep/21 14:04
Start Date: 01/Sep/21 14:04
Worklog Time Spent: 10m
Work Description: Kimahriman opened a new pull request #3369:
URL: https://github.com/apache/hadoop/pull/3369
### Description of PR
Fixes a bug in the Prometheus metrics sink where metrics were deduplicated on
their name alone, without including their tag values in the key.
Prometheus metrics are uniquely identified by their name and labels, so many
metrics were silently dropped. For example, RPC metrics included only one of
the servers/ports per metric type, and YARN queue metrics included only one
queue.
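To make the deduplication rule concrete, here is a minimal Java sketch, assuming a two-level store of metric name -> label set -> value. The class `SeriesStore`, the metric name `queue_apps_running`, and the queue names are hypothetical illustrations, not the actual `PrometheusMetricsSink` code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a Prometheus series is identified by its name AND its
 * label set, so samples are stored as metric name -> (label set -> value).
 */
public class SeriesStore {
  private final Map<String, Map<Map<String, String>, Double>> series = new LinkedHashMap<>();

  /** Records a sample; only an identical name + label set overwrites an earlier one. */
  public void put(String name, Map<String, String> labels, double value) {
    // Map.copyOf gives an immutable key with content-based equals/hashCode.
    series.computeIfAbsent(name, k -> new HashMap<>()).put(Map.copyOf(labels), value);
  }

  /** Number of distinct series stored under one metric name. */
  public int seriesCount(String name) {
    return series.getOrDefault(name, Map.of()).size();
  }

  public static void main(String[] args) {
    // Keyed on the name alone (the reported bug): the second queue overwrites the first.
    Map<String, Double> byNameOnly = new HashMap<>();
    byNameOnly.put("queue_apps_running", 3.0);
    byNameOnly.put("queue_apps_running", 5.0);
    System.out.println(byNameOnly.size()); // 1

    // Keyed on name + labels: both queues are kept as separate series.
    SeriesStore store = new SeriesStore();
    store.put("queue_apps_running", Map.of("queue", "root.a"), 3.0);
    store.put("queue_apps_running", Map.of("queue", "root.b"), 5.0);
    System.out.println(store.seriesCount("queue_apps_running")); // 2
  }
}
```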
Additionally, because of the "push" nature of Hadoop metrics, including the tags
in the key would end up creating many extra series for metrics whose tags can
change over time while still meaning the same thing. For example, the `hastate`
tag of NameNode metrics can change, but you really only want the most recent
value. To address this, I changed the sink to expose metrics only after a
`flush` call, and to start fresh after each `flush` call. This prevents old
metrics from lingering and being exposed until the service is restarted.
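A minimal Java sketch of the "expose after flush, then start fresh" behavior, assuming the sink buffers samples pushed during a cycle and swaps them into a read-only snapshot on `flush`. `FlushingSink`, `putMetric`, `snapshot`, and the `fs_state` metric are hypothetical names used only for illustration; series keys are pre-rendered strings to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: samples pushed since the last flush are buffered, and
 * each flush replaces what scrapes see, so series that were not re-pushed vanish.
 */
public class FlushingSink {
  // Samples pushed during the current cycle, keyed by a name+labels series key.
  private Map<String, Double> pending = new HashMap<>();
  // What a scrape currently sees; replaced wholesale on every flush.
  private Map<String, Double> published = Map.of();

  public synchronized void putMetric(String seriesKey, double value) {
    pending.put(seriesKey, value);
  }

  /** Publishes the current cycle and starts the next one empty. */
  public synchronized void flush() {
    published = Map.copyOf(pending);
    pending = new HashMap<>();
  }

  /** Read by the scrape handler; returns an immutable snapshot. */
  public synchronized Map<String, Double> snapshot() {
    return published;
  }

  public static void main(String[] args) {
    FlushingSink sink = new FlushingSink();
    sink.putMetric("fs_state{hastate=\"standby\"}", 1.0);
    sink.flush();
    System.out.println(sink.snapshot().keySet()); // [fs_state{hastate="standby"}]

    // After a failover the tag value changes; the old series is not carried over.
    sink.putMetric("fs_state{hastate=\"active\"}", 1.0);
    sink.flush();
    System.out.println(sink.snapshot().keySet()); // [fs_state{hastate="active"}]
  }
}
```

Swapping in a fresh buffer on each flush is what lets a series with an outdated `hastate` value drop out of the next scrape; making the methods `synchronized` is an assumption of this sketch, chosen so a scrape never observes a half-built snapshot.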
There are still some "bad" tags that are exposed, which can lead to multiple
Prometheus series being created for what is really the same metric. However,
these can be handled on the Prometheus side by ignoring those labels, rather
than by hard-coding all the bad tags on the Hadoop side.
I don't _think_ there should be any threading/race conditions with
publishing metrics, since the publish metrics methods are synchronized.
Also adds the `# HELP` line to the output.
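For reference, a metric family in the Prometheus text exposition format carries a `# HELP` line and a `# TYPE` line before its samples. The Java sketch below renders one family that way; `PromTextWriter`, the metric name, and the help string are hypothetical, and the escaping follows the format's rules (backslash and newline in HELP text; backslash, double quote, and newline in label values). It is not the sink's actual writer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of rendering one metric family in Prometheus text format. */
public class PromTextWriter {

  /** HELP text escapes backslash and newline. */
  static String escapeHelp(String s) {
    return s.replace("\\", "\\\\").replace("\n", "\\n");
  }

  /** Label values additionally escape the double quote. */
  static String escapeLabelValue(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
  }

  /** Writes HELP and TYPE once, then one sample line per label set. */
  static String render(String name, String help, String type,
                       Map<Map<String, String>, Double> samples) {
    StringBuilder out = new StringBuilder();
    out.append("# HELP ").append(name).append(' ').append(escapeHelp(help)).append('\n');
    out.append("# TYPE ").append(name).append(' ').append(type).append('\n');
    for (Map.Entry<Map<String, String>, Double> sample : samples.entrySet()) {
      out.append(name);
      if (!sample.getKey().isEmpty()) {
        out.append('{');
        String sep = "";
        // Sort label names so the output is stable regardless of map order.
        for (Map.Entry<String, String> label : new TreeMap<>(sample.getKey()).entrySet()) {
          out.append(sep).append(label.getKey()).append("=\"")
             .append(escapeLabelValue(label.getValue())).append('"');
          sep = ",";
        }
        out.append('}');
      }
      out.append(' ').append(sample.getValue()).append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Map<Map<String, String>, Double> samples = new LinkedHashMap<>();
    samples.put(Map.of("queue", "root.a"), 3.0);
    samples.put(Map.of("queue", "root.b"), 5.0);
    System.out.print(render("queue_apps_running", "Number of running applications", "gauge", samples));
    // # HELP queue_apps_running Number of running applications
    // # TYPE queue_apps_running gauge
    // queue_apps_running{queue="root.a"} 3.0
    // queue_apps_running{queue="root.b"} 5.0
  }
}
```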
### How was this patch tested?
New unit tests.
### For code changes:
- [X] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] ~~Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?~~
- [ ] ~~If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?~~
- [ ] ~~If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?~~
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 645280)
Remaining Estimate: 0h
Time Spent: 10m
> Prometheus metrics only include the last set of labels
> ------------------------------------------------------
>
> Key: HADOOP-17804
> URL: https://issues.apache.org/jira/browse/HADOOP-17804
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 3.3.1
> Reporter: Adam Binford
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> A Prometheus endpoint was added in
> https://issues.apache.org/jira/browse/HADOOP-16398, but the logic that puts
> metrics into a map based on the "key" incorrectly hides any metrics with the
> same key but different labels. The relevant code is here:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/sink/PrometheusMetricsSink.java#L55|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/sink/PrometheusMetricsSink.java#L55]
> The labels/tags need to be taken into account, as different tags mean
> different metrics. For example, I came across this while trying to scrape
> metrics for all the queues in our scheduler. Only the last queue is included
> because all the metrics have the same "key" but a different "queue" label/tag.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)