[
https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell resolved HDFS-17237.
--------------------------------------
Resolution: Fixed
> Remove IPCLoggerChannel Metrics when the logger is closed
> ---------------------------------------------------------
>
> Key: HDFS-17237
> URL: https://issues.apache.org/jira/browse/HDFS-17237
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7
>
>
> When an IPCLoggerChannel is created (it is used to read from and write to
> the Journal nodes), it also creates a metrics object. When the namenodes
> fail over, the IPC loggers are all closed and reopened in read mode on the new
> SBNN, or the read mode is closed on the SBNN and reopened in write mode.
> Closing a logger frees its resources, discards the original IPCLoggerChannel
> object, and causes the caller to create a new one.
> If a Journal node was down and was added back to the cluster with the same
> hostname but a different IP, then when the failover happens you end up with 4
> metrics objects for the JNs:
> 1. One for each of the original 3 IPs
> 2. One for the new IP
> The old stale metric will remain forever and will no longer be updated,
> leading to confusing results in any tools that use the metrics for monitoring.
> This change ensures we un-register the metrics when the logger channel is
> closed, and a new metrics object gets created when the new channel is created.
> I have added a small test to prove this, but also reproduced the original
> issue on a docker cluster and validated it is resolved with this change in
> place.
> For info, the logger metrics look like:
> {code}
> {
> "name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-192.168.32.8-8485",
> "modelerType" : "IPCLoggerChannel-192.168.32.8-8485",
> "tag.Context" : "dfs",
> "tag.IsOutOfSync" : "false",
> "tag.Hostname" : "957e3e66f10b",
> "QueuedEditsSize" : 0,
> "LagTimeMillis" : 0,
> "CurrentLagTxns" : 0
> }
> {code}
> Note that the name includes the IP, rather than the hostname.
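
The register-on-create, un-register-on-close lifecycle described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the actual patch (which is not shown in this message): SketchMetricsRegistry and SketchLoggerChannel are hypothetical stand-ins for the metrics system and for IPCLoggerChannel, and the metrics name format is copied from the JSON sample above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a metrics system; this is not Hadoop's API. */
class SketchMetricsRegistry {
    private final Map<String, Object> sources = new ConcurrentHashMap<>();

    void register(String name, Object source) {
        sources.put(name, source);
    }

    void unregister(String name) {
        sources.remove(name);
    }

    /** Returns a sorted snapshot of the currently registered names. */
    Set<String> names() {
        return new TreeSet<>(sources.keySet());
    }
}

/** Hypothetical logger channel owning one metrics entry keyed by IP and port. */
class SketchLoggerChannel implements AutoCloseable {
    private final SketchMetricsRegistry registry;
    private final String metricsName;

    SketchLoggerChannel(SketchMetricsRegistry registry, String ip, int port) {
        this.registry = registry;
        // Name format assumed from the sample: IPCLoggerChannel-<ip>-<port>
        this.metricsName = "IPCLoggerChannel-" + ip + "-" + port;
        registry.register(metricsName, new Object());
    }

    String metricsName() {
        return metricsName;
    }

    @Override
    public void close() {
        // Without this call the entry for the old IP would stay registered
        // after the channel object itself has been discarded.
        registry.unregister(metricsName);
    }
}
```

In this sketch, closing a channel created for 192.168.32.8 and then creating one for a different address (for example a made-up 192.168.32.9) leaves only the new entry in the registry. If close() did not un-register, both names would remain, which mirrors the stale-metric symptom described in the issue.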