[ https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDFS-17055:
----------------------------------
Labels: pull-request-available (was: )
> Export HAState as a metric from Namenode for monitoring
> -------------------------------------------------------
>
> Key: HDFS-17055
> URL: https://issues.apache.org/jira/browse/HDFS-17055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.4.0, 3.3.9
> Reporter: Xing Lin
> Assignee: Xing Lin
> Priority: Minor
> Labels: pull-request-available
>
> We'd like to measure the uptime of Namenodes: the percentage of time when the
> active/standby/observer node is available (up and running). We could monitor
> the namenode from an external service, such as ZKFC. But that would require
> the external service itself to be available 100% of the time. When this
> external monitoring service is down, we won't have info on whether our
> Namenodes are still up.
> We propose to take a different approach: we will emit Namenode state directly
> from the namenode itself. Whenever we miss a data point for this metric, we
> consider the corresponding namenode to be down/not available. In other words,
> we assume the metric collection/monitoring infrastructure to be 100% reliable.
> One implementation detail: in Hadoop, we have the _NameNodeMetrics_ class,
> which is currently used to emit all metrics for _NameNode.java_. However,
> we don't think that is a good place to emit NameNode HAState. HAState is
> stored in NameNode.java and we should emit it directly from NameNode.java.
> Otherwise, we would duplicate this info in two classes and would have
> to keep them in sync. Besides, the _NameNodeMetrics_ class does not have a
> reference to the _NameNode_ object it belongs to. A _NameNodeMetrics_
> is created by a _static_ function _initMetrics()_ in _NameNode.java_.
> We shouldn't emit HA state from FSNamesystem.java either, as it is
> initialized from NameNode.java and all state transitions are implemented in
> NameNode.java.
>
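The proposal above can be illustrated with a minimal, dependency-free sketch.
It is not the patch attached to this issue and it does not use the Hadoop
metrics API. Every name in it (HaStateMetricSketch, NodeSketch,
getHaStateGauge, uptimePercent) and the numeric state codes are hypothetical
and chosen only for illustration. It shows two ideas from the description: the
object that owns the HA state exposes it directly as a gauge, so nothing has to
be kept in sync, and a missing data point is counted as "down" when uptime is
computed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these names come from Hadoop.
final class HaStateMetricSketch {

    /** Illustrative HA states; the numeric codes are an assumption of this sketch. */
    enum HaState {
        INITIALIZING(0), ACTIVE(1), STANDBY(2), OBSERVER(3);

        final int code;

        HaState(int code) {
            this.code = code;
        }
    }

    /** Stand-in for the NameNode: it owns the state and exposes it as a gauge. */
    static final class NodeSketch {
        private final AtomicReference<HaState> state =
            new AtomicReference<>(HaState.INITIALIZING);

        /** All state transitions go through the owning object. */
        void transitionTo(HaState next) {
            state.set(next);
        }

        /** Value a metrics collector would read on each collection interval. */
        int getHaStateGauge() {
            return state.get().code;
        }
    }

    /**
     * Percentage of expected collection intervals in which the node reported
     * the wanted state. A null entry models a missed data point and is
     * counted as "down", as the description proposes.
     */
    static double uptimePercent(List<Integer> samples, HaState wanted) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        long up = samples.stream()
            .filter(s -> s != null && s == wanted.code)
            .count();
        return 100.0 * up / samples.size();
    }

    public static void main(String[] args) {
        NodeSketch node = new NodeSketch();
        node.transitionTo(HaState.ACTIVE);
        int gauge = node.getHaStateGauge();

        // Four expected intervals; the third data point was never received.
        List<Integer> samples = Arrays.asList(gauge, gauge, null, gauge);
        System.out.println(uptimePercent(samples, HaState.ACTIVE));
    }
}
```

Because the gauge reads the state from the object that performs the
transitions, there is no second copy of the state to update. In this sketch the
uptime calculation is kept on the consumer side, which matches the assumption
in the description that the metric collection infrastructure is reliable.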
--
This message was sent by Atlassian Jira
(v8.20.10#820010)