[
https://issues.apache.org/jira/browse/HDFS-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xing Lin reassigned HDFS-17055:
-------------------------------
Assignee: Xing Lin
> Export HAState as a metric from Namenode for monitoring
> -------------------------------------------------------
>
> Key: HDFS-17055
> URL: https://issues.apache.org/jira/browse/HDFS-17055
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.4.0, 3.3.9
> Reporter: Xing Lin
> Assignee: Xing Lin
> Priority: Minor
>
> We'd like to measure the uptime of Namenodes: the percentage of time when we
> have the active/standby/observer node available (up and running). We could
> monitor the namenode from an external service, such as ZKFC. But that would
> require the external service itself to be available 100% of the time. And
> when this third-party external monitoring service is down, we won't have
> info on whether our Namenodes are still up.
> We propose to take a different approach: we will emit the Namenode state
> directly from the namenode itself. Whenever we miss a data point for this
> metric, we consider the corresponding namenode to be down/not available. In
> other words, we assume the metric collection/monitoring infrastructure to be
> 100% reliable.
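
To make the "missing data point means down" rule concrete, here is a minimal sketch of how uptime could be derived from the collected metric. It assumes the monitoring system scrapes at fixed, numbered collection slots; the class name UptimeSketch, the method uptimePercent, and the slot numbering are all hypothetical and not part of the proposal.

```java
import java.util.Set;

// Hypothetical helper, not part of Hadoop: derives uptime from metric data points.
final class UptimeSketch {
    private UptimeSketch() {}

    /**
     * Returns the percentage (0..100) of expected collection slots in
     * [startSlot, endSlotExclusive) for which a data point was observed.
     * A slot with no data point counts as "namenode down".
     */
    static double uptimePercent(long startSlot, long endSlotExclusive,
                                Set<Long> observedSlots) {
        long expected = endSlotExclusive - startSlot;
        if (expected <= 0) {
            throw new IllegalArgumentException("empty measurement window");
        }
        long seen = 0;
        for (long slot = startSlot; slot < endSlotExclusive; slot++) {
            if (observedSlots.contains(slot)) {
                seen++;
            }
        }
        return 100.0 * seen / expected;
    }
}
```

For example, a window of 10 slots with data points in 8 of them gives uptimePercent(0, 10, ...) = 80.0. The result is only as trustworthy as the collection pipeline, which is the assumption stated above.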
> One implementation detail: in Hadoop, we have the _NameNodeMetrics_ class,
> which is used to emit all metrics for _NameNode.java_. However, we don't
> think that is a good place to emit the NameNode HAState. HAState is stored in
> NameNode.java and we should emit it directly from NameNode.java. Otherwise,
> we would basically duplicate this info in two classes and would have to keep
> them in sync. Besides, the _NameNodeMetrics_ class does not have a reference
> to the _NameNode_ object it belongs to: a _NameNodeMetrics_ instance is
> created by a _static_ function, _initMetrics()_, in _NameNode.java_. We
> shouldn't emit HA state from FSNamesystem.java either, as it is initialized
> from NameNode.java and all state transitions are implemented in NameNode.java.
>
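
The design argument above (emit the state from the object that owns it, rather than mirroring it in a separate metrics class) could be sketched roughly as follows. This is a plain-Java illustration only: HaStateMetricSketch, GaugeRegistry, SketchNameNode, and the ordinal encoding of the state are hypothetical stand-ins, not the real Hadoop classes or the Metrics2 API, and the actual patch may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical stand-ins; none of these are the real Hadoop classes.
final class HaStateMetricSketch {
    private HaStateMetricSketch() {}

    /** Illustrative HA states; the metric value is the enum ordinal (an assumption). */
    enum HaState { INITIALIZING, ACTIVE, STANDBY, OBSERVER, STOPPING }

    /** Minimal registry: a gauge is a name plus a supplier read at collection time. */
    static final class GaugeRegistry {
        private final Map<String, IntSupplier> gauges = new LinkedHashMap<>();

        void register(String name, IntSupplier gauge) {
            gauges.put(name, gauge);
        }

        /** Reads every gauge now, as a metrics collector would on each scrape. */
        Map<String, Integer> snapshot() {
            Map<String, Integer> out = new LinkedHashMap<>();
            gauges.forEach((name, gauge) -> out.put(name, gauge.getAsInt()));
            return out;
        }
    }

    /** Owns the HA state and exposes it itself, so there is no second copy to sync. */
    static final class SketchNameNode {
        private volatile HaState state = HaState.INITIALIZING;

        /** Registers a gauge that reads the live state on every collection. */
        void registerMetrics(GaugeRegistry registry) {
            registry.register("HAState", this::getHaStateMetric);
        }

        void transitionTo(HaState next) {
            state = next;
        }

        int getHaStateMetric() {
            return state.ordinal();
        }
    }
}
```

Because the gauge reads the field at collection time, a state transition is visible in the next snapshot without any extra bookkeeping in a separate metrics class.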
--
This message was sent by Atlassian Jira
(v8.20.10#820010)