[ https://issues.apache.org/jira/browse/HADOOP-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620696#action_12620696 ]

Chris Douglas commented on HADOOP-3767:
---------------------------------------

bq. should this liveness test include a min #of live datanodes? Like 1?

That seems to verify a different property than an internal health check does.
The number of live datanodes is already visible through the web interface, at
least. It could also be added as a (usually not very interesting) metric, but
it would probably fit better in an SNMP (or similar) layer.

On failed pings: should a server that fails a health check change its status,
or would that just invite race conditions?

> Brief, baseline namenode health check
> -------------------------------------
>
> Key: HADOOP-3767
> URL: https://issues.apache.org/jira/browse/HADOOP-3767
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: Chris Douglas
> Priority: Minor
> Attachments: 3767-0.patch, 3767-1.patch
>
>
> It would be helpful if there were a way to query the namenode to verify that
> it is basically healthy. In particular, the check should confirm that all the
> expected threads are running, that data structures appear sane, etc.
> Administrators could use this interface to verify that the namenode is both
> up and essentially functional, attaching cron jobs, notification, etc. as
> required.
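
As a rough illustration of the "expected threads are running" part of the
description above, the following Java sketch reports which registered threads
are not alive. The class and method names (HealthCheckSketch, deadThreads,
isHealthy) and the idea of a name-to-thread registry are assumptions made for
this example; they are not taken from the attached patches or from the
namenode code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hadoop or the patches.
final class HealthCheckSketch {

    private HealthCheckSketch() {}

    // Returns the names of expected threads that are missing or not alive.
    // A thread that was never started, or that has exited, counts as dead.
    static List<String> deadThreads(Map<String, Thread> expected) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Thread> entry : expected.entrySet()) {
            Thread t = entry.getValue();
            if (t == null || !t.isAlive()) {
                dead.add(entry.getKey());
            }
        }
        return dead;
    }

    // Healthy, in this narrow sense, means every expected thread is alive.
    static boolean isHealthy(Map<String, Thread> expected) {
        return deadThreads(expected).isEmpty();
    }
}
```

A check of this kind only reads thread state and changes nothing on the
server, so a failed ping has no side effects; whether a failing server should
also change its own status is the open question raised in the comment.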