[ https://issues.apache.org/jira/browse/HADOOP-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620696#action_12620696 ]

Chris Douglas commented on HADOOP-3767:
---------------------------------------

bq. should this liveness test include a min #of live datanodes? Like 1?

That seems to verify a different property than an internal health check does. 
The number of live datanodes is already visible through the web interface. It 
could also be added as a (usually not very interesting) metric, but it would 
probably fit better in an SNMP (or similar) layer.

On failed pings: should a server failing a health check change its status, or 
would that just invite race conditions?

> Brief, baseline namenode health check
> -------------------------------------
>
>                 Key: HADOOP-3767
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3767
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Chris Douglas
>            Priority: Minor
>         Attachments: 3767-0.patch, 3767-1.patch
>
>
> It would be helpful if there were a way to query the namenode to verify that 
> it is basically healthy. In particular, that all the expected threads are 
> running, data structures appear sane, etc. Administrators could use this 
> interface to verify that the namenode is both up and essentially functional, 
> attaching cron jobs, notification, etc. as required.
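
As a rough illustration of the kind of check described above, the following is a minimal sketch, not the contents of the attached patches and not an existing Hadoop API. The class name HealthCheckSketch, its methods, and the thread names are all hypothetical. The check only reads thread state and never changes server status, which sidesteps the race-condition question raised in the comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical read-only health check; every name here is illustrative. */
public class HealthCheckSketch {
    // Threads the server is expected to keep running, keyed by a label.
    private final Map<String, Thread> expectedThreads = new LinkedHashMap<>();

    /** Registers a thread that must be alive for the server to count as healthy. */
    public void expect(String label, Thread thread) {
        expectedThreads.put(label, thread);
    }

    /** Returns the labels of expected threads that are not alive; empty means healthy. */
    public List<String> failedChecks() {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Thread> entry : expectedThreads.entrySet()) {
            if (!entry.getValue().isAlive()) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    /** True when every expected thread is alive. Has no side effects. */
    public boolean isHealthy() {
        return failedChecks().isEmpty();
    }

    public static void main(String[] args) throws InterruptedException {
        HealthCheckSketch check = new HealthCheckSketch();

        // Stand-in for a live service thread (hypothetical label).
        check.expect("heartbeat-monitor", Thread.currentThread());

        // Stand-in for a service thread that has already exited (hypothetical label).
        Thread exited = new Thread(() -> { });
        exited.start();
        exited.join();
        check.expect("lease-monitor", exited);

        System.out.println("healthy: " + check.isHealthy());
        System.out.println("failed checks: " + check.failedChecks());
    }
}
```

A cron job or notification script could map the boolean result to an exit code or an alert. Sanity checks on data structures would need server-specific logic and are left out of this sketch.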

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
