[ 
https://issues.apache.org/jira/browse/HADOOP-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721317#action_12721317
 ] 

Allen Wittenauer commented on HADOOP-5478:
------------------------------------------

bq. But the fact that we are reporting timestamps of the last health status 
gives the administrators an opportunity to know that something is amiss on this 
node, because its health has not been updated for a while.

Hmm.  What interface do admins have that makes this obvious?  If a cluster has 
2500 TTs, it isn't going to be obvious in a web UI that any given TT is sick.  

> Provide a node health check script and run it periodically to check the node 
> health status
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5478
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Aroop Maliakkal
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: active.png, blacklist1.png, cluster_setup.pdf, 
> hadoop-5478-1.patch, hadoop-5478-2.patch, hadoop-5478-3.patch, 
> hadoop-5478-4.patch, hadoop-5478-5.patch, hadoop-5478-6.patch
>
>
> Hadoop must have some mechanism to find the health status of a node. It 
> should run the health check script periodically and, if there are any errors, 
> it should blacklist the node. This will be really helpful when we run static 
> mapred clusters. Otherwise we may have to run some scripts/daemons periodically 
> to find the node status and take it offline manually.
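
The mechanism requested in the description above can be sketched roughly as 
follows. This is only an illustration, not what the attached patches do: the 
names node_is_healthy and update_blacklist, the 30-second timeout, and the 
rule "non-zero exit or timeout means unhealthy" are all assumptions made for 
the example. A real deployment would call the check on a timer; the 
scheduling loop is left out here.

```python
import subprocess
import sys


def node_is_healthy(command, timeout_s=30):
    """Run one health check; True only if the script exits 0 in time.

    `command` is the health script invocation as an argument list
    (hypothetical; the issue does not specify how the script is configured).
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Assumption: a hung or missing script counts as a failed check.
        return False
    return result.returncode == 0


def update_blacklist(blacklist, node, healthy):
    """Return a new set of blacklisted nodes after the latest check."""
    updated = set(blacklist)
    if healthy:
        updated.discard(node)
    else:
        updated.add(node)
    return updated


if __name__ == "__main__":
    # Stand-in for a real health script: a child process that exits 1.
    failing_script = [sys.executable, "-c", "import sys; sys.exit(1)"]
    healthy = node_is_healthy(failing_script)
    print(update_blacklist(set(), "tt-example", healthy))
```

Keeping the check and the blacklist update as separate functions means the 
pass/fail rule can be changed without touching the bookkeeping.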

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
