[ 
https://issues.apache.org/jira/browse/HADOOP-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709290#action_12709290
 ] 

eric baldeschwieler commented on HADOOP-5478:
---------------------------------------------

looks good.

I think we should track some per-node stats on the JT: just the total successes
and failures reported over the current and previous hour-long and day-long
windows. Showing these alongside the current health status and the error line
(as Allen suggests) on the JT console would let an operator quickly determine
whether any nodes are ill.

Do we currently track such success/failure ratios for tasks on a node?  That 
would also be great to display on the same console page.
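
For illustration only, the per-node counters described in this comment could
be sketched roughly as below. This is not code from the patch: the class and
method names (RollingWindow, NodeStatsTracker, report, summary) are
hypothetical, and the sketch assumes that the "current" hour or day is a
clock-aligned bucket and that the caller passes in the timestamp.

```java
import java.util.HashMap;
import java.util.Map;

/** Success/failure counts for one fixed-length window and the one before it. */
class RollingWindow {
    private final long lengthMs;
    private long currentIndex = -1;               // index of the window being filled
    private final long[] current = new long[2];   // [successes, failures]
    private final long[] previous = new long[2];  // [successes, failures]

    RollingWindow(long lengthMs) {
        this.lengthMs = lengthMs;
    }

    /** Moves to the window containing nowMs, shifting or clearing old counts. */
    private void roll(long nowMs) {
        long index = nowMs / lengthMs;
        if (index == currentIndex) {
            return;
        }
        if (index == currentIndex + 1) {
            // The window that just ended becomes the "previous" window.
            previous[0] = current[0];
            previous[1] = current[1];
        } else {
            // More than one window has passed, so nothing recent remains.
            previous[0] = 0;
            previous[1] = 0;
        }
        current[0] = 0;
        current[1] = 0;
        currentIndex = index;
    }

    void record(boolean success, long nowMs) {
        roll(nowMs);
        current[success ? 0 : 1]++;
    }

    long[] currentCounts(long nowMs) {
        roll(nowMs);
        return current.clone();
    }

    long[] previousCounts(long nowMs) {
        roll(nowMs);
        return previous.clone();
    }
}

/** Hypothetical JT-side registry of hour and day windows, keyed by node name. */
class NodeStatsTracker {
    static final long HOUR_MS = 60L * 60 * 1000;
    static final long DAY_MS = 24 * HOUR_MS;

    private static final class NodeStats {
        final RollingWindow hour = new RollingWindow(HOUR_MS);
        final RollingWindow day = new RollingWindow(DAY_MS);
    }

    private final Map<String, NodeStats> byNode = new HashMap<>();

    synchronized void report(String node, boolean success, long nowMs) {
        NodeStats s = byNode.computeIfAbsent(node, k -> new NodeStats());
        s.hour.record(success, nowMs);
        s.day.record(success, nowMs);
    }

    /** One console-style line per node, as successes/failures per window. */
    synchronized String summary(String node, long nowMs) {
        NodeStats s = byNode.get(node);
        if (s == null) {
            return node + ": no reports";
        }
        long[] h = s.hour.currentCounts(nowMs);
        long[] ph = s.hour.previousCounts(nowMs);
        long[] d = s.day.currentCounts(nowMs);
        long[] pd = s.day.previousCounts(nowMs);
        return String.format("%s: hour %d/%d (prev %d/%d), day %d/%d (prev %d/%d)",
                node, h[0], h[1], ph[0], ph[1], d[0], d[1], pd[0], pd[1]);
    }
}
```

Keeping only two buckets per window length keeps the memory per node constant,
at the cost of the "current" counts resetting at each hour or day boundary.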

> Provide a node health check script and run it periodically to check the node 
> health status
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5478
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Aroop Maliakkal
>            Assignee: Vinod K V
>
> Hadoop must have some mechanism to find the health status of a node. It 
> should run the health check script periodically and, if there are any errors, 
> it should blacklist the node. This will be really helpful when we run static 
> mapred clusters. Otherwise we may have to run some scripts/daemons 
> periodically to find the node status and take it offline manually.
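
As a rough illustration of the mechanism requested in the description above
(not the implementation attached to this issue), a periodic check might look
something like the following. All names here (HealthCheck, ScriptHealthCheck,
NodeHealthMonitor) are hypothetical, and two conventions are assumed rather
than taken from the issue: a non-zero exit code or a timeout means the node
is unhealthy, and a later passing check clears the blacklist flag.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** One health probe; returns true when the node looks healthy. */
interface HealthCheck {
    boolean isHealthy() throws Exception;
}

/** Runs an external script; non-zero exit or timeout is unhealthy (assumed convention). */
class ScriptHealthCheck implements HealthCheck {
    private final List<String> command;
    private final long timeoutSeconds;

    ScriptHealthCheck(List<String> command, long timeoutSeconds) {
        this.command = command;
        this.timeoutSeconds = timeoutSeconds;
    }

    @Override
    public boolean isHealthy() throws Exception {
        // inheritIO avoids blocking on a full output pipe if the script is chatty.
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            return false;   // a hung script counts as a failed check
        }
        return p.exitValue() == 0;
    }
}

/** Runs a HealthCheck periodically and records whether the node is blacklisted. */
class NodeHealthMonitor {
    private final HealthCheck check;
    private volatile boolean blacklisted = false;

    NodeHealthMonitor(HealthCheck check) {
        this.check = check;
    }

    /** Runs the check once; a failure or exception blacklists, a pass clears. */
    void checkOnce() {
        boolean healthy;
        try {
            healthy = check.isHealthy();
        } catch (Exception e) {
            // Swallowing the exception also keeps the scheduled task alive.
            healthy = false;
        }
        blacklisted = !healthy;
    }

    boolean isBlacklisted() {
        return blacklisted;
    }

    /** Starts periodic checks; the caller shuts the returned executor down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(this::checkOnce, 0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Separating the probe (HealthCheck) from the scheduling makes it possible to
exercise the blacklisting logic without a real script on disk.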

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
