[ https://issues.apache.org/jira/browse/HADOOP-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718819#action_12718819 ]
Steve Loughran commented on HADOOP-5478:
----------------------------------------

I do a fair amount of health monitoring, and there is a lot to be said for something that runs in a separate process from any of the Hadoop services to do the checking.
# It could be its own service, fairly lightweight.
# It gives you the option of monitoring (and, if need be, killing) the TT process itself.
# It stops you accidentally stamping on bits of the JVM. As an example, I'd deployed something that checked the health of bits of HDFS by checking that files were there, but that code closed the handle after use, killing any TT in the same process.

> Provide a node health check script and run it periodically to check the node health status
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5478
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Aroop Maliakkal
>            Assignee: Vinod K V
>         Attachments: hadoop-5478-1.patch, hadoop-5478-2.patch
>
>
> Hadoop must have some mechanism to find the health status of a node. It
> should run the health check script periodically and, if there are any errors,
> it should blacklist the node. This will be really helpful when we run static
> mapred clusters. Otherwise we may have to run some scripts/daemons periodically
> to find the node status and take it offline manually.
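
As a rough illustration of the separate-process approach described in the comment above, here is a minimal Python sketch of a checker that launches a health check script as a child process and bounds it with a timeout. The function name `run_health_check`, the convention that a zero exit code means healthy, and the timeout value are all assumptions made for this sketch; none of them come from the attached patches.

```python
import subprocess
import sys


def run_health_check(cmd, timeout_s=5.0):
    """Run a health check command in its own process.

    Returns a (healthy, reason) tuple. Treating exit code 0 as "healthy"
    is an assumption of this sketch, not something the issue specifies.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a hung
        # check cannot pile up behind the monitor.
        return False, "timed out"
    except OSError as exc:
        # The script is missing or not executable.
        return False, "could not start: %s" % exc

    if result.returncode != 0:
        return False, "exit code %d" % result.returncode
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical stand-in for a real node health script.
    healthy, reason = run_health_check([sys.executable, "-c", "pass"])
    print(healthy, reason)
```

Because the check runs in a child process, anything it closes or breaks stays in that child and cannot affect the service being monitored, which is the isolation the comment argues for.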