[jira] Commented: (HADOOP-5478) Provide a node health check script and run it periodically to check the node health status

Hemanth Yamijala (JIRA) Tue, 16 Jun 2009 21:24:32 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720491#action_12720491 ]


Hemanth Yamijala commented on HADOOP-5478:
------------------------------------------

bq. if it fails to receive response from TT, wait for X seconds, do an extra 
kill (to ensure TT is dead), and quit itself.

Hong, I am not certain about this. I still view the TT as the master, and the 
health monitor as just a helper service that monitors the health of the node, 
not the health of the TT itself. It seems wrong that this service could kill 
the master. I can conceive that in the future we extend this to monitor the 
health of the TT as well, and take corrective action in case something is wrong 
with the TT. But I think that should be the topic of a different JIRA, or at a 
minimum an extension to this one. I would still like the scope of this one to 
be restricted to providing a plug-in for checking the health of a node.
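
To make the separation concrete, here is a minimal sketch of a helper that 
only reports node health and never acts on the TT. It is not taken from any of 
the attached patches. The names (NodeHealth, check_node_health, monitor), the 
"exit code 0 means healthy" convention, and the report callback standing in 
for the TT are all assumptions made for illustration.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class NodeHealth:
    """Result of one health check run (hypothetical type)."""
    healthy: bool
    report: str


def check_node_health(command, timeout_secs=10.0):
    """Run the health check command once and summarise the outcome.

    Assumed convention: exit code 0 means healthy; a non-zero exit code,
    a timeout, or a failure to start the command means unhealthy.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_secs
        )
    except subprocess.TimeoutExpired:
        return NodeHealth(False, "health check timed out")
    except OSError as exc:
        return NodeHealth(False, f"could not run health check: {exc}")
    output = (result.stdout + result.stderr).strip()
    return NodeHealth(result.returncode == 0, output)


def monitor(command, report, interval_secs, rounds):
    """Run the check `rounds` times, handing each result to `report`.

    The monitor only observes and reports. What to do with an unhealthy
    result (for example, blacklisting the node) is left to the caller,
    which plays the role of the master here.
    """
    for i in range(rounds):
        report(check_node_health(command))
        if i + 1 < rounds:
            time.sleep(interval_secs)
```

For example, monitor([sys.executable, "-c", "pass"], print, 60.0, 3) would run 
a trivially passing check three times, one minute apart, and print each 
result. The helper has no code path that signals or kills its caller, which is 
the scope restriction argued for above.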





> Provide a node health check script and run it periodically to check the node 
> health status
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5478
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5478
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Aroop Maliakkal
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: hadoop-5478-1.patch, hadoop-5478-2.patch, 
> hadoop-5478-3.patch, hadoop-5478-4.patch, hadoop-5478-5.patch
>
>
> Hadoop must have some mechanism to find the health status of a node. It 
> should run the health check script periodically and, if there are any errors, 
> it should blacklist the node. This will be really helpful when we run static 
> mapred clusters. Otherwise we may have to run some scripts/daemons periodically 
> to find the node status and take it offline manually.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
