slaves file should include an 'exclude' section to prevent "bad" datanodes and
tasktrackers from disrupting a cluster
-----------------------------------------------------------------------------------------------------------------------
Key: HADOOP-442
URL: http://issues.apache.org/jira/browse/HADOOP-442
Project: Hadoop
Issue Type: Bug
Reporter: Yoram Arnon
I recently had a few nodes go bad, such that they were inaccessible via ssh but
were still running their java processes.
Tasks that executed on them were failing, causing jobs to fail.
I couldn't stop the java processes because of the ssh issue, so I was helpless
until I could actually power down those nodes.
Restarting the cluster doesn't help, even after removing the bad nodes from the
slaves file - they just reconnect and are accepted.
While we plan to prevent tasks from launching on the same bad nodes over and
over, what I'd like is to be able to prevent rogue processes from connecting to
the masters.
Ideally, the slaves file would contain an 'exclude' section listing nodes that
shouldn't be accessed and that should be ignored if they try to connect.
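For example, the file might look something like this (the '[exclude]' marker is
hypothetical syntax for illustration, not anything Hadoop parses today):

    # every machine in the cluster
    node001
    node002
    node003
    node004

    # nodes that are down or misbehaving - the masters should refuse
    # them if they try to connect
    [exclude]
    node002
    node004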
That would also help in configuring the slaves file for a large cluster: I'd
list the full range of machines in the cluster, then list the ones that are
down in the 'exclude' section.
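On the master side, a minimal sketch of how the exclude list could be loaded
and consulted (assuming the hypothetical '[exclude]' marker above; the class
and method names here are invented for illustration, not existing Hadoop code):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    // Parses the slaves file and remembers which hosts are banned.
    public class ExcludeList {
        private final Set<String> excluded = new HashSet<String>();

        public ExcludeList(String slavesFile) throws IOException {
            BufferedReader in = new BufferedReader(new FileReader(slavesFile));
            try {
                boolean inExcludeSection = false;
                String line;
                while ((line = in.readLine()) != null) {
                    line = line.trim();
                    if (line.length() == 0 || line.startsWith("#")) {
                        continue;  // skip blanks and comments
                    }
                    if (line.equals("[exclude]")) {
                        inExcludeSection = true;  // everything below is banned
                        continue;
                    }
                    if (inExcludeSection) {
                        excluded.add(line);
                    }
                }
            } finally {
                in.close();
            }
        }

        // The namenode/jobtracker would call this on every registration
        // or heartbeat and drop the connection when it returns true.
        public boolean isExcluded(String hostname) {
            return excluded.contains(hostname);
        }
    }

The point is that the check happens at registration time on the master, so a
rogue process on a banned node can't rejoin the cluster no matter how often it
retries.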