System test framework needs to blacklist unresponsive cluster nodes after a 
timeout 
-------------------------------------------------------------------------------------

                 Key: HADOOP-6979
                 URL: https://issues.apache.org/jira/browse/HADOOP-6979
             Project: Hadoop Common
          Issue Type: Improvement
          Components: test
    Affects Versions: 0.22.0
            Reporter: Konstantin Boudnik


Sometimes one or more nodes in a cluster deployed for system testing become 
unresponsive (hardware failure, Hadoop daemon crashes, etc.). In the current 
implementation, Herriot keeps trying to connect to such a node forever, or 
until a timeout occurs. Instead, an unresponsive node should be placed on a 
blacklist and the framework should move on.
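
One way the proposed behavior could look is sketched below. This is only an 
illustration, not Herriot code: the class NodeBlacklist, its methods, and the 
idea of passing the connection attempt in as a Callable probe are all 
assumptions made for the example. The probe is given a bounded amount of time; 
if it fails or does not finish, the host is blacklisted and later calls return 
immediately.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Herriot): blacklists nodes that fail to respond in time. */
class NodeBlacklist {
  private final Set<String> blacklisted = ConcurrentHashMap.newKeySet();
  private final long timeoutMillis;

  NodeBlacklist(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  /**
   * Runs the given probe (e.g. a connection attempt) with a bounded wait.
   * Returns true if the probe reported success within the timeout; otherwise
   * the host is added to the blacklist and false is returned.
   */
  boolean tryConnect(String host, Callable<Boolean> probe) {
    if (blacklisted.contains(host)) {
      return false; // already known to be unresponsive, do not retry
    }
    // Daemon thread so that a hung probe cannot keep the JVM alive.
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r);
      t.setDaemon(true);
      return t;
    });
    Future<Boolean> result = executor.submit(probe);
    try {
      if (Boolean.TRUE.equals(result.get(timeoutMillis, TimeUnit.MILLISECONDS))) {
        return true;
      }
    } catch (TimeoutException | ExecutionException e) {
      // Probe timed out or threw: fall through and blacklist the host.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    } finally {
      result.cancel(true);
      executor.shutdownNow();
    }
    blacklisted.add(host);
    return false;
  }

  boolean isBlacklisted(String host) {
    return blacklisted.contains(host);
  }
}
```

In this sketch a single failed or timed-out probe is enough to blacklist a 
host; a real implementation might prefer a few retries before giving up.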

A cluster should be declared unusable if the NameNode (NN) or JobTracker (JT) 
is placed on the blacklist, or if a certain percentage of DataNodes (DNs) or 
TaskTrackers (TTs) are blacklisted. 
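
The usability rule above could be expressed roughly as follows. Again, this is 
a sketch under assumptions: the class ClusterHealthPolicy, its method names, 
and the configurable threshold are hypothetical, and the issue does not say 
what the percentage should be. Master hosts stand for the NN and JT, worker 
hosts for the DNs or TTs.

```java
import java.util.Set;

/** Hypothetical policy (not part of Herriot): decides whether a cluster is still usable. */
class ClusterHealthPolicy {
  // Assumed tunable: fraction of blacklisted workers (0.0 to 1.0) at which
  // the cluster is declared unusable. The issue leaves the value open.
  private final double maxWorkerBlacklistFraction;

  ClusterHealthPolicy(double maxWorkerBlacklistFraction) {
    this.maxWorkerBlacklistFraction = maxWorkerBlacklistFraction;
  }

  boolean isUsable(Set<String> masters, Set<String> workers, Set<String> blacklisted) {
    // Any blacklisted master (NN or JT) makes the cluster unusable.
    for (String master : masters) {
      if (blacklisted.contains(master)) {
        return false;
      }
    }
    // Assumption: a cluster with no workers at all is treated as unusable.
    if (workers.isEmpty()) {
      return false;
    }
    long blacklistedWorkers = workers.stream().filter(blacklisted::contains).count();
    double fraction = (double) blacklistedWorkers / workers.size();
    // Unusable once the blacklisted fraction reaches the threshold.
    return fraction < maxWorkerBlacklistFraction;
  }
}
```

The same check would be run once for the HDFS side (NN and DNs) and once for 
the MapReduce side (JT and TTs) if the two are to be judged separately.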

