System test framework needs to blacklist unresponsive cluster nodes after a
timeout
-------------------------------------------------------------------------------------
Key: HADOOP-6979
URL: https://issues.apache.org/jira/browse/HADOOP-6979
Project: Hadoop Common
Issue Type: Improvement
Components: test
Affects Versions: 0.22.0
Reporter: Konstantin Boudnik
Sometimes one or more nodes in a cluster deployed for system testing become
unresponsive (hardware failure, Hadoop daemon crash, etc.). In the current
implementation, Herriot keeps trying to connect to such nodes forever or
until a timeout occurs. Instead, an unresponsive node should be placed on a
blacklist and the framework should move on.
A cluster should be declared unusable if the NameNode (NN) or JobTracker (JT)
is placed on the blacklist, or if a certain percentage of DataNodes (DNs) or
TaskTrackers (TTs) are blacklisted.
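
The proposed behavior could be sketched roughly as follows. This is only an
illustration, not Herriot code: the class NodeBlacklist, the Role enum, the
method names, and the maxWorkerFraction threshold are all hypothetical, and
the sketch assumes the caller reports a node once its connection attempt has
timed out.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a node blacklist; not part of Herriot. */
public class NodeBlacklist {
    /** MASTER stands for NN/JT, WORKER for DN/TT (assumed grouping). */
    public enum Role { MASTER, WORKER }

    private final Map<String, Role> nodes = new ConcurrentHashMap<>();
    private final Set<String> blacklisted = ConcurrentHashMap.newKeySet();
    // Assumed threshold: fraction of blacklisted workers at which the
    // cluster is declared unusable (the issue only says "a certain percentage").
    private final double maxWorkerFraction;

    public NodeBlacklist(double maxWorkerFraction) {
        this.maxWorkerFraction = maxWorkerFraction;
    }

    /** Registers a cluster node under the given role. */
    public void register(String host, Role role) {
        nodes.put(host, role);
    }

    /** Called after a connection attempt to a node timed out. */
    public void reportTimeout(String host) {
        if (nodes.containsKey(host)) {   // unknown hosts are ignored
            blacklisted.add(host);
        }
    }

    public boolean isBlacklisted(String host) {
        return blacklisted.contains(host);
    }

    /**
     * The cluster is unusable if any master is blacklisted, or if the
     * fraction of blacklisted workers reaches the threshold.
     */
    public boolean isClusterUsable() {
        long workers = 0;
        long badWorkers = 0;
        for (Map.Entry<String, Role> e : nodes.entrySet()) {
            boolean bad = blacklisted.contains(e.getKey());
            if (e.getValue() == Role.MASTER) {
                if (bad) {
                    return false;
                }
            } else {
                workers++;
                if (bad) {
                    badWorkers++;
                }
            }
        }
        // Assumption: a cluster with no registered workers is not rejected here.
        return workers == 0 || (double) badWorkers / workers < maxWorkerFraction;
    }
}
```

With this shape, the framework would skip any host for which isBlacklisted
returns true and abort the test run once isClusterUsable returns false,
rather than retrying the dead node indefinitely.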