HOD gracefully exclude "bad" nodes during ring formation
--------------------------------------------------------
Key: HADOOP-3184
URL: https://issues.apache.org/jira/browse/HADOOP-3184
Project: Hadoop Core
Issue Type: Improvement
Components: contrib/hod
Reporter: Marco Nicosia
HOD clusters sometimes fail to allocate because of a single "bad" node. During
ring formation, the success of the entire ring should not depend on every single
node being good. Instead, HOD should exclude any ring member that does not
successfully join the ring within a specified amount of time.
This is a frequent issue for HOD users, although it is not directly caused by
HOD.
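
A minimal sketch of the proposed behavior, assuming a hypothetical per-node
`join_fn(host)` that performs one node's join handshake and returns True on
success. None of this is taken from HOD's code: `form_ring`, `join_fn`,
`timeout_s` and `min_nodes` are illustrative names, and the `min_nodes` floor
(fail outright if too few nodes join) is an added assumption, not part of this
issue. Nodes that raise, reject the join, or miss the deadline are excluded
with a reason instead of blocking the whole ring.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical: join_fn(host) performs one node's ring-join handshake and
# returns True on success; False or an exception marks the node as bad.
JoinFn = Callable[[str], bool]


def form_ring(hosts: Sequence[str], join_fn: JoinFn, timeout_s: float,
              min_nodes: int) -> Tuple[List[str], Dict[str, str]]:
    """Return (members, excluded), where excluded maps host -> reason.

    Raises RuntimeError if fewer than min_nodes hosts joined in time
    (min_nodes is an assumed policy knob, not from the issue).
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(hosts)))
    futures = {pool.submit(join_fn, h): h for h in hosts}
    # One overall deadline for all join attempts, however many stall.
    done, not_done = cf.wait(futures, timeout=timeout_s)
    # Stop waiting on stragglers; cancel_futures needs Python 3.9+.
    # Already-running join attempts are abandoned, not killed.
    pool.shutdown(wait=False, cancel_futures=True)

    members: List[str] = []
    excluded: Dict[str, str] = {}
    for fut in done:
        host = futures[fut]
        exc = fut.exception()
        if exc is not None:
            excluded[host] = f"error: {exc}"
        elif fut.result():
            members.append(host)
        else:
            excluded[host] = "join rejected"
    for fut in not_done:
        excluded[futures[fut]] = f"no join within {timeout_s}s"

    if len(members) < min_nodes:
        raise RuntimeError(f"only {len(members)} of {len(hosts)} nodes "
                           f"joined; need at least {min_nodes}")
    # Keep the surviving members in their original order.
    members.sort(key=list(hosts).index)
    return members, excluded


if __name__ == "__main__":
    def demo_join(host: str) -> bool:
        if host == "n2":
            raise RuntimeError("java not found")  # e.g. missing Java
        if host == "n3":
            time.sleep(0.5)  # simulates a node on a slow link
        return True

    ring, dropped = form_ring(["n1", "n2", "n3", "n4"], demo_join,
                              timeout_s=0.2, min_nodes=2)
    print(ring)  # ['n1', 'n4']
    print(dropped)
```

Using a single deadline through `concurrent.futures.wait(..., timeout=...)`
keeps ring formation time bounded no matter how many nodes hang. The timed-out
join threads are only abandoned, so a real implementation would still need to
clean up whatever the straggling node had started.
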
Examples of bad nodes: missing Java, an incorrect version of HOD or Hadoop, a
corrupt local name cache, slow network links, drives just beginning to fail, etc.
Many of these conditions are known, and we can monitor for them separately,
but this enhancement would shield users from failure conditions that we haven't
yet anticipated. This way, a user gets a cluster instead of the allocation
hanging indefinitely.