[ https://issues.apache.org/jira/browse/HADOOP-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978796#comment-13978796 ]
Hudson commented on HADOOP-10251:
---------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5554 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5554/])
HADOOP-10251. Both NameNodes could be in STANDBY State if SNN network is unstable. Contributed by Vinayakumar B. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1589494)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java


> Both NameNodes could be in STANDBY State if SNN network is unstable
> -------------------------------------------------------------------
>
>                 Key: HADOOP-10251
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10251
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.2.0
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
>             Fix For: 3.0.0, 2.5.0
>
>         Attachments: HADOOP-10251.patch, HADOOP-10251.patch, HADOOP-10251.patch, HADOOP-10251.patch, HADOOP-10251.patch
>
>
> The following corner-case scenario happened in one of our clusters.
> 1. NN1 was Active and NN2 was Standby.
> 2. The NN2 machine's network was slow.
> 3. NN1 was shut down.
> 4. NN2's ZKFC got the notification and tried to check for the old Active for fencing. (This took a little more time, again due to the slow network.)
> 5. In between, NN1 was restarted by our automatic monitoring, and its ZKFC made it Active.
> 6. NN2's ZKFC then got NN1 as the old Active and gracefully fenced NN1 to STANDBY.
> 7. Before writing ActiveBreadCrumb to ZK, NN2's ZKFC got a session timeout and shut down before making NN2 Active.
> *Now the cluster has both NameNodes in STANDBY.*
> NN1's ZKFC still thinks that its NameNode is in the Active state.
> NN2's ZKFC is waiting for election.



--
This message was sent by Atlassian JIRA
(v6.2#6252)