[jira] [Commented] (HADOOP-10251) Both NameNodes could be in STANDBY State if SNN network is unstable

Hudson (JIRA) Thu, 24 Apr 2014 06:33:18 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979710#comment-13979710 ]


Hudson commented on HADOOP-10251:
---------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1767 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1767/])
HADOOP-10251. Both NameNodes could be in STANDBY State if SNN network is 
unstable. Contributed by Vinayakumar B. (umamahesh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1589494)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java


> Both NameNodes could be in STANDBY State if SNN network is unstable
> -------------------------------------------------------------------
>
>                 Key: HADOOP-10251
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10251
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.2.0
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
>             Fix For: 3.0.0, 2.5.0
>
>         Attachments: HADOOP-10251.patch, HADOOP-10251.patch, 
> HADOOP-10251.patch, HADOOP-10251.patch, HADOOP-10251.patch
>
>
> The following corner-case scenario occurred in one of our clusters.
> 1. NN1 was Active and NN2 was Standby.
> 2. The network on the NN2 machine was slow.
> 3. NN1 was shut down.
> 4. NN2's ZKFC got the notification and tried to look up the old active for 
> fencing. (This took a little longer, again because of the slow network.)
> 5. In the meantime, NN1 was restarted by our automatic monitoring, and ZKFC 
> made it Active.
> 6. NN2's ZKFC then found NN1 as the old active and gracefully fenced NN1 to 
> STANDBY.
> 7. Before writing the ActiveBreadCrumb to ZK, NN2's ZKFC hit a session timeout 
> and was shut down before making NN2 Active.
> *The cluster now has both NameNodes in STANDBY.*
> NN1's ZKFC still thinks that its NameNode is in the Active state. 
> NN2's ZKFC is waiting for election.
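
The sequence in the description can be replayed as a toy state model. This is
only a sketch to make the interleaving easier to follow: the class, enum, and
variable names are hypothetical and are not taken from the Hadoop code base,
the slow network of step 2 is not modelled beyond the ordering of the steps,
and nothing here reflects how the committed patch addresses the problem.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of the reported sequence; no Hadoop classes are used.
class BothStandbySketch {
    enum HAState { ACTIVE, STANDBY, DOWN }

    /** Replays steps 1-7 and returns the final states, including NN1's ZKFC view. */
    static Map<String, HAState> replay() {
        // Step 1: NN1 is Active, NN2 is Standby.
        HAState nn1 = HAState.ACTIVE;
        HAState nn2 = HAState.STANDBY;
        HAState nn1ZkfcView = HAState.ACTIVE; // what NN1's ZKFC believes about NN1

        // Step 3: NN1 shuts down.
        nn1 = HAState.DOWN;
        nn1ZkfcView = HAState.DOWN;

        // Step 4: NN2's ZKFC is notified and (slowly) looks up the old active.
        String oldActive = "NN1";

        // Step 5: meanwhile NN1 is restarted and ZKFC makes it Active again.
        nn1 = HAState.ACTIVE;
        nn1ZkfcView = HAState.ACTIVE;

        // Step 6: NN2's ZKFC gracefully fences the old active to Standby.
        // Assumption: NN1's ZKFC is not informed, so its view stays ACTIVE.
        if (oldActive.equals("NN1")) {
            nn1 = HAState.STANDBY;
        }

        // Step 7: NN2's ZKFC loses its ZK session before it can promote NN2.
        boolean sessionExpired = true;
        if (!sessionExpired) {
            nn2 = HAState.ACTIVE;
        }

        Map<String, HAState> result = new LinkedHashMap<>();
        result.put("NN1", nn1);
        result.put("NN2", nn2);
        result.put("NN1 as seen by its ZKFC", nn1ZkfcView);
        return result;
    }

    public static void main(String[] args) {
        replay().forEach((name, state) -> System.out.println(name + ": " + state));
    }
}
```

In this model both NameNodes end up in STANDBY while NN1's ZKFC still records
ACTIVE, which matches the end state reported above.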



--
This message was sent by Atlassian JIRA
(v6.2#6252)
