[ 
https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978295#comment-16978295
 ] 

Ayush Saxena commented on HDFS-14961:
-------------------------------------

I added a log line for the state just before the check you mentioned. Sharing 
the relevant part of the log:

{noformat}
2019-11-20 15:52:47,061 [IPC Server handler 2 on default port 10024-EventThread] 
INFO  ha.ZKFailoverController (ZKFailoverController.java:becomeStandby(491)) - 
ZK Election indicated that NameNode at localhost/127.0.0.1:10022 should become 
standby
2019-11-20 15:52:47,065 [IPC Server handler 2 on default port 10022] WARN  
namenode.NameNode (NameNode.java:checkHaStateChange(2106)) - Allowing manual HA 
control from 127.0.0.1 even though automatic HA is enabled, because the user 
specified the force flag
2019-11-20 15:52:47,066 [IPC Server handler 2 on default port 10022] INFO  
namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1445)) - Stopping 
services started for standby state
2019-11-20 15:52:47,070 [Edit log tailer] WARN  ha.EditLogTailer 
(EditLogTailer.java:doWork(528)) - Edit log tailer interrupted: sleep 
interrupted
2019-11-20 15:52:47,070 [IPC Server handler 2 on default port 10022] INFO  
namenode.FSNamesystem (FSNamesystem.java:startStandbyServices(1402)) - Starting 
services required for observer state
2019-11-20 15:52:47,077 [IPC Server handler 2 on default port 10022] INFO  
ha.EditLogTailer (EditLogTailer.java:<init>(205)) - Will roll logs on active 
node every 120 seconds.
2019-11-20 15:52:47,078 [IPC Server handler 3 on default port 10022] INFO  
namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1445)) - Stopping 
services started for standby state
2019-11-20 15:52:47,080 [IPC Server handler 3 on default port 10022] INFO  
namenode.FSNamesystem (FSNamesystem.java:startStandbyServices(1402)) - Starting 
services required for standby state
2019-11-20 15:52:47,084 [IPC Server handler 3 on default port 10022] INFO  
ha.EditLogTailer (EditLogTailer.java:<init>(205)) - Will roll logs on active 
node every 120 seconds.
2019-11-20 15:52:47,085 [IPC Server handler 2 on default port 
10024-EventThread] INFO  ha.ZKFailoverController 
(ZKFailoverController.java:becomeStandby(496)) - Successfully transitioned 
NameNode at localhost/127.0.0.1:10022 to standby state
2019-11-20 15:52:56,642 [ZKFC Delay timer #0] INFO  ha.ZKFailoverController 
(ZKFailoverController.java:recheckElectability(802)) - STATE is : active
2019-11-20 15:52:56,642 [ZKFC Delay timer #0] INFO  ha.ActiveStandbyElector 
(ActiveStandbyElector.java:joinElection(300)) - Already in election. Not 
re-connecting.
2019-11-20 15:52:56,870 [ZKFC Delay timer #0] INFO  ha.ZKFailoverController 
(ZKFailoverController.java:recheckElectability(802)) - STATE is : standby
2019-11-20 15:52:56,871 [ZKFC Delay timer #0] INFO  ha.ActiveStandbyElector 
(ActiveStandbyElector.java:joinElection(300)) - Already in election. Not 
re-connecting.
2019-11-20 15:53:12,123 [Time-limited test] INFO  zookeeper.JUnit4ZKTestRunner 
(JUnit4ZKTestRunner.java:evaluate(99)) - TEST METHOD FAILED 
testManualFailoverWithDFSHAAdmin

{noformat}



> Prevent ZKFC changing Observer Namenode state
> ---------------------------------------------
>
>                 Key: HDFS-14961
>                 URL: https://issues.apache.org/jira/browse/HDFS-14961
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Íñigo Goiri
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14961-01.patch, HDFS-14961-02.patch
>
>
> HDFS-14130 made ZKFC aware of the Observer NameNode and hence allows ZKFC 
> to run alongside the Observer node.
> The Observer NameNode isn't supposed to be part of the ZKFC election process.
> But if the NameNode was part of the election before being turned into an 
> Observer by the transitionToObserver command, the ZKFC still sends 
> instructions to the NameNode as a result of its previous participation, and 
> sometimes changes the state of the Observer to Standby.
> This is also the reason for the failure in TestDFSZKFailoverController.
> TestDFSZKFailoverController has been consistently failing with a timeout 
> waiting in testManualFailoverWithDFSHAAdmin(), in particular at 
> {{waitForHAState(1, HAServiceState.OBSERVER);}}.
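
To make the sequence in the description concrete, here is a minimal, 
self-contained sketch of the race and of one possible guard. It is not the 
code from the attached patches and does not use any Hadoop classes: HaState, 
FakeNameNode, FakeController and the guardObserver flag are all hypothetical 
stand-ins invented for illustration. The assumption is simply that the 
controller can read the NameNode's current state before acting on an election 
callback.

```java
public class ObserverGuardSketch {

    /** Local stand-in for the HA states named in the issue (not Hadoop's HAServiceState). */
    enum HaState { ACTIVE, STANDBY, OBSERVER }

    /** Hypothetical NameNode that only tracks its HA state. */
    static final class FakeNameNode {
        private HaState state;

        FakeNameNode(HaState initial) {
            this.state = initial;
        }

        HaState getState() {
            return state;
        }

        void transitionTo(HaState target) {
            state = target;
        }
    }

    /** Hypothetical failover controller reacting to an election result. */
    static final class FakeController {
        private final FakeNameNode nn;
        private final boolean guardObserver;

        FakeController(FakeNameNode nn, boolean guardObserver) {
            this.nn = nn;
            this.guardObserver = guardObserver;
        }

        /**
         * Called when the election says this node should be standby.
         * Returns true if the transition was applied, false if it was skipped.
         */
        boolean becomeStandby() {
            // Assumed guard: an observer is not an election participant,
            // so a stale election callback must not change its state.
            if (guardObserver && nn.getState() == HaState.OBSERVER) {
                return false;
            }
            nn.transitionTo(HaState.STANDBY);
            return true;
        }
    }

    public static void main(String[] args) {
        // Without the guard: the manual observer transition is overwritten.
        FakeNameNode unguarded = new FakeNameNode(HaState.STANDBY);
        unguarded.transitionTo(HaState.OBSERVER); // manual transitionToObserver
        new FakeController(unguarded, false).becomeStandby(); // stale callback
        System.out.println("without guard: " + unguarded.getState());

        // With the guard: the observer state survives the stale callback.
        FakeNameNode guarded = new FakeNameNode(HaState.STANDBY);
        guarded.transitionTo(HaState.OBSERVER);
        new FakeController(guarded, true).becomeStandby();
        System.out.println("with guard: " + guarded.getState());
    }
}
```

The unguarded run mirrors the log above, where handler 2 starts services for 
observer state and handler 3 immediately restarts them for standby state. 
Whether the real fix belongs in the controller, in the elector, or in the 
NameNode's own state check is a design question this sketch does not settle.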



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
