[jira] [Commented] (HDFS-14961) TestDFSZKFailoverController fails consistently

Ayush Saxena (Jira) Thu, 07 Nov 2019 10:21:24 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969480#comment-16969480
 ]


Ayush Saxena commented on HDFS-14961:
-------------------------------------

I was able to reproduce this.
Sharing the part of the log that I suspect shows the cause. (It indicates that the 
NameNode was able to transition to Observer, and was then moved back to Standby.)

{noformat}
You have specified the --forcemanual flag. This flag is dangerous, as it can 
induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly 
irrecoverably.

It is recommended not to use this flag, but instead to shut down the cluster 
and disable automatic failover if you prefer to manually manage your HA state.

You may abort safely by answering 'n' or hitting ^C now.

Are you sure you want to continue? (Y or N)
2019-11-07 23:43:24,936 [Listener at localhost/10022] WARN  ha.HAAdmin (HAAdmin.java:checkManualStateManagementOK(269)) - Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at localhost/127.0.0.1:10022
2019-11-07 23:43:24,940 [IPC Server handler 5 on default port 10022] WARN  namenode.NameNode (NameNode.java:checkHaStateChange(2106)) - Allowing manual HA control from 127.0.0.1 even though automatic HA is enabled, because the user specified the force flag
2019-11-07 23:43:24,941 [IPC Server handler 5 on default port 10022] INFO  namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1445)) - Stopping services started for standby state
2019-11-07 23:43:24,941 [Edit log tailer] WARN  ha.EditLogTailer (EditLogTailer.java:doWork(528)) - Edit log tailer interrupted: sleep interrupted
2019-11-07 23:43:24,941 [IPC Server handler 5 on default port 10022] INFO  namenode.FSNamesystem (FSNamesystem.java:startStandbyServices(1402)) - Starting services required for observer state
2019-11-07 23:43:24,944 [IPC Server handler 5 on default port 10022] INFO  ha.EditLogTailer (EditLogTailer.java:<init>(205)) - Will roll logs on active node every 120 seconds.
2019-11-07 23:43:24,955 [IPC Server handler 2 on default port 10024-EventThread] INFO  ha.ZKFailoverController (ZKFailoverController.java:becomeStandby(491)) - ZK Election indicated that NameNode at localhost/127.0.0.1:10022 should become standby
2019-11-07 23:43:24,964 [IPC Server handler 9 on default port 10022] INFO  namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1445)) - Stopping services started for standby state
2019-11-07 23:43:24,966 [Edit log tailer] WARN  ha.EditLogTailer (EditLogTailer.java:doWork(528)) - Edit log tailer interrupted: sleep interrupted
2019-11-07 23:43:24,966 [IPC Server handler 9 on default port 10022] INFO  namenode.FSNamesystem (FSNamesystem.java:startStandbyServices(1402)) - Starting services required for standby state
2019-11-07 23:43:24,971 [IPC Server handler 9 on default port 10022] INFO  ha.EditLogTailer (EditLogTailer.java:<init>(205)) - Will roll logs on active node every 120 seconds.
2019-11-07 23:43:24,972 [IPC Server handler 2 on default port 10024-EventThread] INFO  ha.ZKFailoverController (ZKFailoverController.java:becomeStandby(496)) - Successfully transitioned NameNode at localhost/127.0.0.1:10022 to standby state
2019-11-07 23:43:34,499 [ZKFC Delay timer #0] INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:joinElection(300)) - Already in election. Not re-connecting.
2019-11-07 23:43:34,728 [ZKFC Delay timer #0] INFO  ha.ActiveStandbyElector (ActiveStandbyElector.java:joinElection(300)) - Already in election. Not re-connecting.
2019-11-07 23:43:50,021 [Listener at localhost/10022] INFO  zookeeper.JUnit4ZKTestRunner (JUnit4ZKTestRunner.java:evaluate(99)) - TEST METHOD FAILED testManualFailoverWithDFSHAAdmin
{noformat}
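
One reading of the log above is that the manual transition to observer completes, the failover controller then moves the same NameNode back to standby, and a wait for the observer state that starts afterwards can only time out. The Java below is a minimal sketch of that ordering, not the real test code: `HaStateWaitSketch`, `HaState`, `waitForState` and `replayLoggedSequence` are hypothetical names, the 200 ms timeout is arbitrary, and the actual `waitForHAState` helper is not shown in this thread.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class HaStateWaitSketch {

  // Hypothetical stand-in for the HA states mentioned in the log.
  enum HaState { ACTIVE, STANDBY, OBSERVER }

  /**
   * Polls the supplier until it reports the expected state or the
   * timeout elapses. Returns true only if the expected state was seen.
   */
  static boolean waitForState(Supplier<HaState> current, HaState expected,
      long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (System.nanoTime() < deadline) {
      if (current.get() == expected) {
        return true;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    // One last check after the deadline has passed.
    return current.get() == expected;
  }

  /**
   * Replays the ordering seen in the log: a manual transition to
   * observer, followed by the controller forcing standby, followed by
   * a wait for observer.
   */
  static boolean replayLoggedSequence() {
    AtomicReference<HaState> state = new AtomicReference<>(HaState.STANDBY);
    state.set(HaState.OBSERVER); // manual transition (23:43:24,941 in the log)
    state.set(HaState.STANDBY);  // becomeStandby (23:43:24,972 in the log)
    return waitForState(state::get, HaState.OBSERVER, 200, 20);
  }

  public static void main(String[] args) {
    System.out.println("observer seen: " + replayLoggedSequence());
  }
}
```

Running `main` prints `observer seen: false`, because the state was already overwritten with standby before the wait started polling.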


> TestDFSZKFailoverController fails consistently
> ----------------------------------------------
>
>                 Key: HDFS-14961
>                 URL: https://issues.apache.org/jira/browse/HDFS-14961
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Íñigo Goiri
>            Priority: Major
>
> TestDFSZKFailoverController has been consistently failing with a time out 
> waiting in testManualFailoverWithDFSHAAdmin(). In particular 
> {{waitForHAState(1, HAServiceState.OBSERVER);}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
