[ https://issues.apache.org/jira/browse/HDFS-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377866#comment-17377866 ]
Denis Serduik commented on HDFS-4957:
-------------------------------------

We've been bitten by the exact same scenario: 3 JournalNodes, 2 of which are co-located with the NNs. See the logs below:

{noformat}
2021-07-09 06:34:53,932 INFO namenode.RedundantEditLogInputStream: Fast-forwarding stream 'http://XX-XXXXX-XXX.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true' to transaction ID 449
2021-07-09 06:34:53,970 INFO namenode.FSImage: Edits file http://XX-XXXXX-XXX.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true, http://YYY-YYYYY-YYYY.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true of size 42 edits # 2 loaded in 0 seconds
2021-07-09 06:36:22,127 INFO namenode.FSNamesystem: Stopping services started for standby state
2021-07-09 06:36:22,128 WARN ha.EditLogTailer: Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:469)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:399)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:416)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:484)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:412)
2021-07-09 06:36:22,130 INFO namenode.FSNamesystem: Starting services required for active state
2021-07-09 06:36:22,131 ERROR namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalArgumentException: Unable to construct journal, qjournal://XX-XXXXX-XXX.internal.cloudapp.net:8485;YYY-YYYYY-YYYY.internal.cloudapp.net:8485;ZZ-ZZZZZ-ZZZZ.internal.cloudapp.net:8485/conduit
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1824)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:294)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:259)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1223)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1890)
    at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1749)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1742)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
    at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1811)
    ... 19 more
Caused by: java.net.UnknownHostException: XX-XXXXX-XXX.internal.cloudapp.net:8485
    at org.apache.hadoop.hdfs.server.common.Util.getAddressesList(Util.java:378)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:388)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:170)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:126)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
    ... 24 more
2021-07-09 06:36:22,132 INFO util.ExitUtil: Exiting with status 1: java.lang.IllegalArgumentException: Unable to construct journal, qjournal://XX-XXXXX-XXX.internal.cloudapp.net:8485;YYY-YYYYY-YYYY.internal.cloudapp.net:8485;ZZ-ZZZZZ-ZZZZ.internal.cloudapp.net:8485/conduit
2021-07-09 06:36:22,133 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at YYY-YYYYY-YYYY/10.9.0.5
************************************************************/
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
2021-07-09 06:36:22,837 INFO namenode.NameNode: STARTUP_MSG:
{noformat}

> NameNode failover should not fail because a DNS entry for a quorum node cannot be resolved
> ------------------------------------------------------------------------------------------
>
> Key: HDFS-4957
> URL: https://issues.apache.org/jira/browse/HDFS-4957
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: qjm
> Affects Versions: 2.3.0, 2.6.0
> Reporter: Colin McCabe
> Assignee: John Zhuge
> Priority: Major
>
> When a StandbyNameNode is becoming active, we should not bail out because a DNS entry for a quorum node cannot be resolved. Currently it does fail in this scenario, with a message like this:
> {code}
> 2013-07-03 21:28:40,576 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
> 2013-07-03 21:28:40,579 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
> java.lang.IllegalArgumentException: Unable to construct journal, qjournal://hadoop-mm:8485;hadoop-nn-0:8485;hadoop-nn-1:8485/hadoop
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1254)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:722)
> <etc>
> {code}
> reported by Matt Bookman
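
For illustration only, here is a minimal Java sketch of the behaviour the issue asks for: build the JournalNode address list without requiring every host to resolve, and only give up when fewer than a majority are resolvable. This is not the HDFS code and not the fix for HDFS-4957. The class `QuorumResolutionSketch`, its methods, and the `jnN.example` host names are hypothetical, and the DNS lookup is replaced by an injected predicate so the example runs without a network.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in Hadoop.
public class QuorumResolutionSketch {

    // Parse "host:port;host:port;..." without performing any DNS lookup,
    // so an unresolvable host cannot throw at construction time.
    static List<InetSocketAddress> parseUnresolved(String authority) {
        List<InetSocketAddress> addrs = new ArrayList<>();
        for (String part : authority.split(";")) {
            int colon = part.lastIndexOf(':');
            String host = part.substring(0, colon);
            int port = Integer.parseInt(part.substring(colon + 1));
            addrs.add(InetSocketAddress.createUnresolved(host, port));
        }
        return addrs;
    }

    // True when a strict majority of the hosts can be resolved.
    // canResolve stands in for a real DNS lookup (an assumption of this sketch).
    static boolean majorityResolvable(List<InetSocketAddress> addrs,
                                      Predicate<String> canResolve) {
        int resolvable = 0;
        for (InetSocketAddress addr : addrs) {
            if (canResolve.test(addr.getHostString())) {
                resolvable++;
            }
        }
        return resolvable >= addrs.size() / 2 + 1;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> jns =
            parseUnresolved("jn1.example:8485;jn2.example:8485;jn3.example:8485");
        // Simulate the scenario from the logs: one of three hosts has no DNS entry.
        Predicate<String> fakeDns = host -> !host.equals("jn1.example");
        System.out.println("quorum reachable: " + majorityResolvable(jns, fakeDns));
    }
}
```

With one of three hosts unresolvable, two still resolve, which is a majority, so a failover built along these lines could proceed instead of shutting the NameNode down.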