[ https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699515#comment-13699515 ]
Matt Bookman commented on HDFS-4210:
------------------------------------
I would appreciate it if the priority of this issue were revisited, as this
problem can surface in a much more important situation: failover from the
Active NameNode to the Standby NameNode.
In some virtualized environments, when a virtual machine instance goes away,
the DNS entry for it goes away immediately as well.
Thus the following is observed on the Standby NameNode (hadoop-2.0.5-alpha)
during failure of the Active NameNode:
2013-07-03 21:28:40,576 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
2013-07-03 21:28:40,579 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalArgumentException: Unable to construct journal, qjournal://hadoop-mm:8485;hadoop-nn-0:8485;hadoop-nn-1:8485/hadoop
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1254)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:722)
<etc>
One can work around this problem by placing an /etc/hosts file with entries
for the JournalNode hosts on the Standby NameNode. Doing so allows the
failover to proceed.
But this creates a maintenance hassle, and it is not clear why a DNS failure
should be treated any differently here than the general inability to connect to
one of the quorum journal nodes.
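
To make the point concrete, here is a minimal sketch of treating an
unresolvable JournalNode address the same way as an unreachable one. This is
not Hadoop's code: the class QuorumResolveSketch and all of its methods are
hypothetical names made up for illustration, and only
java.net.InetSocketAddress is a real API. It also assumes that the
NullPointerException in the quoted trace comes from dereferencing the null
returned by getAddress() on an unresolved address, which I have not verified
against the source.

```java
import java.net.InetSocketAddress;
import java.util.List;

// Hypothetical sketch, not taken from Hadoop.
final class QuorumResolveSketch {

    // Number of JournalNodes that form a majority, e.g. 2 of 3.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // Counts addresses whose hostname resolved. Unresolved addresses are
    // simply not counted, as if the node were down.
    static int countResolved(List<InetSocketAddress> addrs) {
        int resolved = 0;
        for (InetSocketAddress a : addrs) {
            if (!a.isUnresolved()) {
                resolved++;
            }
        }
        return resolved;
    }

    // True when enough JournalNode hostnames resolve to form a quorum.
    static boolean hasResolvableQuorum(List<InetSocketAddress> addrs) {
        return countResolved(addrs) >= majority(addrs.size());
    }

    // Null-safe name for logs or metrics: getAddress() returns null for an
    // unresolved address, so fall back to the hostname in that case.
    static String displayName(InetSocketAddress a) {
        String host = a.isUnresolved()
                ? a.getHostString()
                : a.getAddress().getHostAddress();
        return host + ":" + a.getPort();
    }
}
```

With the three JournalNodes from the log above, one missing DNS entry would
leave two resolvable addresses, so hasResolvableQuorum would return true and
startup could continue with the remaining quorum.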
> NameNode Format should not fail for DNS resolution on minority of JournalNode
> -----------------------------------------------------------------------------
>
> Key: HDFS-4210
> URL: https://issues.apache.org/jira/browse/HDFS-4210
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, journal-node, namenode
> Affects Versions: 2.0.0-alpha
> Environment: CDH4.1.2
> Reporter: Damien Hardy
> Priority: Trivial
>
> Setting:
> qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
> The cdh4master01 and cdh4master02 JournalNodes are up and running;
> cdh4worker03 is not yet provisioned (no DNS entry).
> With this setting, `hadoop namenode -format` fails with:
> 12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1233)
>     ... 5 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
>     at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
>     at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:161)
>     at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:141)
>     at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:353)
>     at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:135)
>     at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:104)
>     at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:93)
>     ... 10 more
> I suggest that if a quorum of JournalNodes is up, format should not fail.