[
https://issues.apache.org/jira/browse/HDFS-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen Chu updated HDFS-4281:
------------------------------
Description:
On a shut-down HA cluster, I ran "hdfs namenode -recover" and encountered:
{code}
You have selected Metadata Recovery mode. This mode is intended to recover
lost metadata on a corrupt filesystem. Metadata recovery mode often
permanently deletes data from your HDFS filesystem. Please back up
your edit log and fsimage before trying this!
Are you ready to proceed? (Y/N)
(Y or N) Y
12/12/05 16:43:48 INFO namenode.MetaRecoveryContext: starting recovery...
12/12/05 16:43:48 WARN common.Util: Path /dfs/nn should be specified as a URI
in configuration files. Please update hdfs configuration.
12/12/05 16:43:48 WARN common.Util: Path /dfs/nn should be specified as a URI
in configuration files. Please update hdfs configuration.
12/12/05 16:43:48 WARN namenode.FSNamesystem: Only one image storage directory
(dfs.namenode.name.dir) configured. Beware of dataloss due to lack of redundant
storage directories!
12/12/05 16:43:48 INFO util.HostsFileReader: Refreshing hosts (include/exclude)
list
12/12/05 16:43:48 INFO blockmanagement.DatanodeManager:
dfs.block.invalidate.limit=1000
12/12/05 16:43:48 INFO blockmanagement.BlockManager:
dfs.block.access.token.enable=true
12/12/05 16:43:48 INFO blockmanagement.BlockManager:
dfs.block.access.key.update.interval=600 min(s),
dfs.block.access.token.lifetime=600 min(s),
dfs.encrypt.data.transfer.algorithm=null
12/12/05 16:43:48 INFO namenode.MetaRecoveryContext: RECOVERY FAILED: caught
exception
java.lang.IllegalStateException: Could not determine own NN ID in namespace
'ha-nn-uri'. Please ensure that this node is one of the machines listed as an
NN RPC address, or configure dfs.ha.namenode.id
    at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
    at org.apache.hadoop.hdfs.HAUtil.getNameNodeIdOfOtherNode(HAUtil.java:155)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createBlockTokenSecretManager(BlockManager.java:323)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:239)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:451)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:416)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1063)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
12/12/05 16:43:48 FATAL namenode.NameNode: Exception in namenode join
java.lang.IllegalStateException: Could not determine own NN ID in namespace
'ha-nn-uri'. Please ensure that this node is one of the machines listed as an
NN RPC address, or configure dfs.ha.namenode.id
    at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
    at org.apache.hadoop.hdfs.HAUtil.getNameNodeIdOfOtherNode(HAUtil.java:155)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createBlockTokenSecretManager(BlockManager.java:323)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:239)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:451)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:416)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1063)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
12/12/05 16:43:48 INFO util.ExitUtil: Exiting with status 1
12/12/05 16:43:48 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at
cs-10-20-193-228.cloud.cloudera.com/10.20.193.228
************************************************************/
{code}
The exception message says:
{code}
Please ensure that this node is one of the machines listed as an NN RPC
address, or configure dfs.ha.namenode.id
{code}
I ran the recover command from a machine listed as an NN RPC address:
{code}
<property>
  <name>dfs.namenode.rpc-address.ha-nn-uri.nn1</name>
  <value>cs-10-20-193-228.cloud.cloudera.com:17020</value>
</property>
{code}
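For context, a nameservice normally lists both NameNode IDs, and as far as I understand the automatic detection matches the local machine's address against the per-ID RPC addresses. A minimal sketch of that layout (the nn2 entries here are hypothetical, not taken from the attached configs):
{code}
<!-- Sketch only: the nn2 ID and its host:port are placeholders -->
<property>
  <name>dfs.ha.namenodes.ha-nn-uri</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ha-nn-uri.nn2</name>
  <value>other-nn-host.example.com:17020</value>
</property>
{code}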
Setting dfs.ha.namenode.id lets recovery proceed. If dfs.ha.namenode.id must always
be set explicitly for recovery on an HA cluster, then the exception message should be
updated to say so.
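For anyone else hitting this, the workaround is a one-line addition to hdfs-site.xml. A minimal sketch, assuming this host is nn1 (the value should match this host's entry under dfs.namenode.rpc-address.ha-nn-uri.*):
{code}
<!-- Workaround sketch: pin the local NN ID explicitly; "nn1" assumes this host is nn1 -->
<property>
  <name>dfs.ha.namenode.id</name>
  <value>nn1</value>
</property>
{code}
With that set, "hdfs namenode -recover" gets past the HA check.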
> NameNode recovery does not detect NN RPC address on HA cluster
> --------------------------------------------------------------
>
> Key: HDFS-4281
> URL: https://issues.apache.org/jira/browse/HDFS-4281
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.0.0-alpha
> Reporter: Stephen Chu
> Attachments: core-site.xml, hdfs-site.xml, nn_recover
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira