[ https://issues.apache.org/jira/browse/HDFS-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jimmy Xiang resolved HDFS-4442.
-------------------------------
    Resolution: Duplicate
      Assignee: Jimmy Xiang

Yes, that's a duplicate of HDFS-4201. Let me close this one and work on HDFS-4201.

> Initialization failed for block (...) Invalid volume failure config value: 1
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-4442
>                 URL: https://issues.apache.org/jira/browse/HDFS-4442
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>         Environment: Amazon Linux (Centos 6), Cloudera nightly RPMs
>            Reporter: Alexandre Fouché
>            Assignee: Jimmy Xiang
>              Labels: datanode, hdfs
>
> (Note: some of the messages are similar to HDFS-4201.)
> Just after I created a new HDFS cluster, this time using the Cloudera nightly
> RPM hadoop-hdfs-datanode-2.0.0+898-1.cdh4.2.0.p0.939.el6.x86_64, the HDFS
> datanodes were unable to initialize or store anything. Each datanode stayed
> alive, but kept logging exceptions every few seconds: "Initialization failed
> for block pool Block pool (...)" with
> "org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure
> config value: 1", and then repeatedly "Exception in BPOfferService for Block
> pool (...)".
> My config was:
> <property>
>   <name>dfs.datanode.data.dir</name>
>   <value>file:///opt/hadoop/dn1/data</value>
> </property>
> After a bit of tweaking, it worked once I added a second EBS volume to the
> node. Yet that does not explain the initial error. A bug?
> <property>
>   <name>dfs.datanode.data.dir</name>
>   <value>file:///opt/hadoop/dn1/data,file:///opt/hadoop/dn2/data</value>
> </property>
> Original exceptions:
> {code}
> (...)
> 2013-01-25 15:04:28,573 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 15:04:28,581 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1786416716;bpid=BP-1342054845-10.118.50.25-1359125000145;lv=-40;nsInfo=lv=-40;cid=CID-3c2cfe5f-da56-4115-90db-81e06c14bc50;nsid=1786416716;c=0;bpid=BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 15:04:28,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221) service to namenode2.somedomain.com/10.2.118.169:8020 beginning handshake with NN
> 2013-01-25 15:04:28,605 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221) service to namenode1.somedomain.com/10.118.50.25:8020
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure config value: 1
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:182)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:910)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:872)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:308)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
>         at java.lang.Thread.run(Unknown Source)
> 2013-01-25 15:04:28,702 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221) service to namenode2.somedomain.com/10.2.118.169:8020 successfully registered with NN
> 2013-01-25 15:04:28,863 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec BLOCKREPORT_INTERVAL of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 15:04:28,864 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221) service to namenode2.somedomain.com/10.2.118.169:8020
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
>         at java.lang.Thread.run(Unknown Source)
> 2013-01-25 15:04:28,864 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221) service to namenode1.somedomain.com/10.118.50.25:8020
> 2013-01-25 15:04:33,864 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec BLOCKREPORT_INTERVAL of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 15:04:33,864 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-404982471-10.194.189.193-50010-1359126268221) service to namenode2.somedomain.com/10.2.118.169:8020
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
>         at java.lang.Thread.run(Unknown Source)
> (...)
> {code}
> After adding the second volume (and deleting the first one too):
> {code}
> (...)
> 2013-01-25 17:29:18,394 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dn1/data/in_use.lock acquired by nodename 22...@datanode1.somedomain.com
> 2013-01-25 17:29:18,396 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /opt/hadoop/dn1/data is not formatted
> 2013-01-25 17:29:18,396 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dn2/data/in_use.lock acquired by nodename 22...@datanode1.somedomain.com
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /opt/hadoop/dn2/data is not formatted
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2013-01-25 17:29:19,509 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145 is not formatted.
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /opt/hadoop/dn2/data/current/BP-1342054845-10.118.50.25-1359125000145 is not formatted.
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting block pool BP-1342054845-10.118.50.25-1359125000145 directory /opt/hadoop/dn2/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 17:29:19,527 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1786416716;bpid=BP-1342054845-10.118.50.25-1359125000145;lv=-40;nsInfo=lv=-40;cid=CID-3c2cfe5f-da56-4115-90db-81e06c14bc50;nsid=1786416716;c=0;bpid=BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,548 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /opt/hadoop/dn1/data/current
> 2013-01-25 17:29:19,640 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - /opt/hadoop/dn2/data/current
> 2013-01-25 17:29:19,645 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
> 2013-01-25 17:29:19,655 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 1359139242655 with interval 21600000
> 2013-01-25 17:29:19,656 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,674 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661) service to namenode1.somedomain.com/10.118.50.25:8020 beginning handshake with NN
> 2013-01-25 17:29:19,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661) service to namenode2.somedomain.com/10.2.118.169:8020 beginning handshake with NN
> 2013-01-25 17:29:19,688 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661) service to namenode1.somedomain.com/10.118.50.25:8020 successfully registered with NN
> 2013-01-25 17:29:19,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode namenode1.somedomain.com/10.118.50.25:8020 using DELETEREPORT_INTERVAL of 300000 msec BLOCKREPORT_INTERVAL of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 17:29:19,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661) service to namenode2.somedomain.com/10.2.118.169:8020 successfully registered with NN
> 2013-01-25 17:29:19,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode namenode2.somedomain.com/10.2.118.169:8020 using DELETEREPORT_INTERVAL of 300000 msec BLOCKREPORT_INTERVAL of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 17:29:19,888 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661) service to namenode1.somedomain.com/10.118.50.25:8020 trying to claim ACTIVE state with txid=165
> 2013-01-25 17:29:19,888 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id DS-26684760-10.194.189.193-50010-1359134958661) service to namenode1.somedomain.com/10.118.50.25:8020
> 2013-01-25 17:29:19,932 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks took 0 msec to generate and 39 msecs for RPC and NN processing
> 2013-01-25 17:29:19,932 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:null
> 2013-01-25 17:29:19,935 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks took 1 msec to generate and 50 msecs for RPC and NN processing
> 2013-01-25 17:29:19,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@11082823
> 2013-01-25 17:29:20,033 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-1342054845-10.118.50.25-1359125000145 to blockPoolScannerMap, new size=1
> (...)
> {code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
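For readers hitting the same error: judging by the FsDatasetImpl.<init> frame in the stack trace, the DiskErrorException comes from a startup sanity check that compares dfs.datanode.failed.volumes.tolerated against the number of configured data directories. The sketch below is a hypothetical, self-contained reconstruction of that check's arithmetic, not the actual Hadoop source; the class, the method name isValid, and the exact bounds are assumptions inferred from the error message and the reporter's observations.

```java
/**
 * Hypothetical sketch of the volume-failure sanity check that a DataNode
 * appears to run at startup. Not a copy of the Hadoop source; names and
 * exact bounds are assumptions.
 */
public class VolumeFailureCheck {

    /**
     * A configuration is only acceptable if, after losing the tolerated
     * number of volumes, at least one data directory would still remain.
     */
    static boolean isValid(int volsConfigured, int volFailuresTolerated) {
        return volFailuresTolerated >= 0
            && volFailuresTolerated < volsConfigured;
    }

    public static void main(String[] args) {
        // One data dir with failed.volumes.tolerated = 1: no volume could
        // survive a tolerated failure, matching the reporter's
        // "Invalid volume failure config value: 1".
        System.out.println(isValid(1, 1)); // false

        // Two data dirs with the same setting pass the check, matching the
        // observation that adding a second volume made the error go away.
        System.out.println(isValid(2, 1)); // true
    }
}
```

Under this reading, a single dfs.datanode.data.dir entry combined with a failed-volumes tolerance of 1 can never satisfy the check, which would explain both the failure on one volume and the fix with two; the repeated NullPointerException in BPServiceActor.sendHeartBeat then looks like a secondary symptom of the heartbeat thread running against a dataset that never initialized.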