[ https://issues.apache.org/jira/browse/HDFS-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562920#comment-13562920 ]

Harsh J commented on HDFS-4442:
-------------------------------

The error message can certainly be improved, but what it's telling you is that 
you have only 1 disk configured while {{dfs.datanode.failed.volumes.tolerated}} 
is also set to 1. The tolerated number of disk failures must be strictly less 
than the total number of configured {{dfs.datanode.data.dir}} values, since a 
DN cannot run disk-less.
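
For example, with a single configured data directory the only valid toleration 
value is 0. A minimal hdfs-site.xml sketch (the path is illustrative, borrowed 
from your report):

<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///opt/hadoop/dn1/data</value>
</property>
<property>
 <name>dfs.datanode.failed.volumes.tolerated</name>
 <value>0</value>
</property>

Equivalently, you can keep the value at 1 but configure at least two data 
directories, which is what you eventually did.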

This isn't a bug, but we'd gladly accept a patch to improve the error message. 
Would you be interested in submitting one yourself? The steps for contributing 
to Apache Hadoop are outlined at http://wiki.apache.org/hadoop/HowToContribute, 
and you can ping the dev lists if you have further questions along the way.

As Suresh has already pointed out to you, in future please file JIRAs on the 
Apache JIRA only if you are sure the issue also affects an Apache Hadoop 
release and is not CDH-specific. For CDH-specific issues, file a JIRA in 
Cloudera's own bug tracker (link: 
https://issues.cloudera.org/browse/DISTRO) or ping its lists for clarification.
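
For reference, here is a hedged sketch of the kind of startup check that 
produces this exception (a simplification, not the actual implementation; the 
real logic lives in FsDatasetImpl's constructor, per the stack trace below, 
and also accounts for volumes that have already failed):

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;

// Hypothetical helper illustrating the constraint; the class and
// method names here are ours, not Hadoop's.
class VolumeFailureConfigCheck {
  static void validate(Configuration conf, List<String> dataDirs)
      throws DiskErrorException {
    // Number of dfs.datanode.data.dir entries actually configured.
    int volsConfigured = dataDirs.size();
    int volFailuresTolerated =
        conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
    // The tolerated count must lie in [0, volsConfigured): with one
    // data dir and a value of 1, this check fails and block pool
    // initialization aborts with the FATAL log seen below.
    if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
      throw new DiskErrorException(
          "Invalid volume failure  config value: " + volFailuresTolerated);
    }
  }
}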
                
> Initialization failed for block (...) Invalid volume failure  config value: 1
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-4442
>                 URL: https://issues.apache.org/jira/browse/HDFS-4442
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>         Environment: Amazon Linux (Centos 6), Cloudera nightly RPMs
>            Reporter: Alexandre Fouché
>              Labels: datanode, hdfs
>
> (Note: Some of the messages are similar to HDFS-4201)
> Just after I created a new HDFS cluster, this time using the Cloudera nightly 
> RPM hadoop-hdfs-datanode-2.0.0+898-1.cdh4.2.0.p0.939.el6.x86_64, the HDFS 
> datanodes were unable to initialize or store anything. Each datanode stays 
> alive but keeps logging exceptions every few seconds: first "Initialization 
> failed for block pool Block pool (...)" with 
> "org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume 
> failure  config value: 1", and then repeatedly "Exception in BPOfferService 
> for Block pool (...)".
> My config was:
> <property>
>  <name>dfs.datanode.data.dir</name>
>  <value>file:///opt/hadoop/dn1/data</value>
> </property>
> After a bit of tweaking, it worked once I added a second EBS volume to the 
> node. Yet that does not explain the initial error. A bug?
> <property>
>  <name>dfs.datanode.data.dir</name>
>  <value>file:///opt/hadoop/dn1/data,file:///opt/hadoop/dn2/data</value>
> </property>
> Original exceptions:
> (...)
> 2013-01-25 15:04:28,573 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-1342054845-10.118.50.25-1359125000145 directory 
> /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 15:04:28,581 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Setting up storage: 
> nsid=1786416716;bpid=BP-1342054845-10.118.50.25-1359125000145;lv=-40;nsInfo=lv=-40;cid=CID-3c2cfe5f-da56-4115-90db-81e06c14bc50;nsid=1786416716;c=0;bpid=BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 15:04:28,601 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-404982471-10.194.189.193-50010-1359126268221) service to 
> namenode2.somedomain.com/10.2.118.169:8020 beginning handshake with NN
> 2013-01-25 15:04:28,605 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-404982471-10.194.189.193-50010-1359126268221) service to 
> namenode1.somedomain.com/10.118.50.25:8020
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure 
>  config value: 1
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:182)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:910)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:872)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:308)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
>     at java.lang.Thread.run(Unknown Source)
> 2013-01-25 15:04:28,702 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-404982471-10.194.189.193-50010-1359126268221) service to 
> namenode2.somedomain.com/10.2.118.169:8020 successfully registered with NN
> 2013-01-25 15:04:28,863 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> For namenode namenode2.somedomain.com/10.2.118.169:8020 using 
> DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec 
> Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 15:04:28,864 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService 
> for Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-404982471-10.194.189.193-50010-1359126268221) service to 
> namenode2.somedomain.com/10.2.118.169:8020
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
>     at java.lang.Thread.run(Unknown Source)
> 2013-01-25 15:04:28,864 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool 
> BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-404982471-10.194.189.193-50010-1359126268221) service to 
> namenode1.somedomain.com/10.118.50.25:8020
> 2013-01-25 15:04:33,864 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> For namenode namenode2.somedomain.com/10.2.118.169:8020 using 
> DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec 
> Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 15:04:33,864 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService 
> for Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-404982471-10.194.189.193-50010-1359126268221) service to 
> namenode2.somedomain.com/10.2.118.169:8020
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:435)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:521)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673)
>     at java.lang.Thread.run(Unknown Source)
> (...)
> After adding the second volume (and deleted the first one too):
> (...)
> 2013-01-25 17:29:18,394 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /opt/hadoop/dn1/data/in_use.lock acquired by nodename 
> 22...@datanode1.somedomain.com
> 2013-01-25 17:29:18,396 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /opt/hadoop/dn1/data is not formatted
> 2013-01-25 17:29:18,396 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Lock on /opt/hadoop/dn2/data/in_use.lock acquired by nodename 
> 22...@datanode1.somedomain.com
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory /opt/hadoop/dn2/data is not formatted
> 2013-01-25 17:29:18,646 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2013-01-25 17:29:19,509 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory 
> /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145 is not 
> formatted.
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2013-01-25 17:29:19,510 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-1342054845-10.118.50.25-1359125000145 directory 
> /opt/hadoop/dn1/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Locking is disabled
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Storage directory 
> /opt/hadoop/dn2/data/current/BP-1342054845-10.118.50.25-1359125000145 is not 
> formatted.
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting ...
> 2013-01-25 17:29:19,519 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Formatting block pool BP-1342054845-10.118.50.25-1359125000145 directory 
> /opt/hadoop/dn2/data/current/BP-1342054845-10.118.50.25-1359125000145/current
> 2013-01-25 17:29:19,527 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Setting up storage: 
> nsid=1786416716;bpid=BP-1342054845-10.118.50.25-1359125000145;lv=-40;nsInfo=lv=-40;cid=CID-3c2cfe5f-da56-4115-90db-81e06c14bc50;nsid=1786416716;c=0;bpid=BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,548 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
> volume - /opt/hadoop/dn1/data/current
> 2013-01-25 17:29:19,640 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added 
> volume - /opt/hadoop/dn2/data/current
> 2013-01-25 17:29:19,645 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Registered FSDatasetState MBean
> 2013-01-25 17:29:19,655 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory 
> Tree Verification scan starting at 1359139242655 with interval 21600000
> 2013-01-25 17:29:19,656 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding 
> block pool BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,674 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-26684760-10.194.189.193-50010-1359134958661) service to 
> namenode1.somedomain.com/10.118.50.25:8020 beginning handshake with NN
> 2013-01-25 17:29:19,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-26684760-10.194.189.193-50010-1359134958661) service to 
> namenode2.somedomain.com/10.2.118.169:8020 beginning handshake with NN
> 2013-01-25 17:29:19,688 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-26684760-10.194.189.193-50010-1359134958661) service to 
> namenode1.somedomain.com/10.118.50.25:8020 successfully registered with NN
> 2013-01-25 17:29:19,690 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> For namenode namenode1.somedomain.com/10.118.50.25:8020 using 
> DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec 
> Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 17:29:19,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-26684760-10.194.189.193-50010-1359134958661) service to 
> namenode2.somedomain.com/10.2.118.169:8020 successfully registered with NN
> 2013-01-25 17:29:19,791 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> For namenode namenode2.somedomain.com/10.2.118.169:8020 using 
> DELETEREPORT_INTERVAL of 300000 msec  BLOCKREPORT_INTERVAL of 21600000msec 
> Initial delay: 0msec; heartBeatInterval=3000
> 2013-01-25 17:29:19,888 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Namenode Block pool BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-26684760-10.194.189.193-50010-1359134958661) service to 
> namenode1.somedomain.com/10.118.50.25:8020 trying to claim ACTIVE state with 
> txid=165
> 2013-01-25 17:29:19,888 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Acknowledging ACTIVE Namenode Block pool 
> BP-1342054845-10.118.50.25-1359125000145 (storage id 
> DS-26684760-10.194.189.193-50010-1359134958661) service to 
> namenode1.somedomain.com/10.118.50.25:8020
> 2013-01-25 17:29:19,932 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> BlockReport of 0 blocks took 0 msec to generate and 39 msecs for RPC and NN 
> processing
> 2013-01-25 17:29:19,932 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> sent block report, processed command:null
> 2013-01-25 17:29:19,935 INFO 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block 
> Verification Scanner initialized with interval 504 hours for block pool 
> BP-1342054845-10.118.50.25-1359125000145
> 2013-01-25 17:29:19,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> BlockReport of 0 blocks took 1 msec to generate and 50 msecs for RPC and NN 
> processing
> 2013-01-25 17:29:19,939 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> sent block report, processed 
> command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@11082823
> 2013-01-25 17:29:20,033 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added 
> bpid=BP-1342054845-10.118.50.25-1359125000145 to blockPoolScannerMap, new 
> size=1
> (...)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
