[
https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231760#comment-15231760
]
Lin Yiqun commented on HDFS-10269:
----------------------------------
Hi, [~cnauroth], I don't think a misconfigured
dfs.datanode.failed.volumes.tolerated always means the admin forgot to
include a few volumes. Sometimes users simply don't know that a tolerated
failure count larger than the number of configured volumes, or smaller than
0, will shut the datanode down, and the property description of
dfs.datanode.failed.volumes.tolerated doesn't mention this either. Users end
up confused, have to dig the reason out of the datanode's log, and then
restart the datanode. It seems we should improve this: falling back to a
suitable value when an invalid configuration is found would be better than
simply shutting the node down.
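For context, the check that aborts startup is in the FsDatasetImpl
constructor (the FsDatasetImpl.java:281 frame in the log below). Roughly,
paraphrasing the current code (volsConfigured being the number of
directories listed in dfs.datanode.data.dir):
{code}
// Paraphrase of the existing validation in FsDatasetImpl's constructor;
// the exact code may differ slightly across branches.
final int volFailuresTolerated =
    conf.getInt(DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY,
        DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
// volsConfigured: number of entries in dfs.datanode.data.dir.
if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
  // Any out-of-range value aborts DataNode startup with the
  // "Invalid volume failure config value" error seen in the log below.
  throw new DiskErrorException("Invalid volume failure "
      + " config value: " + volFailuresTolerated);
}
{code}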
> Invalid value configured for dfs.datanode.failed.volumes.tolerated causes
> the datanode to exit
> ------------------------------------------------------------------------------------------
>
> Key: HDFS-10269
> URL: https://issues.apache.org/jira/browse/HDFS-10269
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.1
> Reporter: Lin Yiqun
> Assignee: Lin Yiqun
> Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused the value 5 for
> dfs.datanode.failed.volumes.tolerated from another of my clusters, but the
> new cluster actually has only one data dir path. This made the volume
> failure config value invalid, a {{DiskErrorException}} was thrown, and the
> datanode shut down. The log is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000. Exiting.
> java.io.IOException: All specified directories are failed to load.
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000. Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure config value: 5
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:281)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
> 2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> {code}
> IMO, this feels bad to users when all they did was configure one value
> incorrectly. Instead, we could log a warning for this and reset the value
> to the default; that would handle this case better, as sketched below.
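> A minimal sketch of that idea (illustrative only; the attached patch may
> differ in detail):
> {code}
> // Sketch: instead of throwing DiskErrorException, warn and fall back to
> // the default value. Names follow FsDatasetImpl; this is not necessarily
> // identical to HDFS-10269.001.patch.
> int volFailuresTolerated =
>     conf.getInt(DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY,
>         DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
> if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
>   LOG.warn("Invalid value configured for "
>       + DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY + ": "
>       + volFailuresTolerated + ", resetting it to the default value "
>       + DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
>   volFailuresTolerated =
>       DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT;
> }
> {code}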
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)