[
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035834#comment-13035834
]
Eli Collins commented on HDFS-1592:
-----------------------------------
Thanks for the info, Bharath. I tested on trunk, but I also knew when I filed
HDFS-1849 that the current code wouldn't tolerate a failed volume. There's an
issue with the 2nd test case:
{quote}
Case 2: One disk failure (/grid/2) and Vol Tolerated = 1. Outcome: BP Service
should not exit
...
11/05/18 08:48:39 WARN datanode.DataNode: Invalid directory in: dfs.datanode.data.dir:
java.io.FileNotFoundException: File file:/grid/2/testing/hadoop-logs/dfs/data does not exist.
{quote}
A missing data directory is not a disk failure: the datanode will happily
notice it and recreate the directory successfully.
If you swap out a disk from a host, or just make part of the data directory
inaccessible, e.g. by changing the perms on the host file system, you'll see
that this is a fatal error for the DN, e.g.:
{noformat}
11/05/18 15:57:23 FATAL datanode.DataNode: DatanodeRegistration(localhost.localdomain:50010, storageID=, infoPort=50075, ipcPort=50020, storageInfo=lv=0;cid=;nsid=0;c=0) initialization failed for block pool BP-1288327361-127.0.0.1-1305593076974
java.io.IOException: Cannot remove current directory: /home/eli/hadoop-dirs1/dfs/data1/current
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:332)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.format(DataStorage.java:264)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:166)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:216)
    at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.setupBPStorage(DataNode.java:797)
    at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.setupBP(DataNode.java:774)
    at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.run(DataNode.java:1186)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
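To make the distinction above concrete, here is a minimal sketch of how a
startup check could tell a merely missing directory from a volume that is
really unusable. This is not the DataNode's actual code: the class
VolumeProbe, its State enum and the probe() method are hypothetical names
made up for illustration, and it uses only java.io.File.

```java
import java.io.File;

// Hypothetical illustration only; not part of Hadoop.
class VolumeProbe {
    enum State { USABLE, RECREATED, FAILED }

    // Classify one configured data directory.
    static State probe(File dir) {
        if (!dir.exists()) {
            // A missing directory is not a disk failure: try to recreate it.
            // If even that fails, treat the volume as failed.
            return dir.mkdirs() ? State.RECREATED : State.FAILED;
        }
        // The path exists, so it must be a directory we can fully access.
        boolean accessible = dir.isDirectory()
                && dir.canRead() && dir.canWrite() && dir.canExecute();
        return accessible ? State.USABLE : State.FAILED;
    }
}
```

With a check like this, only FAILED results would count against the
tolerated-volumes limit. A test that wants to exercise the failure path
needs to make the directory inaccessible, not just delete it.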
Your four test cases are great. Please write a unit test for each. This way we
can make sure the patch works for each and that future changes don't break this
feature.
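As a rough sketch of what those unit tests would pin down, the rule under
test can be reduced to a single comparison. The class and method names
below are hypothetical and are not taken from the patch. I'm also assuming
the DN should refuse to start when no volume is left, which the patch may
or may not enforce.

```java
// Hypothetical illustration only; not the code from the patch.
class VolumeTolerance {
    // Returns true if the datanode should keep running given how many of
    // its configured volumes failed and how many failures are tolerated.
    static boolean shouldStart(int configuredVolumes, int failedVolumes,
                               int toleratedFailures) {
        if (configuredVolumes <= 0 || failedVolumes < 0
                || failedVolumes > configuredVolumes || toleratedFailures < 0) {
            throw new IllegalArgumentException("invalid volume counts");
        }
        // Assumption: at least one working volume is always required.
        boolean anyVolumeLeft = failedVolumes < configuredVolumes;
        return anyVolumeLeft && failedVolumes <= toleratedFailures;
    }
}
```

For Case 2 above (one failed disk, tolerated = 1), assuming two configured
volumes, shouldStart(2, 1, 1) returns true, so the BP service should not
exit. With tolerated = 0 the same failure gives shouldStart(2, 1, 0) ==
false.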
> Datanode startup doesn't honor volumes.tolerated
> -------------------------------------------------
>
> Key: HDFS-1592
> URL: https://issues.apache.org/jira/browse/HDFS-1592
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 0.20.204.0
> Reporter: Bharath Mundlapudi
> Assignee: Bharath Mundlapudi
> Fix For: 0.20.204.0, 0.23.0
>
> Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch,
> HDFS-1592-rel20.patch
>
>
> Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.