[ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035834#comment-13035834
 ] 

Eli Collins commented on HDFS-1592:
-----------------------------------

Thanks for the info, Bharath. I tested on trunk, but I also knew when I filed 
HDFS-1849 that the current code wouldn't tolerate a failed volume. There's an 
issue with the second test case:

{quote}
Case 2: One disk failure (/grid/2) and Vol Tolerated = 1. Outcome: BP Service 
should not exit
...
11/05/18 08:48:39 WARN datanode.DataNode: Invalid directory in: 
dfs.datanode.data.dir: 
java.io.FileNotFoundException: File file:/grid/2/testing/hadoop-logs/dfs/data 
does not exist.
{quote}

A missing data directory is not a disk failure: the datanode will notice that 
it is missing and recreate the directory successfully.

If you swap out a disk from a host, or just make part of the data directory 
inaccessible (e.g. by changing the permissions on the host file system), you'll 
see that this is a fatal error for the DN, e.g.:

{noformat}
11/05/18 15:57:23 FATAL datanode.DataNode: 
DatanodeRegistration(localhost.localdomain:50010, storageID=, infoPort=50075, 
ipcPort=50020, storageInfo=lv=0;cid=;nsid=0;c=0) initialization failed for 
block pool BP-1288327361-127.0.0.1-1305593076974
java.io.IOException: Cannot remove current directory: 
/home/eli/hadoop-dirs1/dfs/data1/current
        at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:332)
        at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.format(DataStorage.java:264)
        at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:166)
        at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:216)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.setupBPStorage(DataNode.java:797)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.setupBP(DataNode.java:774)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.run(DataNode.java:1186)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
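
To make the distinction concrete, here is a minimal sketch in plain Java (no 
Hadoop classes). The class VolumeProbe and the helper isUsable are hypothetical 
names, not part of the datanode code. Instead of changing permissions, the 
sketch blocks the directory by putting a regular file where its parent should 
be, so that it behaves the same way even when run as root; that substitution is 
an assumption made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical illustration only; this is not the datanode's actual check.
class VolumeProbe {

    /**
     * Returns true if dir already exists, or can be created, and is a
     * readable, writable, traversable directory.
     */
    public static boolean isUsable(File dir) {
        if (!dir.exists() && !dir.mkdirs()) {
            return false; // could not recreate the directory
        }
        return dir.isDirectory() && dir.canRead()
                && dir.canWrite() && dir.canExecute();
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("volume-probe").toFile();

        // A missing directory: simply recreated, so not a failure.
        File missing = new File(root, "data1");
        System.out.println("missing dir usable: " + isUsable(missing));

        // A blocked location: the parent is a regular file, so the
        // directory can be neither created nor used.
        File blocker = new File(root, "data2");
        if (!blocker.createNewFile()) {
            throw new IOException("could not create " + blocker);
        }
        File blocked = new File(blocker, "current");
        System.out.println("blocked dir usable: " + isUsable(blocked));
    }
}
```

The first probe should report the directory as usable after recreating it, 
while the second should report it as unusable, which is the kind of condition 
a failed-volume test needs to provoke.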


Your four test cases are great. Please write a unit test for each. This way we 
can make sure the patch works for each and that future changes don't break this 
feature.
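
As a rough starting point for such tests, the sketch below checks the quoted 
Case 2 (one failed volume, one tolerated) plus its zero-tolerance counterpart. 
It is plain Java, not the MiniDFSCluster-based test the patch would need. The 
class VolumeTolerance, the method canStart, and the rule "failed volumes <= 
tolerated" are assumptions for illustration, not the datanode implementation.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and rule are assumptions.
class VolumeTolerance {

    /** Counts volumes that neither exist as directories nor can be created. */
    public static int countFailed(List<File> dataDirs) {
        int failed = 0;
        for (File dir : dataDirs) {
            boolean ok = dir.isDirectory() || dir.mkdirs();
            if (!ok) {
                failed++;
            }
        }
        return failed;
    }

    /** Assumed startup rule: keep going while failures stay within the limit. */
    public static boolean canStart(List<File> dataDirs, int tolerated) {
        return countFailed(dataDirs) <= tolerated;
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("volume-tolerance").toFile();
        File good = new File(root, "grid1");

        // Simulate a failed volume: its parent path is a regular file.
        File blocker = new File(root, "grid2");
        if (!blocker.createNewFile()) {
            throw new IOException("could not create " + blocker);
        }
        File bad = new File(blocker, "data");

        List<File> dirs = Arrays.asList(good, bad);
        System.out.println("tolerated=0 -> start: " + canStart(dirs, 0));
        System.out.println("tolerated=1 -> start: " + canStart(dirs, 1));
    }
}
```

A real unit test would replace canStart with a datanode startup and assert on 
whether the block pool service keeps running.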

> Datanode startup doesn't honor volumes.tolerated 
> -------------------------------------------------
>
>                 Key: HDFS-1592
>                 URL: https://issues.apache.org/jira/browse/HDFS-1592
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.204.0
>            Reporter: Bharath Mundlapudi
>            Assignee: Bharath Mundlapudi
>             Fix For: 0.20.204.0, 0.23.0
>
>         Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, 
> HDFS-1592-rel20.patch
>
>
> Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

