[ https://issues.apache.org/jira/browse/HDFS-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151562#comment-15151562 ]

Lin Yiqun commented on HDFS-9819:
---------------------------------

Thanks [~kihwal] for the comments. There are two scenarios that can cause 
dir/file deletion:
* When I browse the bpDir or one of its child files in a data dir and 
accidentally run an rm command in that dir.
* When the usage of one disk exceeds 90% and triggers a disk-space alarm on my 
node, I have to delete block files immediately to free up space instead of 
slowly waiting for the balancer to move blocks to other nodes. The deleted 
blocks will be re-replicated on other nodes, so no blocks go missing.
Both scenarios delete a data dir and cause the volume to be marked failed.
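To make the idea concrete, here is a minimal sketch (not the actual patch) of how a dir check could tolerate a few consecutive failures before marking the volume failed. The class and names are hypothetical; the threshold plays the same role for a single volume that {{dfs.datanode.failed.volumes.tolerated}} plays for the set of volumes:

{code:java}
import java.io.File;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not HDFS code: counts consecutive check-dir
// failures and only reports the volume as failed once a configurable
// threshold is exceeded.
class TolerantDirChecker {
  private final int maxToleratedFailures;
  private final AtomicInteger consecutiveFailures = new AtomicInteger(0);

  TolerantDirChecker(int maxToleratedFailures) {
    this.maxToleratedFailures = maxToleratedFailures;
  }

  /** Returns true only after the dir has failed the check
   *  more than maxToleratedFailures times in a row. */
  boolean shouldFailVolume(File dir) {
    boolean healthy = dir.exists() && dir.isDirectory()
        && dir.canRead() && dir.canWrite();
    if (healthy) {
      consecutiveFailures.set(0); // a good check resets the counter
      return false;
    }
    return consecutiveFailures.incrementAndGet() > maxToleratedFailures;
  }
}
{code}

With this, a dir that is deleted by mistake and restored before the counter crosses the threshold never fails the volume, so no unnecessary re-replication is triggered.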

> FsVolume should tolerate few times check-dir failed due to deletion by mistake
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-9819
>                 URL: https://issues.apache.org/jira/browse/HDFS-9819
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Lin Yiqun
>            Assignee: Lin Yiqun
>             Fix For: 2.7.1
>
>         Attachments: HDFS-9819.001.patch
>
>
> FsVolume should tolerate a few check-dir failures, because sometimes a 
> dir/file in the datanode data-dirs is deleted by mistake. The 
> {{DataNode#startCheckDiskErrorThread}} thread then invokes the checkDir 
> method periodically, finds the dir missing, and throws an exception. The 
> checked volume is added to the failed-volume list and the blocks on it are 
> replicated again, even though this is not actually necessary. We should let 
> a volume tolerate a few check-dir failures, controlled by a config like 
> {{dfs.datanode.failed.volumes.tolerated}}.
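For reference, the existing tolerance knob named above can be set programmatically as shown below; the second property is purely hypothetical, illustrating what an analogous check-dir tolerance setting might look like:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ToleranceConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // existing HDFS setting: how many failed volumes a DataNode tolerates
    conf.setInt("dfs.datanode.failed.volumes.tolerated", 1);
    // hypothetical sibling setting, suggested by analogy in this issue
    conf.setInt("dfs.datanode.checkdir.failures.tolerated", 2);
    System.out.println(
        conf.getInt("dfs.datanode.failed.volumes.tolerated", 0));
  }
}
{code}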



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
