[
https://issues.apache.org/jira/browse/HDFS-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248924#comment-13248924
]
Ravi Prakash commented on HDFS-1940:
------------------------------------
I checked in 0.23.3 (d60e9678bbc4d52fb9ab5d65363d452cc5926cff): copying a
block (and its meta) file from one disk to another does not show up in fsck.
The DirectoryScanner does detect the extra block, which is not present in the
memory map, e.g. (here HDFS held only 1 file = 1 block, and I made a copy):
{noformat}2012-04-06 15:49:50,958 INFO datanode.DirectoryScanner
(DirectoryScanner.java:scan(389)) - BlockPool
BP-1909597932-10.74.90.105-1333745027872 Total blocks: 2, missing metadata
files:0, missing block files:0, missing blocks in memory:1, mismatched
blocks:0{noformat}
When I deleted the file from HDFS, I saw that the copied block (rather than
the original block) got deleted. I retried the experiment, this time
corrupting the original block and restarting HDFS. On cat-ing the file, the
original uncorrupted data was served from the copied block. But when I
corrupted the copied block instead and restarted HDFS, it was not able to
serve the data from the uncorrupted original block. This is a bummer.
> Datanode can have more than one copy of same block when a failed disk is
> coming back in datanode
> ------------------------------------------------------------------------------------------------
>
> Key: HDFS-1940
> URL: https://issues.apache.org/jira/browse/HDFS-1940
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: 0.20.204.0
> Reporter: Rajit Saha
> Assignee: Bharath Mundlapudi
>
> There is a situation where a datanode can have more than one copy of the
> same block when a disk fails and comes back after some time. These duplicate
> blocks are not deleted even after datanode and namenode restarts.
> This can only happen in a corner case, when, due to the disk failure, the
> data block is replicated to another disk of the same datanode.
> To simulate this scenario I copied a data block and its associated .meta
> file from one disk to another disk of the same datanode, so the datanode has
> two copies of the same replica. I then restarted the datanode and namenode;
> the extra data block and meta file were still not deleted from the datanode:
> ls -l `find /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*`
> -rw-r--r-- 1 hdfs users 7814 May 13 21:05
> /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376
> -rw-r--r-- 1 hdfs users 71 May 13 21:05
> /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
> -rw-r--r-- 1 hdfs users 7814 May 13 21:14
> /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376
> -rw-r--r-- 1 hdfs users 71 May 13 21:14
> /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira