[jira] Issue Comment Edited: (HADOOP-2065) Replication policy for corrupted block

lohit vijayarenu (JIRA) Fri, 25 Apr 2008 02:21:29 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592310#action_12592310 ]


lohit edited comment on HADOOP-2065 at 4/25/08 2:16 AM:
-------------------------------------------------------------------

This patch is based on the approach discussed above.

- We maintain a global corruptBlocksMap, a mapping from 
Block->TreeSet<DatanodeDescriptor>, which holds the set of datanodes that 
hold a corrupt replica of each block.
- Block reports are handled the same way as they are now. On trunk, if one or 
a few replicas are corrupt, we delete them, which reduces the replication. 
This patch also reduces the replication, but instead of deleting the corrupt 
replica it records the replica in the global corruptBlocksMap.
- Once we detect that all the replicas are corrupt, we set a flag indicating 
that the whole block is corrupt whenever getBlockLocations is called. This 
matches the behavior on trunk: if all we have is a single corrupt replica, we 
return it to the client and let the client handle it. Thus we retain all the 
corrupt replicas and also provide a way to identify them.
- Added a new flag in LocatedBlock to indicate whether this is a corrupt copy.
- Added a test case. It first corrupts one replica and expects the block to 
be good and the corrupt replica to be filtered out. It then corrupts all 
replicas and expects the returned block to be flagged as corrupt, with all 
corrupt replicas returned so that the client can deal with them.
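
The bookkeeping described in the list above can be sketched roughly as 
follows. This is not code from the attached patch. The class 
CorruptReplicaSketch, its methods addReplica and markCorrupt, and the Located 
type are hypothetical stand-ins. Blocks are reduced to long ids and datanodes 
to plain names, so that the example stays self-contained. It also uses 
current Java idioms (lambdas, computeIfAbsent) rather than those available to 
the original patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch only; names and types are assumptions, not the patch. */
public class CorruptReplicaSketch {
    // blockId -> datanodes known to hold a corrupt replica of that block
    private final Map<Long, TreeSet<String>> corruptBlocksMap = new HashMap<>();
    // blockId -> every datanode that has reported a replica of that block
    private final Map<Long, TreeSet<String>> replicas = new HashMap<>();

    /** Simplified stand-in for LocatedBlock carrying the new corrupt flag. */
    public static final class Located {
        public final List<String> nodes;
        public final boolean corrupt;

        Located(List<String> nodes, boolean corrupt) {
            this.nodes = nodes;
            this.corrupt = corrupt;
        }
    }

    /** Records that a datanode holds a replica of the block. */
    public void addReplica(long blockId, String datanode) {
        replicas.computeIfAbsent(blockId, k -> new TreeSet<>()).add(datanode);
    }

    /** Records a corrupt replica instead of deleting it. */
    public void markCorrupt(long blockId, String datanode) {
        corruptBlocksMap.computeIfAbsent(blockId, k -> new TreeSet<>()).add(datanode);
    }

    /**
     * Returns the good replicas when at least one exists. When every known
     * replica is corrupt, returns all of them with the corrupt flag set so
     * that the caller can decide what to do.
     */
    public Located getBlockLocations(long blockId) {
        TreeSet<String> all = replicas.getOrDefault(blockId, new TreeSet<>());
        TreeSet<String> bad = corruptBlocksMap.getOrDefault(blockId, new TreeSet<>());
        List<String> good = new ArrayList<>(all);
        good.removeAll(bad); // filter out corrupt replicas
        if (good.isEmpty() && !all.isEmpty()) {
            return new Located(new ArrayList<>(all), true);
        }
        return new Located(good, false);
    }
}
```

With three replicas and one marked corrupt, getBlockLocations in this sketch 
returns the two good datanodes with the flag unset. Once all three are 
marked, it returns all three with the flag set, which mirrors the two phases 
of the test case described above.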

      was (Author: lohit):
    This patch is based on the approach discussed above.

- We maintain a global corruptBlocksMap, a mapping from 
Block->TreeSet<DatanodeDescriptor>, which holds the set of datanodes that 
hold a corrupt replica of each block.
- Block reports are handled the same way as they are now. On trunk, if one or 
a few replicas are corrupt, we delete them, which reduces the replication. 
This patch also reduces the replication, but instead of deleting the corrupt 
replica it records the replica in the global corruptBlocksMap.
- Once we detect that all the replicas are corrupt, we set a flag indicating 
that the whole block is corrupt whenever getBlockLocations is called. This 
matches the behavior on trunk: if all we have is a single corrupt replica, we 
return it to the client and let the client handle it. Thus we retain all the 
corrupt replicas and also provide a way to identify them.
- Added a new flag in LocatedBlock to indicate whether this is a corrupt copy.
- Added a test case. It first corrupts one replica and expects the block to 
be good and also expects the block to be filtered. It then corrupts all 
replicas and expects the returned block to be of type corruptBlock.
  
> Replication policy for corrupted block 
> ---------------------------------------
>
>                 Key: HADOOP-2065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2065
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Koji Noguchi
>            Assignee: lohit vijayarenu
>             Fix For: 0.18.0
>
>         Attachments: HADOOP-2065-2.patch, HADOOP-2065.patch
>
>
> Thanks to HADOOP-1955, even if one of the replicas is corrupted, the block 
> should get replicated from a good replica relatively fast.
> Created this ticket to continue the discussion from 
> http://issues.apache.org/jira/browse/HADOOP-1955#action_12531162.
> bq. 2. Delete corrupted source replica
> bq. 3. If all replicas are corrupt, stop replication.
> For (2), it would be nice if the namenode could delete the corrupted block 
> when there is a good replica on other nodes.
> For (3), I would prefer that the namenode still replicate the block.
> Before 0.14, if a file was corrupted, users were still able to pull the 
> data and decide whether to delete those files. (HADOOP-2063)
> In 0.14 and later, we cannot/don't replicate these blocks, so they 
> eventually get lost.
> To make matters worse, if the corrupted file is accessed, all the corrupted 
> replicas except one are deleted, and the block stays at a replication 
> factor of 1 forever.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
