[ https://issues.apache.org/jira/browse/HADOOP-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592310#action_12592310 ]
lohit edited comment on HADOOP-2065 at 4/25/08 2:16 AM:
-------------------------------------------------------------------
This patch is based on the approach discussed above.
- We maintain a global corruptBlocksMap, a mapping from
Block->TreeSet<DatanodeDescriptor>, that holds the set of datanodes which hold a
corrupt replica of each block.
- Block reports are handled the same way as now. On trunk, if one or a few
replicas are corrupt, we delete them, which reduces the replication. This patch
also reduces the replication, but instead of deleting the corrupt replica it
records it in the global corruptBlocksMap.
- Once we detect that all replicas are corrupt, we set a flag indicating that the
whole block is corrupt whenever getBlockLocations is called. This matches the
behavior on trunk: if only one replica remains and it is corrupt, we return it to
the client and let the client handle it. Thus we retain all the corrupt
replicas and also provide a way to identify them.
- Added a new flag in LocatedBlock to indicate whether this is a corrupt copy.
- Added a test case. It first corrupts one replica and expects the block to
be reported as good, with the corrupt replica filtered out. It then corrupts
all replicas and expects the returned block to be marked corrupt, with all
corrupt replicas returned so that the client can deal with them.
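
For illustration only, the bookkeeping described above can be sketched roughly as
follows. This is not the code in the attached patch: the class and method names
(CorruptReplicaSketch, addReplica, markCorrupt, Located) are hypothetical, and
plain long block IDs and String datanode names stand in for Block and
DatanodeDescriptor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for the namenode-side bookkeeping; not the real HDFS classes. */
class CorruptReplicaSketch {
    // blockId -> datanodes known to hold a corrupt replica (the "corruptBlocksMap" idea)
    private final Map<Long, SortedSet<String>> corruptBlocksMap = new HashMap<>();
    // blockId -> every datanode that reported a replica, good or corrupt
    private final Map<Long, SortedSet<String>> blockLocations = new HashMap<>();

    /** Simplified analogue of LocatedBlock with the new "corrupt" flag. */
    static final class Located {
        final List<String> nodes;
        final boolean corrupt;

        Located(List<String> nodes, boolean corrupt) {
            this.nodes = nodes;
            this.corrupt = corrupt;
        }
    }

    void addReplica(long blockId, String node) {
        blockLocations.computeIfAbsent(blockId, k -> new TreeSet<>()).add(node);
    }

    /** Record the corrupt replica instead of deleting it. */
    void markCorrupt(long blockId, String node) {
        corruptBlocksMap.computeIfAbsent(blockId, k -> new TreeSet<>()).add(node);
    }

    /**
     * Return only good replicas while at least one exists. If every replica is
     * corrupt, return all of them and set the corrupt flag so the client decides.
     */
    Located getBlockLocations(long blockId) {
        SortedSet<String> all =
            blockLocations.getOrDefault(blockId, Collections.<String>emptySortedSet());
        SortedSet<String> bad =
            corruptBlocksMap.getOrDefault(blockId, Collections.<String>emptySortedSet());
        List<String> good = new ArrayList<>(all);
        good.removeAll(bad);
        if (good.isEmpty() && !all.isEmpty()) {
            return new Located(new ArrayList<>(all), true);
        }
        return new Located(good, false);
    }
}
```

With three replicas of which one is marked corrupt, this sketch returns the two
good locations with the flag unset. Once all three are marked corrupt, it
returns all three locations with the flag set, which mirrors the two phases of
the test case described above.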
was (Author: lohit):
This patch is based on approach discussed above.
- We maintain a global corruptBlocksMap which has a mapping from
Block->TreeSet<DatanodeDescriptor> holding set of datanodes which hold a
corrupt replica of a block
- Block reporting is handled in the same way as done now. On trunk if there are
one/few corrupt replicas we deleted them thus reducing the replication. This
patch also does the same, it reduces the replication, but instead of deleting
it stores the replica in the global corruptBlocksMap.
- Once we detect all the replicas are corrupt, we set a flag indicating the
whole block is corrupt whenever getBlockLocations is called. This is the same
behavior on trunk, if we have just one copy of corrupt replica, we return it to
the client and let the client handle it. Thus we have retained all the corrupt
replicas and also provide a way to identify them.
- Added a new flag in LocatedBlock to indicate if this is a corrupt copy or not.
- Added a test case. Which first corrupts one replica and expects the block to
be good and also expect the block to be filtered. Later, we corrupt all
replicas and expect the block returned to be of type corruptBlock.
> Replication policy for corrupted block
> ---------------------------------------
>
> Key: HADOOP-2065
> URL: https://issues.apache.org/jira/browse/HADOOP-2065
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.14.1
> Reporter: Koji Noguchi
> Assignee: lohit vijayarenu
> Fix For: 0.18.0
>
> Attachments: HADOOP-2065-2.patch, HADOOP-2065.patch
>
>
> Thanks to HADOOP-1955, even if one of the replicas is corrupted, the block
> should get replicated from a good replica relatively fast.
> I created this ticket to continue the discussion from
> http://issues.apache.org/jira/browse/HADOOP-1955#action_12531162.
> bq. 2. Delete corrupted source replica
> bq. 3. If all replicas are corrupt, stop replication.
> For (2), it would be nice if the namenode could delete the corrupted block
> when there is a good replica on another node.
> For (3), I would prefer that the namenode still replicate the block.
> Before 0.14, if a file was corrupted, users were still able to pull the
> data and decide whether to delete the file. (HADOOP-2063)
> In 0.14 and later, we cannot/don't replicate these blocks, so they eventually
> get lost.
> To make matters worse, if the corrupted file is accessed, all the
> corrupted replicas but one are deleted, and the block stays at a replication
> factor of 1 forever.
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.