[ https://issues.apache.org/jira/browse/HADOOP-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591444#action_12591444 ]

lohit vijayarenu commented on HADOOP-2065:
------------------------------------------

After talking to dhruba, here are some more things that need to be taken care 
of to complete this:

- When we detect that all replicas of a block are corrupt, we could replace the 
BlockInfo with another class called CorruptBlockInfo, which basically says that 
the block is corrupt, and then remove the list of corrupt Blocks from all 
DatanodeDescriptors. This CorruptBlockInfo would have a flag which says whether 
the block is corrupt or not, and would provide an API to get and set this flag. 
If in the future we encounter another good replica, then we should unset this 
flag. By default, BlockInfo would return false for this call. In this way we 
would not be wasting space in BlocksMap by keeping this flag for all Blocks. 
(A rough sketch follows this list.)
- We should also provide a way to get the details of all corrupt blocks. This 
might require a few API changes, possibly with an additional flag. 
- A similar change should also go into getBlockLocations so that we can get 
corrupt block information when such a flag is specified.
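
To make this concrete, here is a rough sketch of what that flag could look 
like. This is not actual HDFS code; the method names isCorrupt()/setCorrupt() 
and the stripped-down BlockInfo are placeholders for illustration only (the 
real BlockInfo also carries the block identity and the datanode triplets).

{code}
// Ordinary entry in BlocksMap: no per-block corrupt flag, so the common
// case costs no extra space.
class BlockInfo {
  boolean isCorrupt() {
    return false;              // healthy blocks always report false
  }

  void setCorrupt(boolean corrupt) {
    // no-op; only CorruptBlockInfo actually keeps the flag
  }
}

// Used only when every known replica of the block has been found corrupt.
class CorruptBlockInfo extends BlockInfo {
  private boolean corrupt = true;

  boolean isCorrupt() {
    return corrupt;
  }

  void setCorrupt(boolean corrupt) {
    // cleared again if a good replica is reported later
    this.corrupt = corrupt;
  }
}
{code}

Since the flag lives only in the subclass, getting the details of all corrupt 
blocks could simply mean returning the CorruptBlockInfo entries (or keeping a 
small side index of them).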

One way to implement this is:
- Once we detect that all replicas of a block are corrupt and there is no live 
or good copy, we could do a synchronized operation of replacing the BlockInfo 
with a CorruptBlockInfo: remove this block from BlocksMap and add the new 
Block. The new class could be called either CorruptBlockInfo or 
ExtendedBlockInfo, in which case we could later add more information to such 
an extended Block. (See the sketch after this list.)
- In addStoredBlock, we might have to check the type of the block; if it is 
already a CorruptBlockInfo, we need to reset the flag, treat it as a good copy, 
and possibly add it to the neededReplication queue.
- getBlockLocations should be able to tell which replicas are corrupt; we might 
have to use the same CorruptBlockInfo/ExtendedBlockInfo for this.
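
Here is a similarly hypothetical sketch of the two transitions described in the 
list above, reusing the BlockInfo/CorruptBlockInfo classes from the earlier 
sketch. The method names (markAllReplicasCorrupt, replicaReported) and the 
Map-backed BlocksMap are stand-ins for the real namenode structures, not 
actual HDFS code.

{code}
import java.util.HashMap;
import java.util.Map;

class CorruptBlockTracking {
  // Stand-in for the namenode's BlocksMap, keyed by block id.
  private final Map<Long, BlockInfo> blocksMap = new HashMap<Long, BlockInfo>();

  // Called once we detect that no live, good copy of the block remains:
  // swap the ordinary BlockInfo for a CorruptBlockInfo in one synchronized step.
  synchronized void markAllReplicasCorrupt(long blockId) {
    BlockInfo old = blocksMap.get(blockId);
    if (old != null && !(old instanceof CorruptBlockInfo)) {
      blocksMap.remove(blockId);
      blocksMap.put(blockId, new CorruptBlockInfo());
    }
  }

  // Called from block-report handling (addStoredBlock above) when a replica is
  // reported: if the block had been marked corrupt, reset the flag, treat the
  // replica as a good copy, and queue the block for re-replication.
  synchronized void replicaReported(long blockId) {
    BlockInfo info = blocksMap.get(blockId);
    if (info instanceof CorruptBlockInfo && info.isCorrupt()) {
      info.setCorrupt(false);
      addToNeededReplication(blockId);
    }
  }

  // Placeholder for adding the block to the neededReplication queue.
  private void addToNeededReplication(long blockId) {
  }
}
{code}

getBlockLocations would then consult the same isCorrupt() flag and, when the 
caller passes the new flag, report the replicas known to be corrupt instead of 
dropping them.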

This is being done to get rid of the corruptList in DatanodeDescriptor. 
Anything we might have missed?

> Replication policy for corrupted block 
> ---------------------------------------
>
>                 Key: HADOOP-2065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2065
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Koji Noguchi
>            Assignee: lohit vijayarenu
>             Fix For: 0.18.0
>
>         Attachments: HADOOP-2065.patch
>
>
> Thanks to HADOOP-1955, even if one of the replicas is corrupted, the block 
> should get replicated from a good replica relatively fast.
> Created this ticket to continue the discussion from 
> http://issues.apache.org/jira/browse/HADOOP-1955#action_12531162.
> bq. 2. Delete corrupted source replica
> bq. 3. If all replicas are corrupt, stop replication.
> For (2), it'll be nice if the namenode can delete the corrupted block if 
> there's a good replica on other nodes.
> For (3), I prefer if the namenode can still replicate the block.
> Before 0.14, if the file was corrupted, users were still able to pull the 
> data and decide if they want to delete those files. (HADOOP-2063)
> In 0.14 and later, we cannot/don't replicate these blocks so they eventually 
> get lost.
> To make matters worse, if the corrupted file is accessed, all the corrupted 
> replicas would be deleted except for one, and the block would stay at a 
> replication factor of 1 forever.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
