[ https://issues.apache.org/jira/browse/HADOOP-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502544 ]

Hairong Kuang commented on HADOOP-1300:
---------------------------------------

First of all, I think removing 30 replicas is not a common case. Even if such a case 
occurs, my algorithm should not be more CPU-intensive than the current one. The only 
overhead is splitting all replicas into two sets, and that is done once for all excess 
replicas of a block. When selecting a replica to remove, my algorithm scans only one of 
the two sets, while the current algorithm scans all replicas. So on average my algorithm 
should cut the scanning cost in half.
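
To make the two-set idea concrete, here is a rough sketch in Java. This is not the code in excessDel.patch; ReplicaLocation, getRack() and getFreeSpace() are made-up placeholders standing in for whatever the namenode actually tracks for each replica:

{code}
import java.util.*;

// Sketch only: ReplicaLocation, getRack() and getFreeSpace() are placeholders,
// not the dfs API used by excessDel.patch.
class TwoSetSketch {
  static ReplicaLocation chooseExcessReplica(Collection<ReplicaLocation> replicas) {
    // Count how many replicas each rack holds (done once per over-replicated block).
    Map<String, Integer> rackCount = new HashMap<String, Integer>();
    for (ReplicaLocation r : replicas) {
      Integer c = rackCount.get(r.getRack());
      rackCount.put(r.getRack(), c == null ? 1 : c + 1);
    }
    // Split into two sets: replicas on racks holding more than one replica,
    // and replicas that are the only one on their rack.
    List<ReplicaLocation> multiRack = new ArrayList<ReplicaLocation>();
    List<ReplicaLocation> singleRack = new ArrayList<ReplicaLocation>();
    for (ReplicaLocation r : replicas) {
      (rackCount.get(r.getRack()) > 1 ? multiRack : singleRack).add(r);
    }
    // Only the multi-replica-rack set is scanned when picking a victim, so rack
    // diversity is preserved; fall back to the other set only if it is empty.
    List<ReplicaLocation> candidates = multiRack.isEmpty() ? singleRack : multiRack;
    ReplicaLocation victim = null;
    for (ReplicaLocation r : candidates) {
      if (victim == null || r.getFreeSpace() < victim.getFreeSpace()) {
        victim = r;  // e.g. prefer deleting from the node with the least free space
      }
    }
    return victim;
  }
}

// Placeholder type for the sketch.
interface ReplicaLocation {
  String getRack();
  long getFreeSpace();
}
{code}

With this split, each selection scans roughly half of the replicas instead of all of them, and it never deletes the last replica on a rack while another rack still holds several.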

Alternatively, I could build a heap to support replica removal when the number of excess 
replicas is large. But since that is not a common case, I am not sure it is worth doing. 
Please let me know what you think.
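
If the heap route ever looked worthwhile, it could be as simple as a priority queue ordered by how crowded a replica's rack is, so each removal candidate comes out in O(log n) instead of a linear scan. Again just a sketch, reusing the placeholder ReplicaLocation type from above, and the ordering (most-crowded rack first) is only one possible choice:

{code}
import java.util.*;

// Sketch of the heap alternative; ReplicaLocation/getRack() are the same
// placeholders as in the previous sketch, not the dfs API.
class HeapSketch {
  static Queue<ReplicaLocation> buildRemovalHeap(Collection<ReplicaLocation> replicas) {
    final Map<String, Integer> rackCount = new HashMap<String, Integer>();
    for (ReplicaLocation r : replicas) {
      Integer c = rackCount.get(r.getRack());
      rackCount.put(r.getRack(), c == null ? 1 : c + 1);
    }
    // Replicas on the most crowded racks are polled first.
    PriorityQueue<ReplicaLocation> heap = new PriorityQueue<ReplicaLocation>(
        Math.max(1, replicas.size()),
        new Comparator<ReplicaLocation>() {
          public int compare(ReplicaLocation a, ReplicaLocation b) {
            int ca = rackCount.get(a.getRack());
            int cb = rackCount.get(b.getRack());
            return (cb < ca) ? -1 : ((cb == ca) ? 0 : 1);
          }
        });
    heap.addAll(replicas);
    return heap;  // poll() yields the next candidate for deletion
  }
}
{code}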

> deletion of excess replicas does not take into account 'rack-locality'
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-1300
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1300
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Koji Noguchi
>            Assignee: Hairong Kuang
>         Attachments: excessDel.patch
>
>
> One rack went down today, resulting in one missing block/file.
> Looking at the log, this block was originally over-replicated. 
> 3 replicas on one rack and 1 replica on another.
> Namenode decided to delete the latter, leaving 3 replicas on the same rack.
> It'll be nice if the deletion is also rack-aware.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
