[ https://issues.apache.org/jira/browse/HADOOP-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502544 ]
Hairong Kuang commented on HADOOP-1300:
---------------------------------------

First of all, I think removing 30 replicas is not a common case. Even if there is such a case, my algorithm should not be heavier on CPU than the current algorithm. The only overhead is splitting all replicas into two sets, and this is done once for all excess replicas. When selecting a replica to remove, my algorithm scans only one of the two sets, while the current algorithm scans all replicas, so on average my algorithm should cut the scanning cost in half. Alternatively, I could build a heap to support replica removal when the number of excess replicas is large, but since that is not a common case, I am not sure it is worth doing. Please let me know what you think.

> deletion of excess replicas does not take into account 'rack-locality'
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-1300
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1300
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Koji Noguchi
>            Assignee: Hairong Kuang
>         Attachments: excessDel.patch
>
>
> One rack went down today, resulting in one missing block/file.
> Looking at the log, this block was originally over-replicated:
> 3 replicas on one rack and 1 replica on another.
> The namenode decided to delete the latter, leaving 3 replicas on the same rack.
> It would be nice if the deletion were also rack-aware.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
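
For illustration, here is a minimal Java sketch of the two-set split described in the comment above. This is not the attached excessDel.patch; the class, method, and parameter names (ExcessReplicaChooser, chooseExcess, nodeToRack) are made up for this example, and the real namenode code operates on DatanodeDescriptor objects rather than plain strings. Replicas are split once into a "rack holds more than one copy" set and a "only copy on its rack" set, and victims are drawn from the first set so every rack keeps at least one replica.

{code:java}
import java.util.*;

class ExcessReplicaChooser {

  /** Rack lookup; assumed to be supplied by the caller via a node-to-rack map. */
  static String rackOf(String node, Map<String, String> nodeToRack) {
    return nodeToRack.get(node);
  }

  /**
   * Pick replicas to delete until only 'replication' copies remain,
   * preferring replicas that share a rack with another replica.
   */
  static List<String> chooseExcess(Collection<String> replicas,
                                   Map<String, String> nodeToRack,
                                   int replication) {
    // Count replicas per rack -- done once for all excess replicas.
    Map<String, Integer> perRack = new HashMap<>();
    for (String node : replicas) {
      perRack.merge(rackOf(node, nodeToRack), 1, Integer::sum);
    }

    // Split into two sets: racks with more than one replica vs. lone replicas.
    Deque<String> shared = new ArrayDeque<>();
    Deque<String> lone = new ArrayDeque<>();
    for (String node : replicas) {
      if (perRack.get(rackOf(node, nodeToRack)) > 1) {
        shared.add(node);
      } else {
        lone.add(node);
      }
    }

    List<String> toDelete = new ArrayList<>();
    int remaining = replicas.size();
    while (remaining > replication) {
      String victim;
      if (!shared.isEmpty()) {
        // Scan only the "shared rack" set first.
        victim = shared.poll();
        if (perRack.get(rackOf(victim, nodeToRack)) <= 1) {
          // Earlier deletions left this node as the last copy on its rack;
          // reclassify it instead of deleting it.
          lone.add(victim);
          continue;
        }
      } else {
        victim = lone.poll();
      }
      perRack.merge(rackOf(victim, nodeToRack), -1, Integer::sum);
      toDelete.add(victim);
      remaining--;
    }
    return toDelete;
  }
}
{code}

In this sketch the only extra work over the current approach is the single pass that builds the per-rack counts and the two sets; each removal then touches one set only, which is where the roughly halved scanning cost comes from.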