[ https://issues.apache.org/jira/browse/HADOOP-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-4071:
----------------------------------

    Resolution: Fixed
  Hadoop Flags: [Reviewed]
        Status: Resolved  (was: Patch Available)

The failed tests are not related to this issue. The change is too small to have a unit test. I've committed this!

> FSNameSystem.isReplicationInProgress should add an underReplicated block to
> the neededReplication queue using method "add", not "update"
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4071
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.19.0
>
>         Attachments: decommission.patch
>
>
> We have a datanode that did not get decommissioned for days. It turned out
> that there was an under-replicated block that was never placed in the
> neededReplication queue and therefore did not get replicated. The following
> debug line showed the problem:
> "DEBUG org.apache.hadoop.dfs.StateChange: UnderReplicationBlocks.update
> blk_-7437651423871278837_0 curReplicas 8
> curExpectedReplicas 10 oldReplicas 9 oldExpectedReplicas 10 curPri 2 oldPri 2"
> The block was not in the neededReplication queue, but the update method
> concluded that the block was under-replicated and the priority level did not
> change, so it did not add the block to the neededReplication queue.
> The solution is that instead of using the update method, the name node
> should use the add method to add the block to the neededReplication queue.
> The add method guarantees success if the block is indeed under-replicated.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
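The add-vs-update distinction can be sketched as follows. This is a simplified, hypothetical model of a priority-bucketed under-replication queue (the class, method signatures, and priority policy here are illustrative assumptions, not the real Hadoop 0.19 code); it shows why "update" can silently skip a block that was never queued, while "add" always queues an under-replicated block:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a priority-bucketed under-replicated block queue.
class NeededReplicationSketch {
    private static final int LEVELS = 3;            // levels 0 (most urgent) .. 2
    private final List<Set<String>> queues = new ArrayList<>();

    NeededReplicationSketch() {
        for (int i = 0; i < LEVELS; i++) queues.add(new HashSet<>());
    }

    // Assumed priority policy: fewer replicas relative to the expected
    // count means a more urgent (lower-numbered) level.
    private int priority(int replicas, int expected) {
        if (replicas == 0) return 0;
        if (replicas * 3 < expected) return 1;
        return 2;
    }

    // update: moves a block between levels only when its priority changed.
    // If old and new priorities match and the block was never queued
    // (the situation in HADOOP-4071), this method does nothing, so the
    // block stays out of every queue.
    void update(String block, int cur, int curExpected,
                int old, int oldExpected) {
        int oldPri = priority(old, oldExpected);
        int curPri = priority(cur, curExpected);
        if (oldPri != curPri) {
            queues.get(oldPri).remove(block);
            queues.get(curPri).add(block);
        }
    }

    // add: unconditionally queues an under-replicated block at its current
    // level, succeeding even when the block was missing from every queue.
    void add(String block, int cur, int expected) {
        if (cur < expected) queues.get(priority(cur, expected)).add(block);
    }

    boolean contains(String block) {
        for (Set<String> q : queues) if (q.contains(block)) return true;
        return false;
    }
}
```

With the numbers from the debug line above (curReplicas 8, curExpectedReplicas 10, oldReplicas 9, oldExpectedReplicas 10), both the old and new priorities land on the same level, so update() leaves the unqueued block unqueued, whereas add() places it in the queue and replication can proceed.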