[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834600#comment-13834600 ]
Vinay commented on HDFS-5579:
-----------------------------
Thanks [~zhaoyunjiong] for filing this Jira.
I think your fix would work and would let the datanode decommission quickly.
Here are a few comments on your patch.
1. The try-catch is not required. No statement inside the try block throws an exception.
2. {code}block.getBlockId() == bc.getLastBlock().getBlockId(){code}
It is better to use block.equals(bc.getLastBlock()).
3. {code}if (block.getBlockId() == bc.getLastBlock().getBlockId() &&
curReplicas > 1) {
+ continue;
+ }{code}
Use minReplication instead of the hard-coded 1.
4. {code}+ underReplicatedInOpenFiles++;{code}
This should be incremented only when there are not enough replicas.
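To make comments 2 to 4 concrete, here is a rough, self-contained sketch of how the check might look with those changes applied. It is not the actual BlockManager code: Block, OpenFile, Scan and holdsUpDecommission are hypothetical stand-ins invented for illustration, the file is assumed to be under construction, and minReplication is passed in as a plain parameter.

```java
// Illustrative sketch only; none of these types are the real HDFS classes.
class DecommissionCheckSketch {

    /** Hypothetical stand-in for an HDFS block; equality is by block ID. */
    static final class Block {
        final long blockId;

        Block(long blockId) {
            this.blockId = blockId;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Block && ((Block) o).blockId == blockId;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(blockId);
        }
    }

    /** Hypothetical stand-in for an under-construction file (block collection). */
    static final class OpenFile {
        private final Block lastBlock;

        OpenFile(Block lastBlock) {
            this.lastBlock = lastBlock;
        }

        Block getLastBlock() {
            return lastBlock;
        }
    }

    /** Holds the counter for one pass over the blocks of a decommissioning node. */
    static final class Scan {
        int underReplicatedInOpenFiles = 0;

        /**
         * Returns true if this block of an under-construction file should keep
         * the decommission waiting, false if it can be skipped.
         */
        boolean holdsUpDecommission(Block block, OpenFile bc,
                                    int curReplicas, int minReplication) {
            // Comment 2: compare with equals() rather than raw block IDs.
            // Comment 3: compare against minReplication, not a literal 1.
            if (block.equals(bc.getLastBlock()) && curReplicas > minReplication) {
                return false; // enough replicas of the last block; skip it
            }
            // Comment 4: count the block only when it was not skipped above.
            underReplicatedInOpenFiles++;
            return true;
        }
    }

    public static void main(String[] args) {
        Block last = new Block(7L);
        OpenFile file = new OpenFile(last);
        Scan scan = new Scan();

        System.out.println(scan.holdsUpDecommission(last, file, 3, 1));
        System.out.println(scan.holdsUpDecommission(last, file, 1, 1));
        System.out.println(scan.underReplicatedInOpenFiles);
    }
}
```

With this shape, the last block of an open file stops holding up the decommission once it has more than minReplication replicas, and the counter reflects only blocks that are still short of replicas.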
> Under construction files make DataNode decommission take very long hours
> ------------------------------------------------------------------------
>
> Key: HDFS-5579
> URL: https://issues.apache.org/jira/browse/HDFS-5579
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 1.2.0, 2.2.0
> Reporter: zhaoyunjiong
> Assignee: zhaoyunjiong
> Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch
>
>
> We noticed that decommissioning DataNodes sometimes takes a very long time,
> in some cases more than 100 hours.
> After checking the code, I found that
> BlockManager:computeReplicationWorkForBlocks(List<List<Block>>
> blocksToReplicate) won't replicate blocks that belong to under-construction
> files. However, in
> BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if any
> block needs replication, whether or not it belongs to an under-construction
> file, the decommission process keeps running.
> That is why decommissioning sometimes takes a very long time.