[ https://issues.apache.org/jira/browse/HDFS-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321728#comment-14321728 ]
Frode Halvorsen commented on HDFS-7787:
---------------------------------------
I have now changed the parameters again to speed up replication, and I see that
the decommissioning node is told to replicate blocks to either two or three
other nodes. Most of the requests are to replicate only two copies, so I
suspect that the blocks it is asked to replicate do have live replicas
elsewhere in the cluster.
Approximately 1/5 of the replication requests are for three nodes:
2015-02-14 23:00:16,008 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839116_24099285 to datanode(s) x.x.x.206:50010 x.x.x.207:50010 x.x.x.209:50010
2015-02-14 23:00:16,009 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839119_24099288 to datanode(s) x.x.x.206:50010 x.x.x.205:50010 x.x.x.209:50010
2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839113_24099282 to datanode(s) x.x.x.204:50010 x.x.x.205:50010
2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839114_24099283 to datanode(s) x.x.x.204:50010 x.x.x.209:50010
2015-02-14 23:00:16,011 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839166_24099335 to datanode(s) x.x.x.204:50010 x.x.x.207:50010
2015-02-14 23:00:16,012 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839162_24099331 to datanode(s) x.x.x.206:50010 x.x.x.205:50010 x.x.x.209:50010
2015-02-14 23:00:20,046 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839309_24099478 to datanode(s) x.x.x.206:50010 x.x.x.205:50010
2015-02-14 23:00:20,047 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839310_24099479 to datanode(s) x.x.x.204:50010 x.x.x.205:50010
2015-02-14 23:00:20,047 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839352_24099521 to datanode(s) x.x.x.206:50010 x.x.x.209:50010
2015-02-14 23:00:20,048 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839359_24099528 to datanode(s) x.x.x.206:50010 x.x.x.209:50010
2015-02-14 23:00:20,048 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839358_24099527 to datanode(s) x.x.x.204:50010 x.x.x.209:50010
2015-02-14 23:00:20,049 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839357_24099526 to datanode(s) x.x.x.206:50010 x.x.x.207:50010
2015-02-14 23:00:22,056 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839241_24099410 to datanode(s) x.x.x.206:50010 x.x.x.205:50010
2015-02-14 23:00:22,057 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to replicate blk_1097839242_24099411 to datanode(s) x.x.x.204:50010 x.x.x.209:50010 x.x.x.205:50010
The node at 208 is decommissioning, and I would say that this shows the node is
asked to replicate blocks that have live replicas as well as blocks with no
live replicas. I haven't looked at the code, but it seems wrong to me to have
the decommissioning node replicate any blocks other than those without live
replicas.
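A minimal Python sketch of the reasoning above, assuming a replication factor of 3, that the replica on the decommissioning node is not counted as live, and that no other copies of the block are already pending. Under those assumptions a request with two targets implies one live replica elsewhere, and a request with three targets implies none. The regex is written against the log format quoted in this comment; the function names are hypothetical helpers, not part of Hadoop.

```python
import re
from collections import Counter

# Matches one "BLOCK* ask ... to replicate ..." entry in the NameNode log format quoted above.
ASK_RE = re.compile(
    r"BLOCK\* ask (?P<source>\S+) to replicate (?P<block>blk_\S+) "
    r"to datanode\(s\) (?P<targets>.+)$"
)

# Assumption: every file uses the replication factor of 3 described in the issue.
REPLICATION_FACTOR = 3


def parse_requests(lines):
    """Yield (source, block, targets) for every replication request in `lines`."""
    for line in lines:
        match = ASK_RE.search(line.strip())
        if match:
            yield match["source"], match["block"], match["targets"].split()


def inferred_live_replicas(target_count, replication=REPLICATION_FACTOR):
    """Live replicas implied by a request, assuming the decommissioning source's
    replica is not counted and no other copies of the block are pending."""
    return replication - target_count


def summarize(lines):
    """Count requests by the number of live replicas they imply."""
    return Counter(
        inferred_live_replicas(len(targets)) for _, _, targets in parse_requests(lines)
    )


def requests_with_live_replicas(lines, decommissioning_source):
    """Blocks the decommissioning node was asked to copy even though other live
    replicas seem to exist, i.e. the requests this comment argues against."""
    return [
        block
        for source, block, targets in parse_requests(lines)
        if source == decommissioning_source and inferred_live_replicas(len(targets)) > 0
    ]


if __name__ == "__main__":
    sample = [
        "2015-02-14 23:00:16,008 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to "
        "replicate blk_1097839116_24099285 to datanode(s) x.x.x.206:50010 "
        "x.x.x.207:50010 x.x.x.209:50010",
        "2015-02-14 23:00:16,010 INFO BlockStateChange: BLOCK* ask x.x.x.208:50010 to "
        "replicate blk_1097839113_24099282 to datanode(s) x.x.x.204:50010 x.x.x.205:50010",
    ]
    print(summarize(sample))
    print(requests_with_live_replicas(sample, "x.x.x.208:50010"))
```

Feeding the full log into `summarize` gives the split between two-target and three-target requests directly, instead of estimating it by eye.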
> Split QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks to give more priority
> to blocks on nodes being decomissioned
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7787
> URL: https://issues.apache.org/jira/browse/HDFS-7787
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.6.0
> Environment: 2 namenodes HA, 6 datanodes in two racks
> Reporter: Frode Halvorsen
> Labels: balance, hdfs, replication-performance
>
> Each file has a replication factor of 3, with replicas split across different
> racks.
> After a simulated crash of one rack (shutdown of all nodes, deletion of the
> data directories, and restart of the nodes) and the decommissioning of one of
> the nodes in the other rack, replication does not follow the 'normal' rules...
> My cluster has approximately 25 million files, and the one node I am now
> trying to decommission has 9 million under-replicated blocks and 3.5 million
> blocks with 'no live replicas'. After a restart of the node, it starts to
> replicate both types of blocks, but after a while it only replicates
> under-replicated blocks that have other live copies. I would think that the
> 'normal' way to do this would be to make sure that all blocks for which this
> node holds the only copy are the first to be replicated/balanced?
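The issue title proposes splitting QUEUE_HIGHEST_PRIORITY in UnderReplicatedBlocks so that blocks whose remaining copies sit only on decommissioning nodes are replicated before blocks that still have a live replica. The Python sketch below is a simplified, hypothetical model of that ordering using a heap; the level constants, `priority` and `ReplicationQueue` are illustrative names and do not reproduce the actual Hadoop code.

```python
import heapq
from itertools import count

# Hypothetical queue levels for illustration only; a lower value is replicated first.
ONLY_ON_DECOMMISSIONING = 0  # no live replica, at least one copy on a decommissioning node
ONE_LIVE_REPLICA = 1         # exactly one live replica left
UNDER_REPLICATED = 2         # more than one live replica, still below the target


def priority(live_replicas, decommissioning_replicas, expected=3):
    """Return a queue level for a block, or None if this model schedules no work."""
    if live_replicas >= expected:
        return None
    if live_replicas == 0:
        # A block with no copies at all cannot be replicated, so it gets no level here.
        return ONLY_ON_DECOMMISSIONING if decommissioning_replicas > 0 else None
    if live_replicas == 1:
        return ONE_LIVE_REPLICA
    return UNDER_REPLICATED


class ReplicationQueue:
    """Orders blocks by level, then by arrival order within a level."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker so equal levels stay first-in, first-out

    def add(self, block_id, live_replicas, decommissioning_replicas, expected=3):
        level = priority(live_replicas, decommissioning_replicas, expected)
        if level is not None:
            heapq.heappush(self._heap, (level, next(self._seq), block_id))

    def poll(self, n):
        """Remove and return up to n block ids, highest priority first."""
        return [heapq.heappop(self._heap)[2] for _ in range(min(n, len(self._heap)))]


if __name__ == "__main__":
    queue = ReplicationQueue()
    queue.add("blk_two_live", 2, 1)
    queue.add("blk_one_live", 1, 1)
    queue.add("blk_only_on_decommissioning", 0, 1)
    print(queue.poll(10))
```

With a single combined top level, the block held only by the decommissioning node would compete with the block that still has one live replica; the separate level guarantees it is handed out first in each round.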
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)