[
https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021553#comment-14021553
]
Binglin Chang commented on HDFS-6159:
-------------------------------------
The test error log:
{noformat}
Rebalancing expected avg utilization to become 0.16, but on datanode
127.0.0.1:55468 it remains at 0.02 after more than 40000 msec.
{noformat}
But from the balancer log:
{noformat}
2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) -
Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to
127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) -
Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to
127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) -
Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to
127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) -
Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to
127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) -
Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to
127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) -
Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to
127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) -
Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to
127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) -
Successfully moved blk_1073741829_1005 with size=100 from 127.0.0.1:49159 to
127.0.0.1:55468 through 127.0.0.1:49159
{noformat}
We can see that there are 8 blocks (800 bytes) moved to 127.0.0.1:55468, so the
utilization of this datanode should be 0.16 (800/5000).
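As a sanity check of that expectation, here is a minimal sketch of the arithmetic (plain Java, not the actual test code; the 100-byte block size comes from the log above, and the 5000-byte per-datanode capacity is assumed from the test setup):
{code:java}
// Minimal sketch of the utilization arithmetic above (not the actual test code).
public class UtilizationCheck {
    public static void main(String[] args) {
        long blockSize = 100;   // size of each moved block, from the balancer log
        long movedBlocks = 8;   // successful moves to 127.0.0.1:55468
        long capacity = 5000;   // assumed per-datanode capacity in the test setup
        double utilization = (double) (movedBlocks * blockSize) / capacity;
        System.out.println(utilization); // prints 0.16, the expected avg utilization
    }
}
{code}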
But at the same time, those blocks are deleted by block manager:
{noformat}
2014-06-06 18:15:54,706 INFO BlockStateChange
(BlockManager.java:chooseExcessReplicates(2711)) - BLOCK*
chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to
invalidated blocks set
2014-06-06 18:15:54,709 INFO BlockStateChange
(BlockManager.java:chooseExcessReplicates(2711)) - BLOCK*
chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to
invalidated blocks set
2014-06-06 18:15:56,421 INFO BlockStateChange
(BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask
127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
2014-06-06 18:15:57,717 INFO BlockStateChange
(BlockManager.java:chooseExcessReplicates(2711)) - BLOCK*
chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to
invalidated blocks set
2014-06-06 18:15:57,720 INFO BlockStateChange
(BlockManager.java:chooseExcessReplicates(2711)) - BLOCK*
chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to
invalidated blocks set
2014-06-06 18:15:57,721 INFO BlockStateChange
(BlockManager.java:chooseExcessReplicates(2711)) - BLOCK*
chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to
invalidated blocks set
2014-06-06 18:15:57,722 INFO BlockStateChange
(BlockManager.java:chooseExcessReplicates(2711)) - BLOCK*
chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to
invalidated blocks set
2014-06-06 18:15:57,723 INFO BlockStateChange
(BlockManager.java:chooseExcessReplicates(2711)) - BLOCK*
chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to
invalidated blocks set
2014-06-06 18:15:59,422 INFO BlockStateChange
(BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask
127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005,
blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
2014-06-06 18:16:02,423 INFO BlockStateChange
(BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask
127.0.0.1:55468 to delete [blk_1073741845_1021]
{noformat}
At last, only block blk_1073741828_1004 is left on 127.0.0.1:55468, so the
final utilization is 100/5000 = 0.02.
Those blocks were newly moved by the balancer and should not be invalidated by
the block manager. Most likely some of the block invalidation logic in
BlockManager.java is broken? Looking at the svn log, there have been some
recent changes to block invalidation in BlockManager (HDFS-6424, HDFS-6362).
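To make the suspected failure mode concrete, here is a toy model (plain Java, not HDFS code; the replication factor of 1 is only an illustrative assumption): if the excess-replica choice does not account for the pending balancer move, the replica just moved to 127.0.0.1:55468 is exactly the one that gets invalidated.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the suspected behaviour (NOT HDFS code): right after the balancer
// copies blk_1073741833 to 127.0.0.1:55468 the block briefly has one more replica
// than its replication factor, and an excess-replica chooser that ignores the
// pending balancer move can pick the freshly moved replica to invalidate instead
// of the source replica.
public class ExcessReplicaRace {
    public static void main(String[] args) {
        int replication = 1; // illustrative value, not taken from the test
        // Replica locations right after the balancer's copy completes, before the
        // source replica has been removed: source first, balancer target last.
        List<String> replicas = new ArrayList<>(
                Arrays.asList("127.0.0.1:49159", "127.0.0.1:55468"));

        // A naive/broken excess-replica choice: drop the most recently added
        // replica until the count matches the replication factor.
        while (replicas.size() > replication) {
            String victim = replicas.remove(replicas.size() - 1);
            System.out.println("chooseExcessReplicates: (" + victim
                    + ", blk_1073741833_1009) is added to invalidated blocks set");
        }
        // Prints 127.0.0.1:55468 -- the balancer's target loses the block it just
        // received, matching the BlockStateChange log above.
    }
}
{code}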
Perhaps [~jingzhao] or [~arpitagarwal] can help look at this?
> TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block
> missing after balancer success
> ----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-6159
> URL: https://issues.apache.org/jira/browse/HDFS-6159
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.3.0
> Reporter: Chen He
> Assignee: Chen He
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch,
> logs.txt
>
>
> The TestBalancerWithNodeGroup.testBalancerWithNodeGroup test will report a
> false failure if one or more data blocks are lost after the balancer
> successfully finishes.