[ https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535597#comment-13535597 ]
Junping Du commented on HDFS-4261:
----------------------------------
Hi ATM, thanks for your input. I ran the test more than 20 times in my local
environment but have never seen this error.
In general, this error occurs when the cluster is still not balanced after the
balancer has run. We expect that in testBalancerEndInNoMoveProgress(), but it
shouldn't happen in the TestBalancerWithNodeGroup.testBalancerWithNodeGroup() case.
It may be related to my latest changes, which exit the SourceBalancerNode
thread when no blocks can be moved to the target node (which can happen in
this boundary test case) in order to avoid an infinite loop. That can leave
some balancer nodes unbalanced, but they should become balanced in the next
balancing iteration (unless the same target node is chosen every time).
I need to investigate this further.
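A simplified, hypothetical model of the early exit described above: a source keeps scheduling blocks for a target until enough bytes are scheduled, and if no remaining block may go to that target it gives up for this iteration instead of spinning forever. `SourceDispatchSketch`, `Block`, `dispatch`, and the `canMoveToTarget` predicate are illustrative assumptions, not the actual Balancer classes or methods.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class SourceDispatchSketch {

    /** Hypothetical block representation: an id and a size in bytes. */
    record Block(long id, long size) {}

    /**
     * Schedules blocks from the source until at least bytesToMove bytes are
     * scheduled or no remaining block may go to the target.
     * Returns the number of bytes scheduled.
     */
    static long dispatch(List<Block> sourceBlocks, long bytesToMove,
                         Predicate<Block> canMoveToTarget) {
        List<Block> remaining = new ArrayList<>(sourceBlocks);
        long scheduled = 0;
        while (scheduled < bytesToMove) {
            Block next = null;
            for (Iterator<Block> it = remaining.iterator(); it.hasNext(); ) {
                Block b = it.next();
                if (canMoveToTarget.test(b)) {
                    next = b;
                    it.remove();
                    break;
                }
            }
            if (next == null) {
                // No candidate fits this target: exit instead of looping forever.
                // The node may stay unbalanced until the next balancing iteration.
                break;
            }
            scheduled += next.size();
        }
        return scheduled;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block(1, 64), new Block(2, 64), new Block(3, 64));
        // The target accepts only block 1 (e.g. placement constraints rule out the rest).
        System.out.println(dispatch(blocks, 192, b -> b.id() == 1)); // 64
        System.out.println(dispatch(blocks, 100, b -> true));        // 128
    }
}
```

In this model, if every iteration picks the same target with the same constraint, `dispatch` keeps returning short of `bytesToMove`, which matches the "always gets the same target node" exception noted above.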
> TestBalancerWithNodeGroup times out
> -----------------------------------
>
> Key: HDFS-4261
> URL: https://issues.apache.org/jira/browse/HDFS-4261
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer
> Affects Versions: 1.0.4, 1.1.1, 2.0.2-alpha
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Junping Du
> Fix For: 3.0.0
>
> Attachments: HDFS-4261.patch, HDFS-4261-v2.patch, HDFS-4261-v3.patch,
> HDFS-4261-v4.patch, HDFS-4261-v5.patch, HDFS-4261-v6.patch,
> HDFS-4261-v7.patch, jstack-mac-18567, jstack-win-5488,
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.mac,
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.win
>
> When I manually ran TestBalancerWithNodeGroup, it always timed out on my
> machine. Looking at the Jenkins report [build
> #3573|https://builds.apache.org/job/PreCommit-HDFS-Build/3573//testReport/org.apache.hadoop.hdfs.server.balancer/],
> TestBalancerWithNodeGroup was somehow skipped, so the problem was not
> detected.
--
This message is automatically generated by JIRA.