[ 
https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535597#comment-13535597
 ] 

Junping Du commented on HDFS-4261:
----------------------------------

Hi ATM, thanks for your input. I ran more than 20 rounds of the test in my 
local environment but have not seen this error. 
In general, this error occurs when the cluster is still unbalanced after the 
balancer has run. We expect that in testBalancerEndInNoMoveProgress(), but it 
should not happen in TestBalancerWithNodeGroup.testBalancerWithNodeGroup(). 
It may be related to my latest changes: to get rid of the infinite loop, the 
SourceBalancerNode thread now exits if no blocks can be moved to the target 
node (which is possible in this boundary test case). That can leave some 
balancerNode unbalanced, but it should get balanced in the next balancing 
iteration (unless it always gets the same target node).
I need to investigate this further. 
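
To make the early-exit behaviour described above concrete, here is a minimal 
sketch. It is not the actual Balancer code: the class name, the dispatch() 
method, and the MAX_NO_MOVE_ROUNDS limit are all hypothetical, and the real 
patch may structure the loop differently. It only shows how a source thread 
that gives up after several fruitless rounds avoids spinning forever, at the 
cost of possibly leaving bytes unmoved for a later iteration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a per-source dispatch loop; not the real Balancer code. */
public class SourceDispatchSketch {
    /** Assumed limit on consecutive rounds in which no block could be moved. */
    static final int MAX_NO_MOVE_ROUNDS = 5;

    /**
     * Moves blocks until nothing is left to move, or gives up after
     * MAX_NO_MOVE_ROUNDS consecutive rounds without a movable block.
     * Returns the number of bytes still pending when the loop exits.
     */
    static long dispatch(Deque<Long> movableBlocks, long bytesToMove) {
        int noMoveRounds = 0;
        while (bytesToMove > 0) {
            Long block = movableBlocks.poll(); // null means no block fits any target
            if (block != null) {
                bytesToMove -= block;
                noMoveRounds = 0;              // progress was made, reset the counter
                continue;
            }
            if (++noMoveRounds >= MAX_NO_MOVE_ROUNDS) {
                break; // exit instead of looping forever; the node may stay unbalanced
            }
        }
        return Math.max(bytesToMove, 0);
    }

    public static void main(String[] args) {
        Deque<Long> blocks = new ArrayDeque<>();
        blocks.add(100L);
        blocks.add(100L);
        // Only 200 of the 500 scheduled bytes can be moved, so 300 remain.
        System.out.println(dispatch(blocks, 500L));
    }
}
```

A non-zero return value corresponds to the "unbalanced" outcome: the loop 
terminates, and the remaining bytes would have to be picked up by the next 
balancing iteration.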
                
> TestBalancerWithNodeGroup times out
> -----------------------------------
>
>                 Key: HDFS-4261
>                 URL: https://issues.apache.org/jira/browse/HDFS-4261
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 1.0.4, 1.1.1, 2.0.2-alpha
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Junping Du
>             Fix For: 3.0.0
>
>         Attachments: HDFS-4261.patch, HDFS-4261-v2.patch, HDFS-4261-v3.patch, 
> HDFS-4261-v4.patch, HDFS-4261-v5.patch, HDFS-4261-v6.patch, 
> HDFS-4261-v7.patch, jstack-mac-18567, jstack-win-5488, 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.mac,
>  
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.win
>
>
> When I manually ran TestBalancerWithNodeGroup, it always timed out on my 
> machine.  Looking at the Jenkins report [build 
> #3573|https://builds.apache.org/job/PreCommit-HDFS-Build/3573//testReport/org.apache.hadoop.hdfs.server.balancer/],
>  TestBalancerWithNodeGroup was somehow skipped, so the problem was not 
> detected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira