[ 
https://issues.apache.org/jira/browse/HDFS-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547034#comment-13547034
 ] 

Junping Du commented on HDFS-4261:
----------------------------------

Hi Eli, with the v7 patch, TestBalancerWithNodeGroup always succeeds in my 
local env, and I cannot reproduce ATM's and Chris' issue (I have already tried 
30+ times in my env). I think at least 4 issues have been identified and fixed 
in the balancer here:
1. NoChangeIterations (which counts iterations with no block movement) was not 
working before. Compared with branch-1, the regression seems to have been 
introduced by Namenode Federation.
2. The balancer's balancing policy is static, so we need to clean it up (reset 
it) in every balancing iteration even though we create a new balancer instance.
3. The checkReplicaPlacementPolicy() issue, which was identified by ATM.
4. The loop in dispatchBlocks() could run forever in some rare cases.
+1 on adding a timeout annotation; I will add it in the v8 patch.
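
To illustrate items 1 and 4 above, here is a minimal Java sketch of the two guards involved: a counter that gives up after several consecutive iterations with no block movement, and a dispatch loop that stops once nothing can be scheduled for a few rounds. This is not the code from the patch. The class name, method names, the limit of 5, and the use of -1 to signal "balanced" are all assumptions made for illustration.

```java
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

/** Hypothetical sketch; not the actual Balancer code from the HDFS-4261 patch. */
public class BalancerLoopSketch {
    /** Assumed limit on consecutive iterations that move no bytes. */
    static final int MAX_NO_CHANGE_ITERATIONS = 5;

    /**
     * Runs iterations until one reports the cluster as balanced (a negative
     * value in this sketch) or too many consecutive iterations move nothing.
     *
     * @return true if balanced, false if we gave up for lack of progress
     */
    static boolean run(LongSupplier bytesMovedInIteration) {
        int noChangeIterations = 0;
        while (true) {
            long moved = bytesMovedInIteration.getAsLong();
            if (moved < 0) {
                return true;                 // balanced
            }
            if (moved == 0) {
                noChangeIterations++;        // no progress in this iteration
                if (noChangeIterations >= MAX_NO_CHANGE_ITERATIONS) {
                    return false;            // stop instead of spinning forever
                }
            } else {
                noChangeIterations = 0;      // progress resets the counter
            }
        }
    }

    /**
     * Dispatches pending moves, but exits once maxIdleRounds consecutive
     * rounds schedule nothing, so a move that can never be scheduled
     * cannot keep the loop alive.
     *
     * @return the number of moves dispatched
     */
    static int dispatchBounded(Deque<String> pending,
                               Predicate<String> canDispatch,
                               int maxIdleRounds) {
        int dispatched = 0;
        int idleRounds = 0;
        while (!pending.isEmpty() && idleRounds < maxIdleRounds) {
            int before = dispatched;
            // Visit each pending move exactly once per round.
            for (int i = pending.size(); i > 0; i--) {
                String move = pending.poll();
                if (canDispatch.test(move)) {
                    dispatched++;
                } else {
                    pending.add(move);       // retry in a later round
                }
            }
            idleRounds = (dispatched == before) ? idleRounds + 1 : 0;
        }
        return dispatched;
    }
}
```

Bounding both loops by lack of progress, rather than by a fixed iteration count, lets a healthy run continue for as long as blocks are still moving.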
                
> TestBalancerWithNodeGroup times out
> -----------------------------------
>
>                 Key: HDFS-4261
>                 URL: https://issues.apache.org/jira/browse/HDFS-4261
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 1.0.4, 1.1.1, 2.0.2-alpha
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Junping Du
>             Fix For: 3.0.0
>
>         Attachments: HDFS-4261.patch, HDFS-4261-v2.patch, HDFS-4261-v3.patch, 
> HDFS-4261-v4.patch, HDFS-4261-v5.patch, HDFS-4261-v6.patch, 
> HDFS-4261-v7.patch, jstack-mac-18567, jstack-win-5488, 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.mac,
>  
> org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup-output.txt.win
>
>
> When I manually ran TestBalancerWithNodeGroup, it always timed out on my 
> machine.  Looking at the Jenkins report [build 
> #3573|https://builds.apache.org/job/PreCommit-HDFS-Build/3573//testReport/org.apache.hadoop.hdfs.server.balancer/],
>  TestBalancerWithNodeGroup somehow was skipped so that the problem was not 
> detected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
