[
https://issues.apache.org/jira/browse/HDFS-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025306#comment-16025306
]
Lei (Eddy) Xu commented on HDFS-11682:
--------------------------------------
[~manojg], to make it clearer: the problem is that there might not be a
heartbeat/block report (HB/BR) between two _iterations_ of
{{Balancer#runOneIteration}}. So by the time we run {{waitForBalancer}}, the
balancer task may have already finished, because the Balancer thought the avg
utilization was {{40}}. As mentioned in the previous comments, the actual
average utilization is {{38.7}}, so the utilization of that particular DN,
{{50}}, is just slightly larger than {{avgUtilization + threshold + delta
(38.7 + 10 + 1 = 49.7)}}. The fix here is to make sure that {{Balancer}} can
run again after another HB/BR, with an accurate utilization number.
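To make the arithmetic concrete, here is a minimal sketch of the comparison described above. The class and method names are hypothetical and are not the actual {{Balancer}} or {{TestBalancer}} code; it only assumes that a DN counts as over-utilized when its utilization exceeds {{avgUtilization + threshold + delta}}, with the numbers taken from this comment.

```java
// Hypothetical illustration only; not the real Balancer/TestBalancer logic.
class UtilizationCheck {
    // True when the DN's utilization is above the upper bound
    // avgUtilization + threshold + delta.
    static boolean exceedsUpperBound(double dnUtilization, double avgUtilization,
                                     double threshold, double delta) {
        return dnUtilization > avgUtilization + threshold + delta;
    }

    public static void main(String[] args) {
        double dn = 50.0, threshold = 10.0, delta = 1.0;
        // Stale average (40): upper bound is 51, so the DN looks balanced.
        System.out.println(exceedsUpperBound(dn, 40.0, threshold, delta)); // false
        // Actual average (38.7): upper bound is about 49.7, so the DN is over it.
        System.out.println(exceedsUpperBound(dn, 38.7, threshold, delta)); // true
    }
}
```

With the stale average the DN at {{50}} is inside the bound, while with the accurate average it falls just outside, which matches the mismatch described above.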
[~andrew.wang], the miscalculation is due to out-of-date block information on
the NN between _two iterations_ of {{Balancer#runOneIteration}}, so it is hard
to trigger an HB/BR from within the {{Balancer}} itself. Alternatively, how
about making the {{timeout}} in {{TestBalancer#waitForBalancer}} shorter, e.g.,
10s? That way the execution time in the worst case stays about the same.
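As a rough sketch of what a shorter timeout means for a polling wait, the following generic helper checks a condition at a fixed interval and gives up once the timeout elapses. It is an assumption-laden illustration: {{PollWithTimeout}} and {{waitFor}} are hypothetical names, and this is not the body of {{TestBalancer#waitForBalancer}}.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper; it only illustrates polling with a bounded timeout.
class PollWithTimeout {
    // Polls `condition` every intervalMs; returns true once it holds,
    // or false if timeoutMs elapses (or the thread is interrupted) first.
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller may retry after fresh reports
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }
}
```

With a short timeout such as 10s, a wait that cannot succeed on stale numbers returns quickly, so the caller can run another attempt once a new HB/BR has arrived instead of blocking for the full original timeout.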
> TestBalancer#testBalancerWithStripedFile is flaky
> -------------------------------------------------
>
> Key: HDFS-11682
> URL: https://issues.apache.org/jira/browse/HDFS-11682
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0-alpha3
> Reporter: Andrew Wang
> Assignee: Lei (Eddy) Xu
> Attachments: HDFS-11682.00.patch, HDFS-11682.01.patch,
> IndexOutOfBoundsException.log, timeout.log
>
>
> Saw this fail in two different ways on a precommit run, but pass locally.