[ https://issues.apache.org/jira/browse/HDFS-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045163#comment-16045163 ]
Manoj Govindassamy commented on HDFS-11682:
-------------------------------------------
[~eddyxu],
From your explanation, it looks like it is inevitable to run {{runBalancer}} a
couple of times with updated HB to trigger additional balancing if needed. +1
(non-binding) with one comment below.
{noformat}
while (retry > 0) {
  // start rebalancing
  Collection<URI> namenodes = DFSUtil.getInternalNsRpcUris(conf);
  final int run = runBalancer(namenodes, p, conf);
  .. ..
  waitForHeartBeat(totalUsedSpace, totalCapacity, client, cluster);
  LOG.info(" .");
  try {
    waitForBalancer(totalUsedSpace, totalCapacity, client, cluster, p,
        excludedNodes);
  } catch (TimeoutException e) {
    // See HDFS-11682. NN may not get heartbeat to reflect the newest
    // block changes.
    retry--;
    if (retry == 0) {
      throw e;
    }
    LOG.warn("The cluster has not balanced yet, retry...");
    continue;
  }
  break;
}
{noformat}
{{waitForHeartBeat}} in the above loop can also time out and throw
{{TimeoutException}}, but unlike {{waitForBalancer}} it is called outside the
{{try}} block, so that exception is neither caught nor retried. The caller
could therefore still fail because of this.
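
One possible way to address this, shown only as a rough sketch and not as the actual patch: move both waits inside the same {{try}} so a timeout from either one consumes a retry. The class {{BalancerRetrySketch}}, the {{Step}} interface, and {{runWithRetry}} are hypothetical stand-ins for illustration; the real {{TestBalancer}} helpers take cluster and client arguments that are omitted here.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for the test's retry loop; not the real TestBalancer code. */
class BalancerRetrySketch {

  /** A step that may time out (stands in for runBalancer / waitForHeartBeat / waitForBalancer). */
  interface Step {
    void run() throws TimeoutException;
  }

  /**
   * Runs the balancer and then both waits, retrying the whole round when
   * either wait times out. Returns the number of rounds that were attempted.
   */
  static int runWithRetry(int retry, Step runBalancer, Step waitForHeartBeat,
      Step waitForBalancer) throws TimeoutException {
    int attempts = 0;
    while (retry > 0) {
      attempts++;
      // start rebalancing
      runBalancer.run();
      try {
        // Both waits sit inside the try, so a timeout from either is retried.
        waitForHeartBeat.run();
        waitForBalancer.run();
      } catch (TimeoutException e) {
        retry--;
        if (retry == 0) {
          // Out of retries: surface the last timeout to the caller.
          throw e;
        }
        continue;
      }
      break;
    }
    return attempts;
  }
}
```

With this shape, a heartbeat timeout on the first round would lead to a second {{runBalancer}} call instead of failing the test immediately. Whether retrying on a heartbeat timeout is actually desirable here is a judgment call for the patch author.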
> TestBalancer#testBalancerWithStripedFile is flaky
> -------------------------------------------------
>
> Key: HDFS-11682
> URL: https://issues.apache.org/jira/browse/HDFS-11682
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0-alpha4
> Reporter: Andrew Wang
> Assignee: Lei (Eddy) Xu
> Attachments: HDFS-11682.00.patch, HDFS-11682.01.patch,
> IndexOutOfBoundsException.log, timeout.log
>
>
> Saw this fail in two different ways on a precommit run, but pass locally.