[
https://issues.apache.org/jira/browse/HDFS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018705#comment-13018705
]
Matt Foley commented on HDFS-1828:
----------------------------------
At line 79, the test waits in this loop:
  while ( (numRacks < 2) || (curReplicas < REPLICATION_FACTOR) ||
          (neededReplicationSize > 0))
But at line 95 it asserts:
  assertTrue(curReplicas == REPLICATION_FACTOR)
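I believe that under the circumstances of the test, curReplicas will in fact be
REPLICATION_FACTOR + 1, transiently. The old condition only checks
(curReplicas < REPLICATION_FACTOR), so the wait can end while the count is
still one too high, and the equality assert then fails. Changed the "while" to
wait for the desired equality, i.e.,
  while (curReplicas != REPLICATION_FACTOR)
Also changed the wait from infinite to a 20-second timeout, with useful status
output on failure.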
Similar code in the other test case under TestBlocksWithNotEnoughRacks does not
have the same problem, but I still changed its wait from infinite to bounded.
Finally, added log messages to help debug any future problems.
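
The bounded wait described above could look roughly like the following. This
is a minimal sketch, not the code from the patch: the class BoundedWait, the
method waitForExactCount, and the IntSupplier standing in for the test's
replica-count query are all hypothetical names chosen for illustration. It
polls until the observed count equals the expected value, and fails with a
status message once the timeout expires instead of looping forever.

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration only; not taken from the HDFS-1828 patch.
final class BoundedWait {

    /**
     * Polls `current` until it returns exactly `expected`.
     * Waiting for equality (not just ">= expected") means a transient
     * overshoot such as expected + 1 does not end the wait early.
     * Throws AssertionError with the last observed value on timeout.
     */
    static int waitForExactCount(IntSupplier current, int expected,
                                 long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        int observed = current.getAsInt();
        while (observed != expected) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError("Timed out after " + timeoutMs
                        + " ms: observed=" + observed
                        + ", expected=" + expected);
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and fail the wait.
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting", e);
            }
            observed = current.getAsInt();
        }
        return observed;
    }
}
```

With the 20-second bound mentioned above, a call might read
waitForExactCount(replicaCount, REPLICATION_FACTOR, 20000L, 100L), where
replicaCount is an assumed supplier wrapping whatever query the test uses and
the 100 ms poll interval is an arbitrary choice.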
> TestBlocksWithNotEnoughRacks intermittently fails assert
> --------------------------------------------------------
>
> Key: HDFS-1828
> URL: https://issues.apache.org/jira/browse/HDFS-1828
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Matt Foley
> Assignee: Matt Foley
> Fix For: 0.23.0
>
>
> In
> server.namenode.TestBlocksWithNotEnoughRacks.testSufficientlyReplicatedBlocksWithNotEnoughRacks
>
> assert fails at curReplicas == REPLICATION_FACTOR, but it seems that
> curReplicas should go higher initially, and if the test doesn't wait for it
> to go back down, the test fails spuriously.