[ 
https://issues.apache.org/jira/browse/HDFS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018705#comment-13018705
 ] 

Matt Foley commented on HDFS-1828:
----------------------------------

At line 79, it waits
      while ( (numRacks < 2) || (curReplicas < REPLICATION_FACTOR) || (neededReplicationSize > 0))

But at line 95 it asserts
      assertTrue(curReplicas == REPLICATION_FACTOR)

I believe that under the circumstances of the test, curReplicas will in fact be 
REPLICATION_FACTOR + 1, transiently.  Changed the "while" to wait for the 
desired equality, i.e., 
      while (curReplicas != REPLICATION_FACTOR).

Also changed the wait from infinite to a 20sec timeout with useful status 
output on failure.
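
A bounded wait of this kind could look roughly like the sketch below. It is 
not the code from the patch: the class BoundedWait, the method waitForValue, 
its parameters, and the simulated replica counts are all hypothetical names 
and values chosen for illustration. It polls until the observed value equals 
the expected one and fails with a status message once the timeout expires.

```java
import java.util.function.IntSupplier;

public class BoundedWait {
    /**
     * Polls until current.getAsInt() == expected and returns the number of
     * polls taken. Throws AssertionError with the last observed value if
     * timeoutMs elapses first. (Hypothetical helper, not the patch's code.)
     */
    public static int waitForValue(IntSupplier current, int expected,
                                   long timeoutMs, long pollMs) {
        final long deadline = System.currentTimeMillis() + timeoutMs;
        int polls = 0;
        while (true) {
            int value = current.getAsInt();
            polls++;
            if (value == expected) {
                return polls;
            }
            if (System.currentTimeMillis() >= deadline) {
                // Status output on failure: report what was last seen.
                throw new AssertionError("Timed out after " + timeoutMs
                    + " ms: last value=" + value + ", expected=" + expected);
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        final int REPLICATION_FACTOR = 3;
        // Simulated replica counts (assumed values): the count overshoots
        // to REPLICATION_FACTOR + 1 before settling back down.
        final int[] observed = {2, 4, 4, 3};
        final int[] next = {0};
        IntSupplier curReplicas =
            () -> observed[Math.min(next[0]++, observed.length - 1)];

        // Wait for equality, bounded at 20 seconds.
        int polls = waitForValue(curReplicas, REPLICATION_FACTOR, 20_000, 10);
        System.out.println("settled after " + polls + " polls");
    }
}
```

Waiting on equality rather than on (curReplicas < REPLICATION_FACTOR) means 
the loop does not exit while the count is still at its transient overshoot.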

Similar code in the other test case under TestBlocksWithNotEnoughRacks does not 
have the same problem, but still changed the wait from infinite to bounded.

Finally, added additional log messages useful for debugging future problems.

> TestBlocksWithNotEnoughRacks intermittently fails assert
> --------------------------------------------------------
>
>                 Key: HDFS-1828
>                 URL: https://issues.apache.org/jira/browse/HDFS-1828
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>             Fix For: 0.23.0
>
>
> In 
> server.namenode.TestBlocksWithNotEnoughRacks.testSufficientlyReplicatedBlocksWithNotEnoughRacks
>  
> the assert on curReplicas == REPLICATION_FACTOR fails, but it seems that 
> curReplicas should go higher initially; if the test doesn't wait for it to 
> come back down, the test fails spuriously.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
