Eric Badger created HDFS-10816:
----------------------------------
Summary: TestComputeInvalidateWork#testDatanodeReRegistration
fails due to race between test and replication monitor
Key: HDFS-10816
URL: https://issues.apache.org/jira/browse/HDFS-10816
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger
{noformat}
java.lang.AssertionError: Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
{noformat}
The test fails because of a race between the test and the replication monitor.
The default replication monitor interval is 3 seconds, which is roughly how
long the test normally takes to run. The test deletes a file and then acquires
the namesystem write lock. If the replication monitor fires between those two
steps, the monitor itself invalidates one of the blocks, so the test counts 2
invalidate blocks instead of the expected 3 and fails. This is easy to
reproduce by removing the sleep() in the ReplicationMonitor's run() method in
BlockManager.java, so that the replication monitor runs as often as possible
and widens the race window.
To fix the test, it is enough to turn off the replication monitor while the
test runs.
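
The interleaving can be illustrated with a small, self-contained toy model. This
is only a sketch under stated assumptions, not HDFS code: ToyBlockManager,
deleteFile(), monitorTick() and runTest() are hypothetical names made up for
this example, and the monitor is modeled as a single deterministic call placed
at the worst possible point (after the delete, before the test takes the lock)
rather than as a real background thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class InvalidateRaceSketch {

    /** Hypothetical stand-in for the pending-invalidation state; not the real BlockManager. */
    static final class ToyBlockManager {
        private final Deque<String> pendingInvalidates = new ArrayDeque<>();
        private final boolean monitorEnabled;

        ToyBlockManager(boolean monitorEnabled) {
            this.monitorEnabled = monitorEnabled;
        }

        /** Deleting the file queues one block invalidation per datanode. */
        void deleteFile(int numDatanodes) {
            for (int i = 0; i < numDatanodes; i++) {
                pendingInvalidates.add("block-on-dn" + i);
            }
        }

        /** One pass of the toy monitor: if enabled, it invalidates one pending block itself. */
        void monitorTick() {
            if (monitorEnabled && !pendingInvalidates.isEmpty()) {
                pendingInvalidates.poll();
            }
        }

        int pendingCount() {
            return pendingInvalidates.size();
        }
    }

    /** Models the test body with the monitor firing between the delete and the count. */
    static int runTest(boolean monitorEnabled) {
        ToyBlockManager bm = new ToyBlockManager(monitorEnabled);
        bm.deleteFile(3);      // test step 1: delete the file (3 datanodes)
        bm.monitorTick();      // worst case: monitor fires before the test takes the lock
        return bm.pendingCount(); // test step 2: count what is left to invalidate
    }

    public static void main(String[] args) {
        System.out.println("monitor on:  " + runTest(true));  // prints "monitor on:  2"
        System.out.println("monitor off: " + runTest(false)); // prints "monitor off: 3"
    }
}
```

With the monitor enabled the count drops to 2, matching the "expected:<3> but
was:<2>" failure above. With the monitor disabled the count stays at 3 no
matter how long the gap between the two test steps is.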