[
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450035#comment-15450035
]
Rushabh S Shah commented on HDFS-10816:
---------------------------------------
[~ebadger]: Thanks for reporting and analyzing the failure.
This test broke in our internal build recently.
Below are the relevant logs:
{noformat}
2016-08-29 01:54:49,332 INFO impl.RamDiskAsyncLazyPersistService
(RamDiskAsyncLazyPersistService.java:shutdown(169)) - All async lazy persist
service threads have been shut down
2016-08-29 01:54:49,336 INFO datanode.DataNode (DataNode.java:shutdown(1791))
- Shutdown complete.
2016-08-29 01:54:49,347 INFO BlockStateChange
(BlockManager.java:addToInvalidates(1228)) - BLOCK* addToInvalidates:
blk_1073741825_1001 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637
2016-08-29 01:54:49,349 INFO FSNamesystem.audit
(FSNamesystem.java:logAuditMessage(8476)) - allowed=true ugi=tortuga
(auth:SIMPLE) ip=/127.0.0.1 cmd=delete src=/testRR dst=null
perm=null proto=rpc
2016-08-29 01:54:49,350 INFO BlockStateChange
(BlockManager.java:invalidateWorkForOneNode(3582)) - BLOCK* BlockManager: ask
127.0.0.1:59637 to delete [blk_1073741825_1001]
2016-08-29 01:54:49,355 INFO hdfs.MiniDFSCluster
(MiniDFSCluster.java:shutdown(1725)) - Shutting down the Mini HDFS Cluster
{noformat}
bq. 2016-08-29 01:54:49,336 INFO datanode.DataNode
(DataNode.java:shutdown(1791)) - Shutdown complete.
This line corresponds to shutting down the last datanode.
bq. 2016-08-29 01:54:49,347 INFO BlockStateChange
(BlockManager.java:addToInvalidates(1228)) - BLOCK* addToInvalidates:
blk_1073741825_1001 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637
After stopping the last datanode, I can see the InvalidateBlocks size is 3.
bq. 2016-08-29 01:54:49,350 INFO BlockStateChange
(BlockManager.java:invalidateWorkForOneNode(3582)) - BLOCK* BlockManager: ask
127.0.0.1:59637 to delete \[blk_1073741825_1001\]
Then the replication monitor woke up and removed one block from the
invalidateBlocks set.
I think the test checked the invalidateBlocks size just after the replication
monitor had computed invalidate work for that node, so the assertion saw 2
instead of the expected 3 and failed.
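Roughly, the window looks like this (paraphrased from memory, not the actual
test source; variable names are illustrative, and getPendingDeletionBlocksCount()
is the BlockManager counter backed by the invalidateBlocks set):
{noformat}
// Paraphrased sketch of the race window, not the actual test code.
dfs.delete(new Path("/testRR"), false);  // queues 3 replicas in invalidateBlocks
// <-- if the ReplicationMonitor fires here, it moves one block out of the set
namesystem.writeLock();
try {
  assertEquals("Expected invalidate blocks to be the number of DNs",
      3L, bm.getPendingDeletionBlocksCount());
} finally {
  namesystem.writeUnlock();
}
{noformat}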
I think stopping the replication monitor is the correct fix.
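A minimal sketch of that idea, assuming we simply push the monitor's interval
well past the test's runtime via dfs.namenode.replication.interval (only an
illustration; the attached patch may take a different route, e.g. stopping the
monitor thread directly):
{noformat}
// Sketch: keep the replication monitor from waking up during the test.
// The default interval is 3 seconds, about how long the test takes to run.
Configuration conf = new HdfsConfiguration();
conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY, 3600);
cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
{noformat}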
[~jojochuang], [~zhz]: Since you reviewed HDFS-9580, could you please help
review this patch?
> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race
> between test and replication monitor
> -----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs
> expected:<3> but was:<2>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the
> replication monitor. The default replication monitor interval is 3 seconds,
> which is about how long the test normally takes to run. The test deletes a
> file and then acquires the namesystem write lock. However, if the
> replication monitor fires between those two steps, the test fails because
> the monitor itself invalidates one of the blocks before the test checks the
> count. This can easily be reproduced by removing the sleep() in the
> ReplicationMonitor's run() method in BlockManager.java, so that the
> replication monitor executes as quickly as possible and exacerbates the race.
> To fix the test, all that needs to be done is to turn off the replication
> monitor.
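As an aside (my own note, not part of the attached patch): if I recall
correctly the monitor's recheck period is read from
dfs.namenode.replication.interval, so instead of editing BlockManager.java to
drop the sleep(), the window can also be widened from configuration alone by
shrinking the same interval that the fix enlarges:
{noformat}
// Hypothetical reproduction aid: make the monitor fire roughly every second,
// so it is likely to run between the delete and the write-lock/assert.
conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY, 1);
{noformat}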