[
https://issues.apache.org/jira/browse/HDFS-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528082#comment-15528082
]
Yiqun Lin commented on HDFS-10426:
----------------------------------
Thanks [~liuml07] for pointing this out. It seems that
{{TestPendingInvalidateBlock#testPendingDeletion}} still fails occasionally (it
also appeared in HDFS-10915). The blockManager apparently still schedules the
invalidate blocks even though we have already made the method
{{getInvalidationDelay}} return 1, which indicates that we don't want to delete
blocks right now. I'm not sure whether there is a race here. Could we delay the
deletion operation and skip the current loop in ReplicationMonitor? By the next
loop, I think the Mockito stub will have taken effect. Ping [~iwasakims] for
comments.
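
To make the "skip the current loop" idea concrete, here is a minimal, self-contained sketch. It is not the real BlockManager or ReplicationMonitor code: the class {{InvalidationLoopSketch}} and its methods are hypothetical stand-ins, and the only assumption carried over from the test is that a positive invalidation delay means "do not delete yet".

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.LongSupplier;

// Hypothetical stand-in for one monitor iteration; not HDFS code.
final class InvalidationLoopSketch {
    private final Queue<String> pendingBlocks = new ArrayDeque<>();
    // Plays the role of the mocked getInvalidationDelay(): > 0 means "still delayed".
    private final LongSupplier invalidationDelay;

    InvalidationLoopSketch(LongSupplier invalidationDelay) {
        this.invalidationDelay = invalidationDelay;
    }

    void addPending(String blockId) {
        pendingBlocks.add(blockId);
    }

    int pendingCount() {
        return pendingBlocks.size();
    }

    /** Runs one loop iteration and returns how many blocks were scheduled for deletion. */
    int runOnce() {
        if (invalidationDelay.getAsLong() > 0) {
            // Delay still active: skip this iteration and keep the blocks pending.
            return 0;
        }
        int scheduled = pendingBlocks.size();
        pendingBlocks.clear();
        return scheduled;
    }

    public static void main(String[] args) {
        long[] delay = {1L}; // same value the test stubs in
        InvalidationLoopSketch loop = new InvalidationLoopSketch(() -> delay[0]);
        loop.addPending("blk_1");
        loop.addPending("blk_2");
        System.out.println("scheduled while delayed: " + loop.runOnce());
        delay[0] = 0L;
        System.out.println("scheduled after delay: " + loop.runOnce());
    }
}
```

With the delay check at the top of the iteration, the pending count stays stable for as long as the stub reports a positive delay, which is the property the test asserts on.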
> TestPendingInvalidateBlock failed in trunk
> ------------------------------------------
>
> Key: HDFS-10426
> URL: https://issues.apache.org/jira/browse/HDFS-10426
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Reporter: Yiqun Lin
> Assignee: Yiqun Lin
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10426.001.patch, HDFS-10426.002.patch,
> HDFS-10426.003.patch, HDFS-10426.004.patch, HDFS-10426.005.patch,
> HDFS-10426.006.patch
>
>
> The test {{TestPendingInvalidateBlock}} fails intermittently. The stack info:
> {code}
> org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
> testPendingDeletion(org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock)
> Time elapsed: 7.703 sec <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock.testPendingDeletion(TestPendingInvalidateBlock.java:92)
> {code}
> It looks like the {{invalidateBlock}} has been removed before we do the check:
> {code}
> // restart NN
> cluster.restartNameNode(true);
> dfs.delete(foo, true);
> Assert.assertEquals(0, cluster.getNamesystem().getBlocksTotal());
> Assert.assertEquals(REPLICATION, cluster.getNamesystem()
> .getPendingDeletionBlocks());
> Assert.assertEquals(REPLICATION,
> dfs.getPendingDeletionBlocksCount());
> {code}
> I looked into the related configurations and found that the property
> {{dfs.namenode.replication.interval}} is set to just 1 second in this test.
> If the delete operation is slow and only completes after the delay set by
> {{dfs.namenode.startup.delay.block.deletion.sec}} has passed, the pending
> blocks are already invalidated before the check runs. The stack info above
> shows that the failed test took 7.7s, which is more than 5+1 seconds.
> One way to improve this:
> * Increase the value of {{dfs.namenode.replication.interval}}
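
The arithmetic behind that suggestion can be sketched as follows. This is a coarse timing model, not HDFS code: the class and method names are hypothetical, the 5 s delay and 1 s interval are the values mentioned above, and the 10 s interval is an arbitrary illustrative value. The reported elapsed time covers the whole test, so it only gives an upper bound on the time since the NameNode restart.

```java
// Illustrative timing model only; names are hypothetical, not part of HDFS.
final class PendingDeletionWindow {
    /** Latest time (seconds after startup) at which the first invalidation may be scheduled. */
    static double latestFirstInvalidationSec(long startupDelaySec, long replicationIntervalSec) {
        return startupDelaySec + replicationIntervalSec;
    }

    /** True when the assertions could run after blocks were already scheduled for invalidation. */
    static boolean assertionMayRace(double elapsedSec, long startupDelaySec,
                                    long replicationIntervalSec) {
        return elapsedSec > latestFirstInvalidationSec(startupDelaySec, replicationIntervalSec);
    }

    public static void main(String[] args) {
        double elapsed = 7.703; // "Time elapsed" from the failure report
        // Values from the report: 5 s startup delay + 1 s replication interval.
        System.out.println(assertionMayRace(elapsed, 5, 1));
        // Same delay with a larger, illustrative 10 s replication interval.
        System.out.println(assertionMayRace(elapsed, 5, 10));
    }
}
```

Under this model a 1 s interval leaves a 6 s window, which the 7.7 s run exceeds, while a larger interval widens the window beyond the observed run time.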