[
https://issues.apache.org/jira/browse/HDFS-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090970#comment-17090970
]
Ayush Saxena commented on HDFS-15297:
-------------------------------------
Thanks [~liuml07] for the report. I checked the trace and the log, and
the reason for the failure seems to be this:
Both blocks get deleted successfully from the disk, but the block report is
built from the DataNode's in-memory data. That in-memory data is corrected by
{{DirectoryScanner}}, whose thread runs in parallel with the test. Hence only
one block had been marked as deleted by the time the report was sent, and the
other block was still reported.
From the logs in the shared link:
{noformat}
2020-04-23 03:16:48,998 [Time-limited test] DEBUG datanode.BlockReportTestBase
(BlockReportTestBase.java:blockReport_02(310)) - Removing the block
blk_1073741826 --> First Block
2020-04-23 03:16:49,002 [Time-limited test] DEBUG datanode.BlockReportTestBase
(BlockReportTestBase.java:blockReport_02(310)) - Removing the block
blk_1073741830 --> Second Block
2020-04-23 03:16:52,009 [Block report processor] DEBUG
blockmanagement.BlockManager (BlockManager.java:reportDiffSorted(3078)) -
Reported block blk_1073741830_1006 on 127.0.0.1:32993 size 1536 replicaState =
FINALIZED --> Second Block is the BR
{noformat}
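The interleaving suggested by these log lines can be sketched in a few lines of
plain Java. This is only an illustration under stated assumptions: the class
{{StaleReportSketch}} and all of its methods are hypothetical stand-ins, not
HDFS code, and the scanner is modelled as single-threaded steps so the outcome
is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a DataNode: not HDFS code, names are illustrative only.
class StaleReportSketch {
  // Blocks that actually exist on disk.
  final Set<String> onDisk = new TreeSet<>();
  // Blocks the DataNode believes it has (the in-memory view).
  final Set<String> inMemory = new TreeSet<>();

  public void addBlock(String id) {
    onDisk.add(id);
    inMemory.add(id);
  }

  // The test removes the block file directly, so memory is not updated.
  public void deleteFromDisk(String id) {
    onDisk.remove(id);
  }

  // One scanner step: reconcile a single block against the disk state.
  public void reconcile(String id) {
    if (!onDisk.contains(id)) {
      inMemory.remove(id);
    }
  }

  // The block report is built from the in-memory view only.
  public List<String> blockReport() {
    return new ArrayList<>(inMemory);
  }

  public static void main(String[] args) {
    StaleReportSketch dn = new StaleReportSketch();
    dn.addBlock("blk_1073741826");
    dn.addBlock("blk_1073741830");

    dn.deleteFromDisk("blk_1073741826");
    dn.deleteFromDisk("blk_1073741830");

    // The scanner has handled only the first block when the report is sent.
    dn.reconcile("blk_1073741826");
    System.out.println("report taken mid-scan: " + dn.blockReport());

    // Once the scanner finishes, the report no longer lists either block.
    dn.reconcile("blk_1073741830");
    System.out.println("report after full scan: " + dn.blockReport());
  }
}
```

In this model the mid-scan report still lists the second block, which matches
the single reported block in the log above.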
A simple way to reproduce this is to add a delay, a sleep of a second or two,
in {{FsDatasetImpl#checkAndUpdate()}} (around L2491 of {{FsDatasetImpl.java}}).
That reproduces the same trace as you posted.
Probable solutions:
* Add {{DataNodeTestUtils.runDirectoryScanner(dn0);}} before
{{waitTil(TimeUnit.SECONDS.toMillis(DN_RESCAN_EXTRA_WAIT));}} at L322.
* Increase the time in
{{waitTil(TimeUnit.SECONDS.toMillis(DN_RESCAN_EXTRA_WAIT));}}.
* Instead of the explicit wait, use {{GenericTestUtils.waitFor(..)}} to wait
for a specific condition. You would need to figure out what to wait for; as a
last resort, the test could check the log for the "Removed...." message that
{{FsDatasetImpl}} logs.
Please check whether this reasoning seems correct to you. I tried option 1 and
it worked for me. You may update the test whichever way you like, or with
anything better.
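For the third option, the idea of polling for a condition instead of sleeping
for a fixed time could look roughly like the sketch below. It is self-contained
and does not use Hadoop: the {{waitFor}} method here is a hypothetical stand-in
modelled loosely on the polling style of {{GenericTestUtils.waitFor(..)}}, and
the {{AtomicInteger}} is an assumed placeholder for whatever DataNode state the
real test would check.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: polls a condition instead of sleeping a fixed time.
class WaitForSketch {

  // Stand-in helper, not the Hadoop API: re-check until true or timed out.
  public static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.get()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met in " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Assumed placeholder for the number of deleted blocks still held in memory.
    AtomicInteger staleReplicas = new AtomicInteger(2);

    // Simulated scanner thread: clears one stale replica every 50 ms.
    Thread scanner = new Thread(() -> {
      try {
        for (int i = 0; i < 2; i++) {
          Thread.sleep(50);
          staleReplicas.decrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    scanner.start();

    // Proceed only once the in-memory state has caught up with the disk.
    waitFor(() -> staleReplicas.get() == 0, 10, 5000);
    System.out.println("stale replicas left: " + staleReplicas.get());
    scanner.join();
  }
}
```

Compared with a longer fixed wait, this returns as soon as the condition holds
and fails with a clear timeout when it never does.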
> TestNNHandlesBlockReportPerStorage::blockReport_02 fails intermittently in
> trunk
> --------------------------------------------------------------------------------
>
> Key: HDFS-15297
> URL: https://issues.apache.org/jira/browse/HDFS-15297
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 3.4.0
> Reporter: Mingliang Liu
> Priority: Major
>
> It fails intermittently on {{trunk}} branch. Not sure about other branches.
> Example builds are:
> - https://builds.apache.org/job/hadoop-multibranch/job/PR-1964/4/testReport/org.apache.hadoop.hdfs.server.datanode/TestNNHandlesBlockReportPerStorage/blockReport_02/
> - <To add more>
> Sample exception stack:
> {quote}
> java.lang.AssertionError: Wrong number of MissingBlocks is found expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.apache.hadoop.hdfs.server.datanode.BlockReportTestBase.blockReport_02(BlockReportTestBase.java:336)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {quote}