[ 
https://issues.apache.org/jira/browse/HDFS-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090970#comment-17090970
 ] 

Ayush Saxena commented on HDFS-15297:
-------------------------------------

Thanks [~liuml07] for the report. I checked the trace and the log, and the 
reason for the failure seems to be this:
 Both blocks get deleted successfully from the disk, but the block report is 
built from the in-memory data, and the in-memory data is only corrected by 
{{DirectoryScanner}}. The DirectoryScanner thread runs in parallel with the 
test, so only one block had been marked deleted by the time the report was 
sent; the other block was still reported.
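
To make the suspected sequence concrete, here is a small deterministic toy model. It is not Hadoop code: the class {{StaleReportSketch}} and its methods are hypothetical names, and the scanner is reduced to a single-threaded "reconcile one entry" step so the interleaving can be replayed exactly.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model (assumption, not the DataNode implementation): the block report
// is built from an in-memory view that lags behind the disk until a scanner
// step reconciles it.
class StaleReportSketch {
  private final Set<String> onDisk = new TreeSet<>();
  private final Set<String> inMemory = new TreeSet<>();

  void addBlock(String blockId) {
    onDisk.add(blockId);
    inMemory.add(blockId);
  }

  // Deleting the file only changes the disk; the in-memory view is untouched.
  void deleteFromDisk(String blockId) {
    onDisk.remove(blockId);
  }

  // One scanner step: drop at most one in-memory entry that is gone from disk.
  boolean reconcileOne() {
    Iterator<String> it = inMemory.iterator();
    while (it.hasNext()) {
      if (!onDisk.contains(it.next())) {
        it.remove();
        return true;
      }
    }
    return false;
  }

  // The report reflects the in-memory view, not the disk.
  List<String> blockReport() {
    return new ArrayList<>(inMemory);
  }

  public static void main(String[] args) {
    StaleReportSketch dn = new StaleReportSketch();
    dn.addBlock("blk_1073741826");
    dn.addBlock("blk_1073741830");
    dn.deleteFromDisk("blk_1073741826");
    dn.deleteFromDisk("blk_1073741830");

    dn.reconcileOne(); // scanner has caught up with only one deletion
    System.out.println("report sent too early: " + dn.blockReport());

    dn.reconcileOne(); // scanner finishes
    System.out.println("report after full scan: " + dn.blockReport());
  }
}
```

In this model a report taken after the first reconcile step still contains the second block, which is the same shape as the failure: one block counted as missing instead of two.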

From the logs in the shared link:
{noformat}
2020-04-23 03:16:48,998 [Time-limited test] DEBUG datanode.BlockReportTestBase 
(BlockReportTestBase.java:blockReport_02(310)) - Removing the block 
blk_1073741826   --> First Block

2020-04-23 03:16:49,002 [Time-limited test] DEBUG datanode.BlockReportTestBase 
(BlockReportTestBase.java:blockReport_02(310)) - Removing the block 
blk_1073741830  --> Second Block

2020-04-23 03:16:52,009 [Block report processor] DEBUG 
blockmanagement.BlockManager (BlockManager.java:reportDiffSorted(3078)) - 
Reported block blk_1073741830_1006 on 127.0.0.1:32993 size 1536 replicaState = 
FINALIZED --> Second Block is the BR


{noformat}

A simple way to reproduce this is to add a delay in 
{{FsDatasetImpl.java#checkAndUpdate()}} at L2491; a sleep of a second or two 
reproduces the same trace as you posted.

Probable solutions:
* Add {{DataNodeTestUtils.runDirectoryScanner(dn0);}} before 
{{waitTil(TimeUnit.SECONDS.toMillis(DN_RESCAN_EXTRA_WAIT));}} at L322.
* Increase the time in 
{{waitTil(TimeUnit.SECONDS.toMillis(DN_RESCAN_EXTRA_WAIT));}}.
* Instead of the explicit wait, use {{GenericTestUtils.waitFor(..)}} to wait 
for a specific condition. You would need to figure out what to wait for; the 
last thing that can be checked is the "Removed...." message logged in 
FsDatasetImpl.
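
The third option can be sketched as follows. This is a self-contained illustration, not a patch for the test: {{WaitForSketch.waitFor}} is a hypothetical stand-in loosely modeled on {{GenericTestUtils.waitFor(..)}} (condition, poll interval, timeout), and the "scanner" thread and replica set are made-up substitutes for the DataNode state. The real condition to poll for would still have to be worked out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class WaitForSketch {

  // Hypothetical stand-in for a waitFor helper: poll the condition every
  // checkEveryMillis until it holds, or fail after waitForMillis.
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.get()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  // Simulates a background scanner that removes deleted blocks from the
  // in-memory view one at a time, while the "test" waits on the condition.
  static int demo() throws Exception {
    Set<String> inMemoryReplicas = ConcurrentHashMap.newKeySet();
    inMemoryReplicas.add("blk_1073741826");
    inMemoryReplicas.add("blk_1073741830");

    Thread scanner = new Thread(() -> {
      for (String blk : new String[] {"blk_1073741826", "blk_1073741830"}) {
        try {
          Thread.sleep(50); // assumed per-block scan delay
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        inMemoryReplicas.remove(blk);
      }
    });
    scanner.start();

    // Wait for the state the assertion depends on, not for a fixed time.
    waitFor(inMemoryReplicas::isEmpty, 10, 5000);
    scanner.join();
    return inMemoryReplicas.size();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("replicas still in memory: " + demo());
  }
}
```

Compared with a longer fixed sleep, polling returns as soon as the condition holds and fails with a clear timeout when it never does, so the test does not depend on how fast the scanner happens to run.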

Please check whether this reasoning seems correct to you. I tried option 1 and 
it worked for me. You may update the test with whichever approach you like, or 
with anything better.


> TestNNHandlesBlockReportPerStorage::blockReport_02 fails intermittently in 
> trunk
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-15297
>                 URL: https://issues.apache.org/jira/browse/HDFS-15297
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.4.0
>            Reporter: Mingliang Liu
>            Priority: Major
>
> It fails intermittently on {{trunk}} branch. Not sure about other branches. 
> Example builds are:
> - 
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1964/4/testReport/org.apache.hadoop.hdfs.server.datanode/TestNNHandlesBlockReportPerStorage/blockReport_02/
> - <To add more>
> Sample exception stack:
> {quote}
> java.lang.AssertionError: Wrong number of MissingBlocks is found expected:<2> 
> but was:<1>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:834)
>       at org.junit.Assert.assertEquals(Assert.java:645)
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockReportTestBase.blockReport_02(BlockReportTestBase.java:336)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
