virajjasani opened a new pull request #3386:
URL: https://github.com/apache/hadoop/pull/3386
### Description of PR
TestFsDatasetImpl#testDnRestartWithHardLink is flaky:
```
[ERROR] testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)  Time elapsed: 7.768 s  <<< FAILURE!
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:87)
    at org.junit.Assert.assertTrue(Assert.java:42)
    at org.junit.Assert.assertTrue(Assert.java:53)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testDnRestartWithHardLink(TestFsDatasetImpl.java:1344)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
```
### How was this patch tested?
Unit testing. The current flaky behaviour is easy to reproduce by running
the test code twice as part of the same test.
The resolution is to disable the detection and deletion of duplicate
finalized replicas by the BlockPoolSlice instance.
When the Datanode comes up, each BPServiceActor handshakes with the Namenode
and tries to initialize the block pool. In the process, it tries to get the
VolumeMap using the BlockPoolSlice instance. Reading replicas from the cache
fails, so the thread adds Finalized and RBW replicas to the
addReplicaThreadPool fork-join pool in order to build the map. This process
also tries to identify any duplicate replicas. For this particular test, the
process can sometimes detect a duplicate replica on /data2 while processing
the finalized replica of /data1. Hence, before we can confirm that
newReplicaInfo.getBlockURI() exists, the finalized replica on /data2 might get
deleted (a rare and flaky case). Although the thread that identifies and
deletes the duplicate finalized replica is unlikely to be faster than the main
thread, this cannot be ruled out. Hence, we disable adding Finalized and RBW
replicas to addReplicaThreadPool in BlockPoolSlice here and re-enable it only
after we confirm the existence of newReplicaInfo on "/data2" ARCHIVE storage.
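
The race described above can be modelled in isolation. The following is a minimal sketch, not Hadoop code: `DuplicateScanGate`, its methods, and the `/data1/blk_1` and `/data2/blk_1` paths are hypothetical names invented for illustration. It assumes only that a background task may delete the duplicate replica before the main thread checks for it, and that a switch can turn that background work off. The unlucky interleaving is forced by waiting for the background task to finish before checking.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of the race; none of these names exist in Hadoop. */
public class DuplicateScanGate {
  // Stand-in for replica files on disk, keyed by path.
  private final Set<String> replicas = ConcurrentHashMap.newKeySet();
  // Stand-in for a switch that disables the background duplicate scan.
  private final AtomicBoolean scanEnabled = new AtomicBoolean(true);

  void addReplica(String path) { replicas.add(path); }
  boolean exists(String path) { return replicas.contains(path); }
  void setScanEnabled(boolean enabled) { scanEnabled.set(enabled); }

  /** Background work: delete the duplicate only while the scan is enabled. */
  void resolveDuplicate(String duplicatePath) {
    if (scanEnabled.get()) {
      replicas.remove(duplicatePath);
    }
  }

  /** Returns whether the /data2 replica was still present when checked. */
  public static boolean runScenario(boolean disableScanFirst) throws Exception {
    DuplicateScanGate gate = new DuplicateScanGate();
    gate.addReplica("/data1/blk_1");
    gate.addReplica("/data2/blk_1");
    if (disableScanFirst) {
      gate.setScanEnabled(false);
    }
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      // Force the worst case: the background task completes before the check.
      pool.submit(() -> gate.resolveDuplicate("/data2/blk_1")).get();
      boolean present = gate.exists("/data2/blk_1");
      // Re-enable the scan only after the existence check has been made.
      gate.setScanEnabled(true);
      return present;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("scan enabled:  " + runScenario(false));
    System.out.println("scan disabled: " + runScenario(true));
  }
}
```

With the scan left enabled, the existence check in this model fails because the background task has already removed the entry; with the scan disabled first, the check succeeds regardless of thread timing.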