virajjasani opened a new pull request #3386:
URL: https://github.com/apache/hadoop/pull/3386


   ### Description of PR
   TestFsDatasetImpl#testDnRestartWithHardLink is flaky:
   ```
   [ERROR] testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)  Time elapsed: 7.768 s  <<< FAILURE!
   java.lang.AssertionError
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testDnRestartWithHardLink(TestFsDatasetImpl.java:1344)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.lang.Thread.run(Thread.java:748)
   ```
   
   ### How was this patch tested?
   Unit testing. The current flaky behaviour is easy to reproduce by running the test code twice within the same test.
   The resolution is to disable both the detection and the deletion of duplicate finalized replicas by the BlockPoolSlice instance.
   
   When the Datanode comes up, the BPServiceActors handshake with the Namenode and try to initialize the block pool. In the process, they try to get the VolumeMap using the BlockPoolSlice instance. Reading replicas from the cache fails, so the thread adds the Finalized and RBW replicas to the addReplicaThreadPool fork-join pool in order to build the map. This process also tries to identify whether any duplicate replica exists. For this particular test, it can sometimes detect a duplicate replica on /data2 while processing the finalized replica of /data1. As a result, the finalized replica on /data2 might get deleted before we can confirm that newReplicaInfo.getBlockURI() exists (a rare, flaky case). Although the thread that identifies and deletes the duplicate finalized replica is unlikely to be faster than the main thread, the race cannot be ruled out. Hence, we disable adding Finalized and RBW replicas to addReplicaThreadPool in BlockPoolSlice here and re-enable it only after we confirm the existence of newReplicaInfo on "/data2" ARCHIVE storage.
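
The ordering described above can be illustrated with a minimal, JDK-only sketch. It is not the code in this patch and uses no Hadoop APIs: the class `DuplicateCleanupGate`, the method `checkReplicaWithGate`, and the temporary `blk_1` file are hypothetical stand-ins. A background task plays the role of the duplicate-replica cleanup, and a `CountDownLatch` holds it back until the main thread has confirmed that the file exists, which is the ordering the test needs to be deterministic.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DuplicateCleanupGate {

  /**
   * Hypothetical illustration: returns true if the "replica" file was still
   * present when the main thread checked it, and was deleted by the
   * background cleanup only afterwards.
   */
  static boolean checkReplicaWithGate() throws Exception {
    // Stand-in for the finalized replica on the second volume.
    Path dir = Files.createTempDirectory("data2");
    Path replica = Files.createFile(dir.resolve("blk_1"));

    // Gate that keeps the cleanup from running before the existence check.
    CountDownLatch verified = new CountDownLatch(1);
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      // Stand-in for the duplicate-replica deletion running on another thread.
      Future<Boolean> cleanup = pool.submit(() -> {
        verified.await();                       // blocked until main thread is done
        return Files.deleteIfExists(replica);
      });

      // Main thread: the check that was racing with the deletion.
      boolean existed = Files.exists(replica);

      // Only now is the cleanup allowed to proceed.
      verified.countDown();
      boolean deleted = cleanup.get();

      return existed && deleted && !Files.exists(replica);
    } finally {
      pool.shutdownNow();
      Files.deleteIfExists(replica);
      Files.deleteIfExists(dir);
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(checkReplicaWithGate());
  }
}
```

Without the latch, the background task could delete the file before `Files.exists` runs, which corresponds to the intermittent assertion failure in the stack trace. The patch achieves the same ordering by disabling the addition of replicas to addReplicaThreadPool and re-enabling it after the check, rather than by using a latch.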


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


