[ 
https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021383#comment-15021383
 ] 

Xiao Chen commented on HDFS-9429:
---------------------------------

Attached a patch that reproduces the failure with the same stack trace but a 
different type of exception. As mentioned above, the EOFException requires very 
exact timing to reproduce. I think this reproduce patch is sufficient to show 
that a {{waitActive}}-ish method is needed.

The reproduced failure is caused by the JN RPC server starting later than the 
RPC call in that stack trace. Un-commenting 
{{journalCluster.waitActive();}} in {{MiniQJMHACluster#MiniQJMHACluster}} at 
line 101 makes the unit test pass, because of the introduced {{waitActive}}.
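
To illustrate the general idea only (this is not the code in the attached patch), here is a minimal sketch of a wait-until-listening helper. The class {{PortWaiter}} and the method {{waitForListener}} are hypothetical names, and the sketch assumes that a successful TCP connect is a good enough readiness signal, which may be weaker than what a real {{waitActive}} should check.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of Hadoop or of the patch. */
final class PortWaiter {

  private PortWaiter() {
  }

  /**
   * Blocks until a TCP connection to host:port succeeds. Throws
   * TimeoutException if no connection succeeds within timeoutMs.
   * Assumption: "accepts TCP connections" is treated as "server is up".
   */
  static void waitForListener(String host, int port, long timeoutMs)
      throws InterruptedException, TimeoutException {
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      try (Socket s = new Socket()) {
        // Short per-attempt connect timeout so the loop stays responsive.
        s.connect(new InetSocketAddress(host, port), 200);
        return; // something is listening on the port
      } catch (IOException e) {
        // Not listening yet (e.g. connection refused); retry below.
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "No listener on " + host + ":" + port + " after " + timeoutMs + " ms");
      }
      Thread.sleep(50); // back off briefly before the next attempt
    }
  }
}
```

A test setup could call something like {{PortWaiter.waitForListener("127.0.0.1", port, 10000)}} for each JN port before formatting the NameNode, so that the first RPC is not issued before the server is accepting connections.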

Below is a sample failure stack trace using the attached patch.
{noformat}
java.io.IOException: Timed out waiting for response from loggers
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:229)
        at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:916)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:180)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1067)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:370)
        at org.apache.hadoop.hdfs.DFSTestUtil.formatNameNode(DFSTestUtil.java:228)
        at org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1005)
        at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891)
        at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823)
        at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
        at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
        at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.<init>(MiniQJMHACluster.java:111)
        at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.<init>(MiniQJMHACluster.java:37)
        at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:65)
        at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:84)
        at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testMetaSave(TestDFSAdminWithHA.java:205)
{noformat}

Please kindly review patch 1. Thanks.

> Tests in TestDFSAdminWithHA intermittently fail with EOFException
> -----------------------------------------------------------------
>
>                 Key: HDFS-9429
>                 URL: https://issues.apache.org/jira/browse/HDFS-9429
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: HDFS
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HDFS-9429.001.patch, HDFS-9429.reproduce
>
>
> I have seen this fail a handful of times for {{testMetaSave}}, but from my 
> understanding the failure comes from {{setUpHaCluster}}, so in theory it 
> could affect any test case in the class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
