[ https://issues.apache.org/jira/browse/HDFS-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15021383#comment-15021383 ]
Xiao Chen commented on HDFS-9429:
---------------------------------
The attached patch reproduces the failure with the same stack trace but a
different type of exception. As mentioned above, the EOFException requires very
exact conditions to reproduce. I think this reproduction patch is sufficient to
show that a {{waitActive}}-ish method is needed.
The reproduced failure occurs because the JN RPC server starts later than the
RPC call in that stack trace. Un-commenting
{{journalCluster.waitActive();}} in {{MiniQJMHACluster#MiniQJMHACluster}} at
line 101 makes the unit test pass, because of the newly introduced
{{waitActive}}.
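For illustration only, here is a minimal sketch of what a {{waitActive}}-ish helper could look like. It is not the code in the attached patch: the class name {{WaitActiveSketch}}, the method signature, and the idea of treating "the RPC port accepts a TCP connection" as "active" are all assumptions made for this example, and it deliberately uses only plain JDK sockets rather than any Hadoop API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hadoop or of the attached patch.
class WaitActiveSketch {

    /**
     * Polls until a TCP connection to addr succeeds or timeoutMs elapses.
     * Assumption: a listening port is a good enough proxy for "server is up".
     *
     * @return true if the port accepted a connection before the deadline
     */
    static boolean waitActive(InetSocketAddress addr, long timeoutMs, int pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            try (Socket s = new Socket()) {
                // pollMs doubles as the per-attempt connect timeout (must be > 0).
                s.connect(addr, pollMs);
                return true;
            } catch (IOException e) {
                // Server not listening yet; fall through and retry.
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(pollMs);
        }
    }
}
```

A test setup would call such a helper once per journal node before issuing the first RPC, and fail fast if it returns false, instead of relying on the servers happening to start in time.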
Below is a sample failure stack trace using the attached patch.
{noformat}
java.io.IOException: Timed out waiting for response from loggers
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:229)
    at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:916)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:180)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1067)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:370)
    at org.apache.hadoop.hdfs.DFSTestUtil.formatNameNode(DFSTestUtil.java:228)
    at org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1005)
    at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891)
    at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823)
    at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
    at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
    at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.<init>(MiniQJMHACluster.java:111)
    at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.<init>(MiniQJMHACluster.java:37)
    at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:65)
    at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:84)
    at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testMetaSave(TestDFSAdminWithHA.java:205)
{noformat}
Please kindly review patch 1. Thanks.
> Tests in TestDFSAdminWithHA intermittently fail with EOFException
> -----------------------------------------------------------------
>
> Key: HDFS-9429
> URL: https://issues.apache.org/jira/browse/HDFS-9429
> Project: Hadoop HDFS
> Issue Type: Test
> Components: HDFS
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Attachments: HDFS-9429.001.patch, HDFS-9429.reproduce
>
>
> I have seen this fail a handful of times for {{testMetaSave}}, but from my
> understanding this is from {{setUpHaCluster}} so theoretically it could fail
> for any cases in the class.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)