[
https://issues.apache.org/jira/browse/HDFS-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054815#comment-15054815
]
Mingliang Liu commented on HDFS-9493:
-------------------------------------
Hi [~twu],
Thanks for working on this. I think your analysis is correct that the datanode
(DN) is not marked as dead when saving metadata.
I like your idea to make the code robust by waiting for the DN to be removed.
Sleeping for 15 seconds does not guarantee that the DN has already been removed
(problem 1). Plus, 15 seconds of idle time is too long for a unit test (problem
2). There is another bug in the code that stops the DN in both {{testMetaSave()}}
and {{testMetasaveAfterDelete()}}. Both tests assume there are two live
DNs in {{cluster}} and remove the second one before testing {{metaSave}}
(problem 3). The latter should fail if the DN has already been removed by the
former test.
Your patch should solve the first problem, but it still needs to wait 10~20
seconds on my local machine before the DN is removed. To solve the second
problem, we can expire a DN heartbeat on the NN in MiniDFSCluster via
{{setDataNodeDead()}}. As for the third problem, we can either build a cluster
for each test, or add the DN back after testing {{metaSave}}.
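To illustrate the first point, here is a minimal standalone sketch of polling for a condition instead of sleeping for a fixed 15 seconds. {{PollingWait}} and {{waitFor}} are hypothetical names used only for illustration, not Hadoop APIs. In the real test, the condition would be a check against the NN that the stopped DN is no longer counted as live.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Hadoop.
final class PollingWait {
  private PollingWait() {}

  /**
   * Polls {@code check} every {@code checkEveryMillis} until it returns true
   * or {@code timeoutMillis} has elapsed.
   *
   * @return true if the condition was met, false on timeout or interruption
   */
  static boolean waitFor(BooleanSupplier check, long checkEveryMillis,
      long timeoutMillis) {
    final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (check.getAsBoolean()) {
        return true;                      // condition met, stop waiting early
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;                     // gave up: timeout reached
      }
      try {
        Thread.sleep(checkEveryMillis);   // short pause between checks
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
  }
}
```

A test would assert on the return value, so it fails fast on timeout and returns as soon as the DN is actually gone rather than always paying the full sleep.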
Added [~taoluo] and [~shv] to the watchers list as they worked on
{{testMetasaveAfterDelete()}} in [HDFS-4878].
> Test o.a.h.hdfs.server.namenode.TestMetaSave fails in trunk
> -----------------------------------------------------------
>
> Key: HDFS-9493
> URL: https://issues.apache.org/jira/browse/HDFS-9493
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Reporter: Mingliang Liu
> Assignee: Tony Wu
> Attachments: HDFS-9493.001.patch
>
>
> Tested in both Gentoo Linux and Mac.
> {quote}
> -------------------------------------------------------
> T E S T S
> -------------------------------------------------------
> Running org.apache.hadoop.hdfs.server.namenode.TestMetaSave
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 34.159 sec
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestMetaSave
> testMetasaveAfterDelete(org.apache.hadoop.hdfs.server.namenode.TestMetaSave)
> Time elapsed: 15.318 sec <<< FAILURE!
> java.lang.AssertionError: null
> at org.junit.Assert.fail(Assert.java:86)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.junit.Assert.assertTrue(Assert.java:52)
> at
> org.apache.hadoop.hdfs.server.namenode.TestMetaSave.testMetasaveAfterDelete(TestMetaSave.java:154)
> {quote}