[ https://issues.apache.org/jira/browse/HDFS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986818#comment-14986818 ]
Walter Su commented on HDFS-6101:
---------------------------------
The test possibly fails because the stopped DN is not removed from the
cluster map, and {{sleepSeconds(5)}} does not guarantee that it has been
removed from the cluster map.
1. Please don't remove this. It's intentional. After sleeping, we want some
writers to have NOT yet started.
{code}
- // Some of them are too slow and will be not yet started.
- sleepSeconds(1);
{code}
2. Instead of hardcoding a 5s sleep, we can use
{{GenericTestUtils.waitFor(..)}} to check the block replication. The
wait/notify is unnecessary.
3. After
{code}
cluster.stopDataNode(AppendTestUtil.nextInt(REPLICATION));
{code}
we should call {{cluster.setDataNodeDead(..)}} to remove it from the cluster map.
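To illustrate point 2, here is a minimal, self-contained sketch of polling for a condition instead of sleeping for a fixed time. It is not Hadoop's {{GenericTestUtils}}: {{PollingWait}} is a hypothetical stand-in whose parameter order (check, poll interval, timeout) is only assumed to resemble {{GenericTestUtils.waitFor(..)}}, and the {{replicas}} counter is a made-up substitute for the real block replication check.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a polling helper; NOT Hadoop's GenericTestUtils. */
final class PollingWait {
    private PollingWait() {}

    /**
     * Re-evaluates check every checkEveryMillis until it returns true,
     * or throws TimeoutException once waitForMillis has elapsed.
     */
    static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "Condition not met within " + waitForMillis + " ms");
            }
            Thread.sleep(checkEveryMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up substitute for "current replication of the block".
        AtomicInteger replicas = new AtomicInteger(2);

        // Simulates re-replication finishing after some unknown delay.
        Thread recovery = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            replicas.set(3);
        });
        recovery.start();

        // Returns as soon as the condition holds, rather than always
        // waiting the full 5 seconds.
        waitFor(() -> replicas.get() == 3, 50, 5_000);
        System.out.println("replication reached " + replicas.get());
        recovery.join();
    }
}
```

With this pattern the test only waits as long as it needs to, and a timeout surfaces as an explicit failure instead of a later, unrelated assertion error.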
> TestReplaceDatanodeOnFailure fails occasionally
> -----------------------------------------------
>
> Key: HDFS-6101
> URL: https://issues.apache.org/jira/browse/HDFS-6101
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Arpit Agarwal
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-6101.001.patch, HDFS-6101.002.patch,
> HDFS-6101.003.patch, TestReplaceDatanodeOnFailure.log
>
>
> Exception details in a comment below.
> The failure repros on both OS X and Linux if I run the test ~10 times in a
> loop.