[ 
https://issues.apache.org/jira/browse/HDFS-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-5983:
--------------------------

    Attachment: HDFS-5983.patch

Kihwal, Chen, I started working on this yesterday and only just noticed that it 
is assigned to Chen. Here is the patch. Two timing conditions can invalidate 
the test:

1. With the MiniDFSCluster settings used here, the DN sends two blockReports, 
one for each storage. If the second blockReport arrives after 
NameNodeAdapter.getSafeModeSafeBlocks and before 
BlockManagerTestUtil.updateState, the test fails.
2. processMisReplicatedBlocks is async, so the test needs to wait for it to 
complete before it reads the metrics.
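
For illustration only, the "wait until it completes" idea in item 2 can be 
sketched as a small polling helper. This is not the code in the attached 
patch; the class name WaitUtil, the method waitFor, and its parameters are 
hypothetical, and the condition the real test would poll is not shown in this 
message.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Hadoop or of the attached patch. */
public class WaitUtil {

    /**
     * Polls the given condition until it returns true or the timeout expires.
     *
     * @param check      condition to poll, e.g. "async processing has finished"
     * @param intervalMs sleep time between two polls, in milliseconds
     * @param timeoutMs  maximum total wait, in milliseconds
     */
    public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        // Use the monotonic clock so wall-clock adjustments cannot shorten the wait.
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

A test would call something like WaitUtil.waitFor(condition, 10, 10000) just 
before reading the metrics, where the condition reports whether the async 
processing has finished. Failing with a timeout instead of sleeping for a 
fixed time keeps the test fast in the normal case and gives a clear error 
when the processing never completes.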

Sorry about the overlap; I hope Chen hasn't spent much time on this. The root 
causes are identified above, but there are other ways to fix the test, so feel 
free to continue the work if necessary.

Thanks.

> TestSafeMode#testInitializeReplQueuesEarly fails
> ------------------------------------------------
>
>                 Key: HDFS-5983
>                 URL: https://issues.apache.org/jira/browse/HDFS-5983
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Chen He
>         Attachments: HDFS-5983.patch, testlog.txt
>
>
> It was seen in one of the precommit builds of HDFS-5962.  The test case 
> creates 15 blocks and then shuts down all datanodes. Then the namenode is 
> restarted with a low safe block threshold and one datanode is restarted. The 
> idea is that the initial block report from the restarted datanode will make 
> the namenode leave safe mode and initialize the replication queues.
> According to the log, the datanode reported 3 blocks, but slightly before 
> that the namenode did repl queue init with 1 block.  I will attach the log.



