[ 
https://issues.apache.org/jira/browse/HDFS-11446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027021#comment-16027021
 ] 

Manoj Govindassamy commented on HDFS-11446:
-------------------------------------------

[~linyiqun],
  Thanks for reporting this issue and providing a patch. My understanding of 
PendingBlock is that the NameNode has already started reconstruction for that 
block and is just waiting for the replication to complete and for block reports 
to arrive before removing the block from the pending list. Did you get a chance 
to look at the logs from the failed run to see whether there were any failed 
reconstruction tasks or other issues? Anyway, increasing the wait time for 
reconstruction from 20 seconds to 1 minute should be OK if it avoids the test 
flakiness. +1 (non-binding). 

[~mingma], do you have any other comments on the latest patch?

> TestMaintenanceState#testWithNNAndDNRestart fails intermittently
> ----------------------------------------------------------------
>
>                 Key: HDFS-11446
>                 URL: https://issues.apache.org/jira/browse/HDFS-11446
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-11446.001.patch, HDFS-11446.002.patch, 
> HDFS-11446.003.patch, HDFS-11446.004.patch, HDFS-11446-branch-2.002.patch, 
> HDFS-11446-branch-2.patch
>
>
> The test {{TestMaintenanceState#testWithNNAndDNRestart}} fails in trunk. The 
> stack trace (from 
> https://builds.apache.org/job/PreCommit-HDFS-Build/18423/testReport/ ):
> {code}
> java.lang.AssertionError: expected null, but was:<Wrong number of replicas 
> for block BP-1367163238-172.17.0.2-1487836532907:blk_1073741825_1001: 
> expected 3, got 2 
> ,DatanodeInfoWithStorage[127.0.0.1:42649,DS-c499e6ef-ce14-428b-baef-8cf2a122b248,DISK],DatanodeInfoWithStorage[127.0.0.1:40774,DS-cc484c09-6e32-4804-a337-2871f37b62e1,DISK],pending
>  block # 1 ,under replicated # 0 ,>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotNull(Assert.java:664)
>       at org.junit.Assert.assertNull(Assert.java:646)
>       at org.junit.Assert.assertNull(Assert.java:656)
>       at org.apache.hadoop.hdfs.TestMaintenanceState.testWithNNAndDNRestart(TestMaintenanceState.java:731)
> {code}
> The failure seems to occur because the pending block has not yet been 
> replicated. We can increase the number of retries, since the cluster is 
> sometimes busy. We can also use {{GenericTestUtils#waitFor}} to simplify the 
> current comparison logic.
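
For illustration only, the polling idea behind a waitFor-style helper can be 
sketched as below. This is not the Hadoop implementation and not the code in 
the attached patches: the class name PollingWait, its waitFor method, and the 
condition polled in main are hypothetical stand-ins, and the 60-second timeout 
simply mirrors the 1 minute wait discussed in the comment above.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical stand-in for a waitFor-style test helper; not Hadoop code.
final class PollingWait {

  /**
   * Re-evaluates check every checkEveryMillis until it returns true,
   * or throws TimeoutException once waitForMillis has elapsed.
   */
  static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (true) {
      if (check.get()) {
        return; // condition satisfied, stop polling
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Assumption: this lambda stands in for a real check such as
    // "the block now has the expected number of replicas".
    long start = System.currentTimeMillis();
    waitFor(() -> System.currentTimeMillis() - start >= 200, 50, 60_000);
    System.out.println("condition met");
  }
}
```

Compared with a fixed number of retries, a deadline-based wait like this makes 
the total time budget explicit, so a busy cluster only costs extra time when 
replication is actually slow.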



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
