[ 
https://issues.apache.org/jira/browse/HDFS-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279740#comment-17279740
 ] 

Jim Brennan commented on HDFS-9243:
-----------------------------------

[~kihwal] analyzed this as well.  Including his comment here:
{noformat}
The test artificially invalidated a replica on a node, but before the test made 
further progress, the NN fixed the under-replication by having another node 
send the block to the same node. The test then went ahead and removed it from 
the NN's data structure (blocksmap) and called setReplication(). The NN picked 
two nodes, but one of them was the node that already has the block replica. It 
was only missing in the NN's data structure. Again, this happened because the NN 
fixed the under-replication between the test deleting the replica and modifying 
the NN's data structure. The replication failed with 
ReplicaAlreadyExistsException. This kind of inconsistency does not happen in 
real clusters, but even if it did, it would be fixed when the replication times 
out. The test is set to time out before the default replication timeout, so it 
never had a chance to do that.
{noformat}
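
To make the ordering of events easier to follow, here is a minimal toy model of the race described above. It is a sketch in Python, not Hadoop code: ToyNameNode, ToyDataNode, ReplicaAlreadyExists, and the names dn0/dn1/dn2 and blk_1 are hypothetical stand-ins, and the initial replication factor of 2 raised to 3 is an assumption chosen only to match "the NN picked two nodes".

```python
# Toy model of the race; none of these classes are real Hadoop APIs.

class ReplicaAlreadyExists(Exception):
    """Stand-in for the DataNode-side ReplicaAlreadyExistsException."""


class ToyDataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = set()  # replicas physically stored on this node

    def receive(self, block):
        # A node refuses a transfer for a replica it already stores.
        if block in self.blocks:
            raise ReplicaAlreadyExists(f"{block} already on {self.name}")
        self.blocks.add(block)


class ToyNameNode:
    def __init__(self):
        self.blocks_map = {}  # block -> names of nodes the NN believes hold it

    def locations(self, block):
        return self.blocks_map.setdefault(block, set())

    def replicate(self, block, target):
        # The NN records the new location only if the transfer succeeds.
        target.receive(block)
        self.locations(block).add(target.name)


def run_scenario():
    dn0, dn1, dn2 = (ToyDataNode(n) for n in ("dn0", "dn1", "dn2"))
    nn = ToyNameNode()
    block = "blk_1"
    for dn in (dn0, dn1):  # assumed starting point: two replicas
        nn.replicate(block, dn)

    # 1. The test invalidates the replica on dn0.
    dn0.blocks.discard(block)
    nn.locations(block).discard("dn0")

    # 2. Race: the NN repairs the under-replication by copying back to dn0.
    nn.replicate(block, dn0)

    # 3. The test removes dn0 from the NN's map only; the replica stays on disk.
    nn.locations(block).discard("dn0")

    # 4. Replication is raised to 3. The NN believes only dn1 holds the
    #    block, so it picks two targets, one of which is dn0.
    failed = []
    for target in (dn0, dn2):
        try:
            nn.replicate(block, target)
        except ReplicaAlreadyExists:
            failed.append(target.name)

    return sorted(nn.locations(block)), failed
```

In this model run_scenario() leaves the NN's view at two locations (dn1 and dn2) even though all three nodes physically hold the block, and the transfer to dn0 is the one that fails. A test waiting for three reported replicas would therefore keep waiting until something like the replication timeout lets the NN retry.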

> TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-9243
>                 URL: https://issues.apache.org/jira/browse/HDFS-9243
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Wei-Chiu Chuang
>            Assignee: Hrishikesh Gadre
>            Priority: Minor
>
> org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks 
> sometimes times out.
> This is happening on trunk, as can be observed in several recent Jenkins jobs 
> (e.g. https://builds.apache.org/job/Hadoop-Hdfs-trunk/2423/ 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2386/ 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2351/ 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/472/).
> On my local Linux machine, this test case times out 6 out of 10 times. When 
> it does not time out, the test takes about 20 seconds; otherwise it runs for 
> more than 60 seconds and then times out.
> I suspect a deadlock, as one occurred in this test case before, in 
> HDFS-5527.



