[ https://issues.apache.org/jira/browse/HDFS-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279740#comment-17279740 ]
Jim Brennan commented on HDFS-9243:
-----------------------------------
[~kihwal] analyzed this as well. Including his comment here:
{noformat}
The test artificially invalidated a replica on a node, but before the test made
further progress, the NN fixed the under-replication by having another node
send the block to the same node. The test then went ahead and removed the
replica from the NN's data structure (blocksmap) and called setReplication().
The NN picked two nodes, but one of them was the node that already had the
block replica; the replica was only missing from the NN's data structure.
Again, this happened because the NN fixed the under-replication between the
test deleting the replica and modifying the NN's data structure. The
replication failed with ReplicaAlreadyExistsException. This kind of
inconsistency does not happen in real clusters, but even if it did, it would
be fixed when the replication times out. The test is set to time out before
the default replication timeout, so it never had a chance to recover.
{noformat}
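
To make the interleaving above easier to follow, here is a minimal toy model
of it. This is only a sketch under stated assumptions: it is not the actual
test code and uses no real Hadoop APIs. The class ReplicaRaceSketch, its
methods, the node names, and the local ReplicaAlreadyExistsException stand-in
are all hypothetical. It models just two sets: the nodes that physically hold
the replica, and the nodes the NN believes hold it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these names are real Hadoop APIs.
class ReplicaRaceSketch {
    /** Hypothetical stand-in for the DataNode-side failure named in the comment. */
    static class ReplicaAlreadyExistsException extends RuntimeException {
        ReplicaAlreadyExistsException(String message) { super(message); }
    }

    private final Set<String> onDisk;    // nodes that physically hold the replica
    private final Set<String> blocksMap; // nodes the NN believes hold the replica

    ReplicaRaceSketch(String... nodes) {
        onDisk = new HashSet<>(Arrays.asList(nodes));
        blocksMap = new HashSet<>(Arrays.asList(nodes));
    }

    // Test step 1: invalidate the replica on the DataNode.
    void invalidateReplica(String node) { onDisk.remove(node); }

    // Background NN work: another node re-sends the block to the same node.
    void nameNodeRepairs(String node) { onDisk.add(node); blocksMap.add(node); }

    // Test step 2: remove the replica from the NN's view.
    void removeFromBlocksMap(String node) { blocksMap.remove(node); }

    // Replication after setReplication(): the NN consults only its own view
    // when choosing a target, so a node missing from blocksMap is eligible.
    void replicateTo(String node) {
        if (blocksMap.contains(node)) {
            throw new IllegalStateException("NN would not pick " + node);
        }
        if (!onDisk.add(node)) { // the replica is already physically present
            throw new ReplicaAlreadyExistsException(node + " already has the replica");
        }
        blocksMap.add(node);
    }

    // Runs the test's steps, optionally letting the NN repair in between them.
    static String simulate(boolean nnRepairsInBetween) {
        ReplicaRaceSketch cluster = new ReplicaRaceSketch("dn1", "dn2", "dn3");
        cluster.invalidateReplica("dn3");
        if (nnRepairsInBetween) {
            cluster.nameNodeRepairs("dn3");
        }
        cluster.removeFromBlocksMap("dn3");
        try {
            cluster.replicateTo("dn3");
            return "replicated";
        } catch (ReplicaAlreadyExistsException e) {
            return "ReplicaAlreadyExistsException";
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints "replicated"
        System.out.println(simulate(true));  // prints "ReplicaAlreadyExistsException"
    }
}
```

In this model the failure appears only when the repair runs between the two
test steps, which matches the intermittent nature of the timeout.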
> TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout
> -----------------------------------------------------------------------------
>
> Key: HDFS-9243
> URL: https://issues.apache.org/jira/browse/HDFS-9243
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Reporter: Wei-Chiu Chuang
> Assignee: Hrishikesh Gadre
> Priority: Minor
>
> org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
> sometimes times out.
> This is happening on trunk, as can be observed in several recent Jenkins jobs
> (e.g. https://builds.apache.org/job/Hadoop-Hdfs-trunk/2423/
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2386/
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2351/
> https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/472/).
> On my local Linux machine, this test case times out 6 out of 10 times. When
> it does not time out, the test takes about 20 seconds; otherwise it runs for
> more than 60 seconds and then times out.
> I suspect it's a deadlock issue, as a deadlock occurred in this test case
> before, in HDFS-5527.