[ https://issues.apache.org/jira/browse/HDFS-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709983#comment-13709983 ]
Kihwal Lee commented on HDFS-4998:
----------------------------------
In the log from the build, the deletion happened 3 seconds after setReplication
was done. The log shows that triggerHeartbeat() had no effect because it raced
with a block report and lost. As a result, lastHeartbeat was reset to the block
report time and the heartbeat was not sent right away. This explains the
3-second delay.
It seems that increasing the existing sleep to 3 seconds plus some slack will
prevent this race.
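
For comparison with a longer fixed sleep, below is a minimal sketch of the other option raised in the issue description: polling until the block deletion has actually completed. Everything in it is an assumption made for illustration. The class WaitForDeletion, the helper waitFor(), and the blockDeleted condition are hypothetical names, not code from the test or from the HDFS test utilities, and the real check would have to query the datanode for the block.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForDeletion {
    /**
     * Hypothetical helper: polls the condition every intervalMs milliseconds
     * and returns as soon as it holds. Throws TimeoutException if it still
     * does not hold once timeoutMs milliseconds have passed.
     */
    public static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the datanode finishing the deletion about 300 ms from now.
        // In the real test this would check whether the replica is gone.
        long deletedAt = System.nanoTime() + 300 * 1_000_000L;
        BooleanSupplier blockDeleted = () -> System.nanoTime() - deletedAt >= 0;

        // Wait up to 5 seconds, checking every 50 ms, instead of sleeping a fixed 3 seconds.
        waitFor(blockDeleted, 50, 5000);
        System.out.println("block deleted; setReplication can be called now");
    }
}
```

A wait of this kind returns as soon as the deletion is visible, so the test costs the full timeout only when something is actually wrong. A fixed sleep always costs its full duration and can still lose the race if the delay grows.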
> TestUnderReplicatedBlocks fails intermittently
> ----------------------------------------------
>
> Key: HDFS-4998
> URL: https://issues.apache.org/jira/browse/HDFS-4998
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.1.0-beta
> Reporter: Kihwal Lee
>
> Looking at the branch-2.1-beta Jenkins build, this test case seems flaky.
> First, addToInvalidates() is called against a block on a datanode. This
> removes the dn from the BlockInfo in blocksMap.
> At this point, raising the replication factor can cause the same node to be
> picked. If the node has already deleted the block, it will work. If not, the
> replication fails. When it fails, it will take at least the pending
> replication timeout to reschedule, which is 5 minutes. But the test will
> time out before this and fail.
> We could make it wait for the completion of the actual block deletion. Or the
> test timeout can be made a lot longer, but past JIRAs against this test case
> indicate people want this to run faster.