[ 
https://issues.apache.org/jira/browse/HDFS-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709983#comment-13709983
 ] 

Kihwal Lee commented on HDFS-4998:
----------------------------------

In the log from the build, the deletion happened 3 seconds after setReplication 
was done. The log shows that triggerHeartbeat() didn't work because it raced 
with a block report and lost. As a result, lastHeartbeat was reset to the block 
report time and the heartbeat wasn't sent right away. This explains the 
3-second delay.
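
To illustrate the race described above, here is a minimal Java sketch. It is a
simplified, single-threaded model, not the actual DataNode code: the class
name, every method name other than triggerHeartbeat(), and the 3000 ms
interval are assumptions made for illustration. It only shows how a reset of
lastHeartbeat after the trigger pushes the next heartbeat out by a full
interval.

```java
// Simplified model of the heartbeat timing (hypothetical; not the real
// DataNode implementation).
public class HeartbeatRaceSketch {
    // Assumed heartbeat interval for this sketch.
    static final long HEARTBEAT_INTERVAL_MS = 3000;

    // Time (ms) at which the last heartbeat is considered to have been sent.
    long lastHeartbeat;

    // Test hook: make the next heartbeat due immediately.
    void triggerHeartbeat() {
        lastHeartbeat = 0;
    }

    // Models the reset described in the comment: lastHeartbeat is set to
    // the block report time.
    void onBlockReport(long now) {
        lastHeartbeat = now;
    }

    // How long the sender would still wait before the next heartbeat.
    long delayUntilNextHeartbeat(long now) {
        return Math.max(0, lastHeartbeat + HEARTBEAT_INTERVAL_MS - now);
    }

    public static void main(String[] args) {
        long now = 10_000;
        HeartbeatRaceSketch actor = new HeartbeatRaceSketch();
        actor.lastHeartbeat = 9_000;

        // The trigger alone makes the heartbeat due right away.
        actor.triggerHeartbeat();
        System.out.println("after trigger: "
            + actor.delayUntilNextHeartbeat(now) + " ms");

        // A block report that wins the race resets the timer.
        actor.onBlockReport(now);
        System.out.println("after block report: "
            + actor.delayUntilNextHeartbeat(now) + " ms");
    }
}
```

In this model the trigger alone leaves a delay of 0 ms, while the block report
that follows it restores the full 3000 ms delay.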

It seems that increasing the existing sleep to 3 seconds + slack will prevent 
this race.
                
> TestUnderReplicatedBlocks fails intermittently
> ----------------------------------------------
>
>                 Key: HDFS-4998
>                 URL: https://issues.apache.org/jira/browse/HDFS-4998
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.1.0-beta
>            Reporter: Kihwal Lee
>
> Looking at branch-2.1-beta jenkins build, this test case seems flaky.
> First, addToInvalidates() is called against a block on a datanode. This 
> removes the dn from the BlockInfo in blocksMap. 
> At this point, raising the replication factor can cause the same node to be 
> picked. If the node has already deleted the block, the replication will 
> succeed. If not, the replication fails. When it fails, it will take at least 
> the pending replication timeout, which is 5 minutes, to reschedule. But the 
> test will time out before this and fail. 
> We could make it wait for the completion of actual block deletion. Or the 
> test timeout can be made a lot longer, but past jiras against this test case 
> indicate people want this to run faster.
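
One way to wait for the completion of actual block deletion, as suggested
above, is to poll for a condition with a deadline instead of sleeping for a
fixed time. The following is a minimal Java sketch under assumptions:
WaitForSketch and waitFor() are hypothetical names, and the condition that
checks whether the datanode has deleted the block is left to the caller, since
it depends on test internals not shown here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; not part of the existing test.
public class WaitForSketch {
    /**
     * Polls the condition every pollMs milliseconds until it holds or
     * timeoutMs milliseconds have elapsed.
     *
     * @return true if the condition held before the timeout, false if the
     *         timeout expired or the thread was interrupted
     */
    static boolean waitFor(BooleanSupplier condition, long pollMs,
                           long timeoutMs) {
        final long start = System.nanoTime();
        final long timeoutNanos = timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test could call waitFor() with a condition that reports whether the block is
gone from the datanode, and fail if it returns false. This returns as soon as
the deletion is observed, so the test does not pay for a worst-case fixed
sleep on every run.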

