[ 
https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274330#comment-14274330
 ] 

Ming Ma commented on HDFS-6681:
-------------------------------

Thanks, Ratandeep! I agree with your detailed analysis. Regarding your description 
"One scenario in which this loop will never break is when the Namenode tries to 
schedule a new replica on the same node on which we actually corrupted the 
block.": {{BlockManager}}'s {{markBlockAsCorrupt}} function will ask the DN to 
invalidate the corrupt replica after the NN receives a block report from that DN. So 
if the replication is scheduled after that, the loop should break. We could fix the 
test code to make sure the events happen in the correct order, but that is more 
complicated.
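
For illustration only, enforcing that ordering in the test could look roughly like the 
sketch below. The {{countReplicas}} helper, {{namesystem}} and {{blk}} are assumed to 
match the existing test code, so treat this as a sketch rather than the actual fix:

{code:java}
// Sketch only (assumed helper/variable names from the existing test).
// 1. Wait until the NN has learned about the corrupt replica from the DN's
//    block report.
while (countReplicas(namesystem, blk).corruptReplicas() == 0) {
  Thread.sleep(100);
}
// 2. Wait until the corrupt replica has been invalidated on the DN, so a
//    later replication cannot be "satisfied" by the stale copy.
while (countReplicas(namesystem, blk).corruptReplicas() > 0) {
  Thread.sleep(100);
}
// 3. Only now expect re-replication to bring the live count back to 2.
while (countReplicas(namesystem, blk).liveReplicas() != 2) {
  Thread.sleep(100);
}
{code}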

For the first loop, does it just reduce the chance? If the test runs on a slow 
machine, maybe 3 is still not enough. We could delay the start of the 3rd DN by 
moving {{cluster.startDataNodes(conf, 1, true, null, null, null);}} after the 
loop check, but that doesn't prevent the replication from finishing quickly on 
the same DN that holds the old corrupted replica.
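
A minimal sketch of that reordering (it only addresses the timing, not the same-DN 
issue noted above); the loop shape and names are assumed from the existing test:

{code:java}
// Sketch only: keep just 2 DNs running while loop 1 checks for the single
// live replica, then add the 3rd DN so re-replication has somewhere to go.
while (countReplicas(namesystem, blk).liveReplicas() != 1) {
  Thread.sleep(100);
}
// Delay the 3rd DN until after the first check has passed.
cluster.startDataNodes(conf, 1, true, null, null, null);
while (countReplicas(namesystem, blk).liveReplicas() != 2) {
  Thread.sleep(100);
}
{code}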

> TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is 
> flaky and sometimes gets stuck in infinite loops
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6681
>                 URL: https://issues.apache.org/jira/browse/HDFS-6681
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>         Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
> Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>         Attachments: HDFS-6681.patch
>
>
> This testcase has 3 infinite loops which break only when certain conditions 
> are satisfied.
> 1st loop checks that there is a single live replica. It assumes this will 
> become true since it has just corrupted a block on one of the datanodes (the 
> testcase uses a replication factor of 2). One scenario in which this loop 
> will never break is if the Namenode invalidates the corrupt replica, 
> schedules a replication command, and the newly copied replica is added, all 
> before the testcase has a chance to check the live-replica count.
> 2nd loop checks that there are 2 live replicas. It assumes this will become 
> true (in some time) since the first loop has broken, implying there is a 
> single replica, and it is now only a matter of time before the Namenode 
> schedules a replication command to copy a replica to another datanode. One 
> scenario in which this loop will never break is when the Namenode tries to 
> schedule a new replica on the same node on which we actually corrupted the 
> block. That destination datanode will not copy the block, complaining that 
> it already has the (corrupted) replica in the create state. The resulting 
> situation is that the Namenode has scheduled a copy to a datanode and the 
> block is now in the Namenode's pending replication queue, but the block will 
> never be removed from that queue because the Namenode will never receive a 
> report from the datanodes that the block has been added.
> Note: The block can be transferred from the 'pending replication' to the 
> 'needed replication' queue once the pending timeout (5 minutes) expires. The 
> Namenode then actively tries to schedule replication for blocks in the 
> 'needed replication' queue. This can cause the 2nd loop to break, but it 
> takes more than 5 minutes for this process to kick in.
> 3rd loop: This loop checks that there are no corrupt replicas. I don't see a 
> scenario in which this loop can go on forever, since once the live-replica 
> count goes back to normal (2), the corrupted block will be removed.
> I guess increasing the heartbeat interval, so that the testcase has enough 
> time to check the condition in loop 1 before a datanode reports a successful 
> copy, should help avoid the race condition in loop 1. Regarding loop 2, I 
> guess we can reduce the timeout after which the block is transferred from 
> the pending replication to the needed replication queue.
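
For reference, the configuration changes suggested in the description could be 
sketched in the test setup roughly as follows; the keys exist in {{DFSConfigKeys}}, 
but the values shown are illustrative placeholders, not taken from the attached patch:

{code:java}
Configuration conf = new HdfsConfiguration();
// Larger heartbeat interval (seconds): gives loop 1 time to observe the
// single live replica before any DN can report a finished re-replication.
// (Value is illustrative only.)
conf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 300);
// Smaller pending-replication timeout (seconds): a copy stuck in the pending
// queue is moved back to needed replication well before the default 5
// minutes, so loop 2 can still terminate. (Value is illustrative only.)
conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_PENDING_TIMEOUT_SEC_KEY, 5);
{code}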



