[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

Varun Sharma (JIRA) Wed, 24 Apr 2013 15:45:17 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641101#comment-13641101
 ]


Varun Sharma commented on HBASE-8389:
-------------------------------------

This is Hadoop 2.0.0 alpha CDH 4.2 - namenode logs - this all there is for this 
block
LOG LINE FOR BLOCK CREATION (NAMENODE)

2013-04-24 05:40:30,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
allocateBlock: 
/hbase/.logs/ip-10-170-15-97.ec2.internal,60020,1366780717760/ip-10-170-15-97.ec2.internal%2C60020%2C1366780717760.1366782030238.
 BP-889095791-10.171.1.40-1366491606582 
blk_-2482251885029951704_11942{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.170.15.97:50010|RBW], 
ReplicaUnderConstruction[10.168.12.138:50010|RBW], 
ReplicaUnderConstruction[10.170.6.131:50010|RBW]]}

LOG LINES FOR RECOVERY INITIATION (NAMENODE)

2013-04-24 06:14:43,623 INFO BlockStateChange: BLOCK* 
blk_-2482251885029951704_11942{blockUCState=UNDER_RECOVERY, primaryNodeIndex=0, 
replicas=[ReplicaUnderConstruction[10.170.15.97:50010|RBW], 
ReplicaUnderConstruction[10.168.12.138:50010|RBW], 
ReplicaUnderConstruction[10.170.6.131:50010|RBW]]} recovery started, 
primary=10.170.15.97:50010
2013-04-24 06:14:43,623 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
NameSystem.internalReleaseLease: File 
/hbase/.logs/ip-10-170-15-97.ec2.internal,60020,1366780717760-splitting/ip-10-170-15-97.ec2.internal%2C60020%2C1366780717760.1366782030238
 has not been closed. Lease recovery is in progress. RecoveryId = 12012 for 
block blk_-2482251885029951704_11942{blockUCState=UNDER_RECOVERY, 
primaryNodeIndex=0, replicas=[ReplicaUnderConstruction[10.170.15.97:50010|RBW], 
ReplicaUnderConstruction[10.168.12.138:50010|RBW], 
ReplicaUnderConstruction[10.170.6.131:50010|RBW]]}

Note that the primary index is 0 - which is the datanode i killed. This was 
chosen as the primary DN for lease recovery. Obviously it will not work isnce 
the node is dead. But recoverLease returned true neverthless for the next call. 
Now I am not sure if that is expected behaviour since the real block recovery 
never happened.

                
> HBASE-8354 forces Namenode into loop with lease recovery requests
> -----------------------------------------------------------------
>
>                 Key: HBASE-8389
>                 URL: https://issues.apache.org/jira/browse/HBASE-8389
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Varun Sharma
>            Assignee: Varun Sharma
>            Priority: Critical
>             Fix For: 0.94.8
>
>         Attachments: 8389-0.94.txt, 8389-0.94-v2.txt, 8389-0.94-v3.txt, 
> 8389-0.94-v4.txt, 8389-0.94-v5.txt, 8389-0.94-v6.txt, 8389-trunk-v1.txt, 
> 8389-trunk-v2.patch, 8389-trunk-v2.txt, 8389-trunk-v3.txt, nn1.log, nn.log, 
> sample.patch
>
>
> We ran hbase 0.94.3 patched with 8354 and observed too many outstanding lease 
> recoveries because of the short retry interval of 1 second between lease 
> recoveries.
> The namenode gets into the following loop:
> 1) Receives lease recovery request and initiates recovery choosing a primary 
> datanode every second
> 2) A lease recovery is successful and the namenode tries to commit the block 
> under recovery as finalized - this takes < 10 seconds in our environment 
> since we run with tight HDFS socket timeouts.
> 3) At step 2), there is a more recent recovery enqueued because of the 
> aggressive retries. This causes the committed block to get preempted and we 
> enter a vicious cycle
> So we do,  <initiate_recovery> --> <commit_block> --> 
> <commit_preempted_by_another_recovery>
> This loop is paused after 300 seconds which is the 
> "hbase.lease.recovery.timeout". Hence the MTTR we are observing is 5 minutes 
> which is terrible. Our ZK session timeout is 30 seconds and HDFS stale node 
> detection timeout is 20 seconds.
> Note that before the patch, we do not call recoverLease so aggressively - 
> also it seems that the HDFS namenode is pretty dumb in that it keeps 
> initiating new recoveries for every call. Before the patch, we call 
> recoverLease, assume that the block was recovered, try to get the file, it 
> has zero length since its under recovery, we fail the task and retry until we 
> get a non zero length. So things just work.
> Fixes:
> 1) Expecting recovery to occur within 1 second is too aggressive. We need to 
> have a more generous timeout. The timeout needs to be configurable since 
> typically, the recovery takes as much time as the DFS timeouts. The primary 
> datanode doing the recovery tries to reconcile the blocks and hits the 
> timeouts when it tries to contact the dead node. So the recovery is as fast 
> as the HDFS timeouts.
> 2) We have another issue I report in HDFS 4721. The Namenode chooses the 
> stale datanode to perform the recovery (since its still alive). Hence the 
> first recovery request is bound to fail. So if we want a tight MTTR, we 
> either need something like HDFS 4721 or we need something like this
>   recoverLease(...)
>   sleep(1000)
>   recoverLease(...)
>   sleep(configuredTimeout)
>   recoverLease(...)
>   sleep(configuredTimeout)
> Where configuredTimeout should be large enough to let the recovery happen but 
> the first timeout is short so that we get past the moot recovery in step #1.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-8389) HBASE-8354 forces Namenode into loop with lease recovery requests

Reply via email to