[jira] [Created] (HDFS-4721) Speed up lease/block recovery when DN fails and a block goes into recovery

Varun Sharma (JIRA) Sun, 21 Apr 2013 10:19:17 -0700

Varun Sharma created HDFS-4721:
----------------------------------

             Summary: Speed up lease/block recovery when DN fails and a block goes into recovery
                 Key: HDFS-4721
                 URL: https://issues.apache.org/jira/browse/HDFS-4721
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
    Affects Versions: 2.0.3-alpha
            Reporter: Varun Sharma



This was observed while doing HBase WAL recovery. HBase uses append to write to 
its write-ahead log, so initially the pipeline is set up as

DN1 --> DN2 --> DN3

This WAL needs to be read when DN1 fails, since DN1 is colocated with the HBase 
regionserver that owns the WAL.

HBase first recovers the lease on the WAL file. During recovery, we choose DN1 
as the primary DN to do the recovery even though DN1 has failed and is no 
longer heartbeating.

Avoiding the stale DN1 would speed up recovery and reduce HBase MTTR. There are 
two options:
a) Ride on HDFS-3703: if stale node detection is turned on, we do not choose 
stale datanodes (typically those that have not sent a heartbeat for 20-30 
seconds) as primary DN(s).
b) Sort the replicas by last heartbeat and always pick the ones that sent the 
most recent heartbeat.
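The two options can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not NameNode code: `Replica`, `STALE_INTERVAL_MS` (set to 30 seconds as an assumption taken from the 20-30 second range above), `chooseSkippingStale` and `sortByMostRecentHeartbeat` are all invented for this example, as is the fallback to the first replica when every replica is stale.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PrimaryDnChooser {
    // Hypothetical stand-in for a replica location; not an HDFS class.
    record Replica(String datanode, long lastHeartbeatMillis) {}

    // Assumed stale threshold, picked from the 20-30 second range in the issue.
    static final long STALE_INTERVAL_MS = 30_000L;

    static boolean isStale(Replica r, long nowMillis) {
        return nowMillis - r.lastHeartbeatMillis() > STALE_INTERVAL_MS;
    }

    // Option a): keep pipeline order but skip stale replicas.
    // Assumption: if every replica is stale, fall back to the first one.
    // Assumes a non-empty replica list.
    static Replica chooseSkippingStale(List<Replica> replicas, long nowMillis) {
        for (Replica r : replicas) {
            if (!isStale(r, nowMillis)) {
                return r;
            }
        }
        return replicas.get(0);
    }

    // Option b): order replicas by most recent heartbeat first;
    // the head of the returned list would act as the primary DN.
    static List<Replica> sortByMostRecentHeartbeat(List<Replica> replicas) {
        List<Replica> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingLong(Replica::lastHeartbeatMillis).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<Replica> pipeline = List.of(
            new Replica("DN1", now - 60_000L), // failed: no heartbeat for 60 s
            new Replica("DN2", now - 3_000L),
            new Replica("DN3", now - 1_000L));
        System.out.println(chooseSkippingStale(pipeline, now).datanode());          // DN2
        System.out.println(sortByMostRecentHeartbeat(pipeline).get(0).datanode()); // DN3
    }
}
```

With the DN1 --> DN2 --> DN3 pipeline above, both variants avoid the dead DN1. Option a) only changes behaviour when staleness detection flags a node, while option b) always prefers the freshest heartbeat, even among nodes that are all healthy.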

Going to the dead datanode increases lease and block recovery time, since the 
block goes into the UNDER_RECOVERY state even though no one is actively 
recovering it. Please let me know if this makes sense. If so, should we move 
forward with a) or b)?

Thanks

