RE: Handling read failures during recovery

Ramkrishna S Vasudevan Thu, 04 Aug 2011 21:22:38 -0700

Hi 

As Laxman pointed out, there is a potential problem here.  We expect the
Namenode recovery to complete within a fixed time, so we sleep for one
second in the splitLogs logic.  But we then carry on reading the HLog file
even if recovery has not finished, and the read fails.  If the logs are
not split properly, there could be data loss.

Regards
Ram

-----Original Message-----
From: Laxman [mailto:lakshman...@huawei.com] 
Sent: Tuesday, August 02, 2011 10:47 AM
To: hdfs-dev@hadoop.apache.org; d...@hbase.apache.org
Subject: FW: Handling read failures during recovery

Partial mail was sent accidentally. Sorry for that.
Resending with complete details, analysis and logs.

We are using the 20-append version.

To summarize, there are two problems [one each from HDFS and HBase] that we
noticed in this flow.

1) From HDFS
Even though the client gets the updated block info from the Namenode on the
first read failure, it discards the new info and keeps using the old info
to retrieve the data from the datanode. So all the read retries fail.
[Method parameter reassignment - not reflected in caller]

HDFS Code snippet
org.apache.hadoop.hdfs.DFSClient.DFSInputStream.chooseDataNode 

 private DNAddrPair chooseDataNode(LocatedBlock block) 
      throws IOException {
...
...
block = getBlockAt(block.getStartOffset(), false);
...
...
}

Here the method parameter "block" is assigned the new block info, but the
assignment is not reflected in the caller "blockSeekTo(long target)".
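
The effect can be reproduced outside HDFS. The following is a minimal sketch,
not the DFSClient code: the Block class and both method names are hypothetical
stand-ins. It shows that reassigning a parameter leaves the caller's reference
untouched, and that returning the refreshed value is one possible way to let
the caller pick it up.

```java
// Hypothetical stand-ins for illustration only; not the actual DFSClient types.
final class ParamReassignDemo {
    static final class Block {
        final String location;
        Block(String location) { this.location = location; }
    }

    // Mirrors the reported pattern: the parameter is reassigned inside the
    // method, but the caller's variable still refers to the old object.
    static void refreshLost(Block block) {
        block = new Block("new-location");
    }

    // One possible fix: return the refreshed value so the caller can adopt it.
    static Block refreshReturned(Block block) {
        return new Block("new-location");
    }

    public static void main(String[] args) {
        Block b = new Block("stale-location");
        refreshLost(b);
        System.out.println(b.location); // prints "stale-location"
        b = refreshReturned(b);
        System.out.println(b.location); // prints "new-location"
    }
}
```

Java passes object references by value, so the assignment in refreshLost only
changes the method's local copy of the reference.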

2) From HBase

Excerpt from my previous mail.

> As the recovery is an asynchronous operation recoverLease call will return
> immediately and may end up with read failure as the recovery is in
> progress.
> 
> This may lead to some regions to be in offline state only

> One approach is to introduce a delay in between recovery and read. But,
> this may not be a fool proof way to address this.

I've noticed the delay is already present in the HBase code. But as I
mentioned, this may not be a foolproof mechanism to handle this scenario.

HBase Code snippet
In the class HLogSplitter, splitLog() calls recoverFileLease().

In recoverFileLease():

      try { 
        Thread.sleep(1000); 
      } catch (InterruptedException ex) { 
        new InterruptedIOException().initCause(ex); 
      } 

Once the recover call is made, we sleep for one second and proceed with
parseHLog(), whether or not the recovery has completed.
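
As an alternative to the fixed sleep, the wait could poll until recovery is
actually reported complete, with an upper bound. The following is only a
sketch under stated assumptions: RecoveryWait, waitForRecovery and the
isRecovered probe are hypothetical names, and how the probe would be
implemented against the 20-append HDFS client is left open here. Note also
that the catch block quoted above constructs an InterruptedIOException
without throwing it; the sketch throws it.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of HBase or HDFS.
final class RecoveryWait {
    /**
     * Polls isRecovered until it returns true or timeoutMs has elapsed.
     * isRecovered is an assumed probe supplied by the caller.
     */
    static void waitForRecovery(BooleanSupplier isRecovered, long timeoutMs, long pollMs)
            throws IOException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!isRecovered.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IOException(
                    "Lease recovery did not complete within " + timeoutMs + " ms");
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException ex) {
                // Restore the interrupt flag and propagate the failure.
                Thread.currentThread().interrupt();
                InterruptedIOException iioe =
                    new InterruptedIOException("Interrupted while waiting for recovery");
                iioe.initCause(ex);
                throw iioe;
            }
        }
    }
}
```

With this shape, the caller would proceed to the read only after the probe
succeeds, and a timeout surfaces as an IOException instead of a failed read
on a block that is still under recovery.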

Here is the log
2011-07-21 17:01:19,642 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1311262402613_3094 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
2011-07-21 17:01:20,650 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1311262402613_3094 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
2011-07-21 17:01:21,669 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_1311262402613_3094 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
2011-07-21 17:01:22,677 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_1311262402613_3318 file=/hbase/.logs/158-1-101-222,20020,1311260346420/158-1-101-222%3A20020.1311265398432
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2491)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2256)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2441)
at java.io.DataInputStream.read(DataInputStream.java:132)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1984)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1884)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:198)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.parseHLog(HLogSplitter.java:429)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:262)
at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:201)

-----Original Message-----
From: Stack [mailto:saint....@gmail.com] 
Sent: Monday, August 01, 2011 9:03 PM
To: d...@hbase.apache.org
Cc: d...@hbase.apache.org
Subject: Re: Handling read failures during recovery

Which hdfs version and what is the error u see?  Thanks.

Stack

On Aug 1, 2011, at 4:33, Laxman <lakshman...@huawei.com> wrote:

> Hi Everyone,
> 
> 
> 
> In HBase we try to recover the HLog file and then immediately proceed with
> read operation.
> 
> As the recovery is an asynchronous operation recoverLease call will return
> immediately and may end up with read failure as the recovery is in
> progress.
> 
> This may lead to some regions to be in offline state only.
> 
> 
> 
> One approach is to introduce a delay in between recovery and read. But,
> this may not be a fool proof way to address this.
> 
> 
> 
> How do we handle this scenario? 
> 
> 
> 
> Please do correct me if my understanding went wrong.
> 
> --Laxman
>
