[ https://issues.apache.org/jira/browse/HDFS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102332#comment-13102332 ]

Ted Dunning commented on HDFS-2296:
-----------------------------------

How can you pause all readers?  The cluster doesn't know who the readers are, 
so the only approach is to pause reading at the datanodes.

If you want to make all datanode read operations wait until recovery completes, 
then you almost have to use some kind of two-phase operation at the beginning 
of lease recovery.

Thus, 

- the new writer initiates recovery

- recovery-propose has to be sent to all containers of the file

- all readers pause on receipt of recovery-propose, but the lease is not yet 
changed

- recovery-commit has to be sent to all containers of the file

- the new writer is notified that recovery has succeeded

- readers continue under the new lease structure
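The steps above could be sketched roughly as follows. This is a toy simulation, not HDFS code: `DatanodeStub`, `proposeRecovery`, and `commitRecovery` are hypothetical names chosen to mirror the recovery-propose/recovery-commit messages, and the "containers of the file" are modeled as a flat list of stubs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a datanode holding a replica of the file.
 *  Not an HDFS API; illustrates the pause/resume behavior only. */
class DatanodeStub {
    boolean readsPaused = false;

    /** recovery-propose: pause reads, ack the proposal. */
    boolean proposeRecovery() {
        readsPaused = true;
        return true;
    }

    /** recovery-commit: resume reads under the new lease. */
    void commitRecovery(long newLeaseId) {
        readsPaused = false;
    }
}

class LeaseRecoveryCoordinator {
    /** Two-phase recovery: propose to every replica holder; commit only
     *  if all of them ack. On a failed propose, unpause the datanodes
     *  that already acked (the abort path that makes failures painful). */
    static boolean recover(List<DatanodeStub> replicas, long newLeaseId) {
        List<DatanodeStub> acked = new ArrayList<>();
        for (DatanodeStub dn : replicas) {
            if (!dn.proposeRecovery()) {
                for (DatanodeStub p : acked) p.commitRecovery(0);
                return false;
            }
            acked.add(dn);
        }
        // All replicas paused: safe to install the new lease and commit.
        for (DatanodeStub dn : replicas) dn.commitRecovery(newLeaseId);
        return true; // the new writer can now be notified
    }
}
```

Note that between propose and commit every replica holds reads, which is exactly the window where a coordinator failure leaves datanodes stuck paused.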

The major problem here is that failures will make this hideously complex as 
with all two-phase designs.  Unfortunately, there is little alternative given 
the basic stateful design of lease-oriented single-writer architecture.

A secondary problem is that committing the lease recovery across all replicas of 
all blocks of a file could involve a large number of datanodes, and checking for 
the recovery imposes complexity on the fundamental read path.

Some of the failure conditions can be "handled" by timeouts on the 
recovery-propose, but timeouts are inherently very, very dangerous.
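The danger with a timeout on recovery-propose can be made concrete with a small sketch (again hypothetical, not HDFS code): a timed-out propose RPC has an *unknown* outcome, so the remote datanode may have paused its reads even though the coordinator treats the propose as failed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ProposeWithTimeout {
    /** Bound a recovery-propose RPC (modeled as a Callable) by a deadline. */
    static boolean propose(Callable<Boolean> proposeRpc, long timeoutMs) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> f = ex.submit(proposeRpc);
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Unknown outcome: the datanode may or may not have paused reads.
            // Treating "timed out" as "not paused" is exactly the hazard:
            // a paused replica may never see a commit or an abort.
            return false;
        } catch (Exception e) {
            return false;
        } finally {
            ex.shutdownNow();
        }
    }
}
```

The catch block is where the two-phase guarantees quietly leak away, which is why timeouts only "handle" these failures in scare quotes.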

> If read error while lease is being recovered, client reverts to stale view on 
> block info
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-2296
>                 URL: https://issues.apache.org/jira/browse/HDFS-2296
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.20-append, 0.22.0, 0.23.0
>            Reporter: stack
>            Priority: Critical
>
> We are seeing the following issue around recoverLease over in hbaselandia.  
> DFSClient calls recoverLease to assume ownership of a file.  The recoverLease 
> returns to the client but it can take time for the new state to propagate.  
> Meantime, an incoming read fails though it's using updated block info.  
> Thereafter all read retries fail because on exception we revert to the stale 
> block view and we never recover.  Laxman reports this issue in the below 
> mailing thread:
> See this thread for first report of this issue: 
> http://search-hadoop.com/m/S1mOHFRmgk2/%2527FW%253A+Handling+read+failures+during+recovery%2527&subj=FW+Handling+read+failures+during+recovery
> Chatting w/ Hairong offline, she suggests this is a general issue around lease 
> recovery no matter how it is triggered (new recoverLease or not).
> I marked this critical.  At least over in hbase it is, since we get stuck 
> here recovering a crashed server.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
