[ 
https://issues.apache.org/jira/browse/HDFS-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-4979:
----------------------------------

    Attachment: HDFS-4979.2.patch

When an operation completes successfully, the retry cache is populated with the 
request information and any payload needed to generate the response for a 
retried request. We need to handle the following use cases:
# A client makes a request. The operation completes on the namenode, but the 
client does not get the response and retries the request.
#* In this case, since the operation is complete on the namenode, it is 
recorded in the retry cache. That means the retry cache can be checked and 
the retry can be handled outside the namesystem lock.
# A client makes a request. The operation is still in progress on the namenode. 
The client gets disconnected for some reason and retries the request.
#* In this case, since the operation is still in progress on the namenode, it 
is not yet recorded in the retry cache. That means the retry cache *must* be 
checked, and the retry handled, only inside the namesystem lock.

Given the second case, I plan to do the retry checks inside the lock.
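
To illustrate why checking inside the lock covers both cases, here is a minimal sketch. It is not the code in the patch: the class name, the `invoke` helper, the string key format and the plain `ReentrantLock` standing in for the namesystem lock are all assumptions made for the example. Because a retry must take the same lock as the original call, it cannot look at the cache until the in-progress attempt has finished and recorded its result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class RetryCacheUnderLock {
  // Stand-in for the namesystem lock (hypothetical; a plain mutex here).
  private final ReentrantLock namesystemLock = new ReentrantLock();
  // Hypothetical cache: "clientId:callId" -> payload of the completed call.
  private final Map<String, Object> retryCache = new HashMap<>();

  @SuppressWarnings("unchecked")
  public <T> T invoke(String clientId, int callId, Supplier<T> operation) {
    String key = clientId + ":" + callId;
    namesystemLock.lock();
    try {
      // A retry blocks on the lock until the first attempt is done, so by the
      // time it reaches this check the entry is already in the cache.
      if (retryCache.containsKey(key)) {
        return (T) retryCache.get(key);
      }
      T result = operation.get();
      // Record the result before releasing the lock.
      retryCache.put(key, result);
      return result;
    } finally {
      namesystemLock.unlock();
    }
  }
}
```

With this shape, a second `invoke` carrying the same client ID and call ID returns the recorded result without running the operation again.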

The following conditions need to be handled by the retry cache for various 
operations.
*File creation*
# Retry cache has <RPC request client ID + call ID, and the inode ID that was 
created in the previous attempt>
# Between two retries the following can happen, and is handled as described:
#* File created in attempt 1 was modified (new permission etc.). I plan to just 
return the current, changed HdfsFileStatus in create.
#* File created in attempt 1 was deleted. The retry will create a new file.
#* File created in attempt 1 was deleted and a new file has been created. The 
retry cache entry is not used and a new attempt to create a file is made, which 
fails as expected.
#* The current patch does not handle the case where, between retries, the file 
got closed due to a lease timeout or an explicit recover lease call. In such a 
case, the subsequent request for an additional block fails.
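
The first three create cases can be sketched as follows. This is an illustration under stated assumptions, not the patch: `FileStatus` is a hypothetical stand-in for HdfsFileStatus, `IllegalStateException` stands in for whatever "file already exists" error the namenode raises, and `synchronized` stands in for the namesystem lock.

```java
import java.util.HashMap;
import java.util.Map;

public class CreateRetrySketch {
  /** Hypothetical stand-in for HdfsFileStatus. */
  public static final class FileStatus {
    public final long inodeId;
    public final String path;
    public final short permission;
    public FileStatus(long inodeId, String path, short permission) {
      this.inodeId = inodeId;
      this.path = path;
      this.permission = permission;
    }
  }

  private final Map<String, FileStatus> filesByPath = new HashMap<>();
  // Hypothetical cache: "clientId:callId" -> inode ID created by that call.
  private final Map<String, Long> retryCache = new HashMap<>();
  private long nextInodeId = 1000;

  public synchronized FileStatus create(String clientId, int callId, String path) {
    String key = clientId + ":" + callId;
    Long cachedInodeId = retryCache.get(key);
    FileStatus existing = filesByPath.get(path);
    if (cachedInodeId != null && existing != null
        && existing.inodeId == cachedInodeId) {
      // Same file as in the previous attempt: return its current status,
      // including any change made between the retries.
      return existing;
    }
    if (existing != null) {
      // A different file now occupies the path; the cache entry is not used
      // and the fresh create attempt fails.
      throw new IllegalStateException("File already exists: " + path);
    }
    // No file at the path (first attempt, or the file was deleted): create it.
    FileStatus created = new FileStatus(nextInodeId++, path, (short) 0644);
    filesByPath.put(path, created);
    retryCache.put(key, created.inodeId);
    return created;
  }

  public synchronized void setPermission(String path, short permission) {
    FileStatus old = filesByPath.get(path);
    filesByPath.put(path, new FileStatus(old.inodeId, path, permission));
  }

  public synchronized void delete(String path) {
    filesByPath.remove(path);
  }
}
```

Storing only the inode ID keeps the cache entry small; the response is rebuilt from the current namespace state on each retry.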

*File Append*
# Retry cache has <RPC request client ID + call ID, and the block ID and 
generation stamp of the previously returned LocatedBlock>
# Between two retries the following can happen, and is handled as described:
#* If the previous append attempt returned null, null is returned again, 
irrespective of whether the file being appended to was deleted between retries. 
If the file was deleted between retries, the next attempt to get an additional 
block fails.
#** If the last block of the file is complete, null is returned.
#* If the file appended to in the previous attempt was deleted, the block no 
longer exists. The retry cache is not used and a new attempt to append to the 
file is made.
#* The current patch does not handle the case where, between retries, the file 
got closed due to a lease timeout or an explicit recover lease call. In such a 
case, appending to the block fails.
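
The append cases can be sketched in the same way. Again, this is only an illustration with hypothetical names: `Block` stands in for the identity of the returned LocatedBlock, a null value in `lastBlockByPath` models a file whose last block is complete, and `IllegalStateException` stands in for the real "file not found" error.

```java
import java.util.HashMap;
import java.util.Map;

public class AppendRetrySketch {
  /** Hypothetical stand-in for the identity of a LocatedBlock. */
  public static final class Block {
    public final long blockId;
    public final long genStamp;
    public Block(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // path -> last block of the file; null means the last block is complete.
  private final Map<String, Block> lastBlockByPath = new HashMap<>();
  // "clientId:callId" -> block returned earlier; null if null was returned.
  private final Map<String, Block> retryCache = new HashMap<>();
  public int appendsExecuted = 0;

  public synchronized Block append(String clientId, int callId, String path) {
    String key = clientId + ":" + callId;
    if (retryCache.containsKey(key)) {
      Block cached = retryCache.get(key);
      if (cached == null) {
        // Previous attempt returned null: return null again, whether or not
        // the file still exists.
        return null;
      }
      Block current = lastBlockByPath.get(path);
      if (current != null && current.blockId == cached.blockId
          && current.genStamp == cached.genStamp) {
        // Same block as before: regenerate the response from current state.
        return current;
      }
      // The block no longer matches: ignore the entry and append afresh.
    }
    if (!lastBlockByPath.containsKey(path)) {
      throw new IllegalStateException("File does not exist: " + path);
    }
    appendsExecuted++;
    Block last = lastBlockByPath.get(path);
    retryCache.put(key, last);
    return last;
  }

  public synchronized void addFile(String path, Block lastBlock) {
    lastBlockByPath.put(path, lastBlock);
  }

  public synchronized void delete(String path) {
    lastBlockByPath.remove(path);
  }
}
```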

The above two cases are different because the return type is a complex 
object. Hence, instead of storing the object in the retry cache, the response 
is regenerated during retry attempts. In the following cases the return type is 
simple, so handling is simple: just void, String or boolean is returned.

Alternatively, at the expense of more memory, we can track the returned 
response for create and append and simplify the code further. Any thoughts?

*Concat, createSymlink, renameTo (both the variants), delete, createSnapshot, 
deleteSnapshot etc*
# Retry cache has <RPC request client ID + call ID>
# If a retry cache entry is found, the call immediately returns with void, 
boolean or String.

*Still pending*
# updatePipeline()
# rollEditLog()
# endCheckpoint()
# commitBlockSynchronization()

                
> Implement retry cache on the namenode
> -------------------------------------
>
>                 Key: HDFS-4979
>                 URL: https://issues.apache.org/jira/browse/HDFS-4979
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: HDFS-4979.1.patch, HDFS-4979.2.patch, HDFS-4979.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
