[ 
https://issues.apache.org/jira/browse/HDFS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222648#comment-13222648
 ] 

Kihwal Lee commented on HDFS-3032:
----------------------------------

bq. Hi Kihwal, I think we may simply change LeaseRenewer to retry up to a time 
limit as below. I made the limit to 2*HdfsConstants.LEASE_SOFTLIMIT_PERIOD 
since HdfsConstants.LEASE_SOFTLIMIT_PERIOD is only one minute. What do you 
think?

I walked down that path too, but soon realized that it would abort all clients. 
So my alternative approach is to do it per individual client. Since renewal 
is attempted every LEASE_SOFTLIMIT_PERIOD/2, we can be sure that a client's 
leases have expired once LEASE_SOFTLIMIT_PERIOD passes without a successful 
renewal.
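
To illustrate the per-client idea described above, here is a minimal sketch. 
It is not the attached patch: the class PerClientLeaseSketch, the type 
ClientState, and its methods are hypothetical names, and the one-minute 
constant is an assumption taken from the quoted remark about the soft limit. 
Each client tracks its own last successful renewal, so a failure that 
outlasts the soft limit aborts only that client.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of per-client lease expiry; not the HDFS-3032 patch. */
class PerClientLeaseSketch {
    // Assumed value: the quoted comment says the soft limit is one minute.
    static final long LEASE_SOFTLIMIT_PERIOD_MS = TimeUnit.MINUTES.toMillis(1);

    /** Stand-in for the renewal state of a single client (hypothetical type). */
    static final class ClientState {
        private long lastRenewedMs;
        private boolean aborted;

        ClientState(long nowMs) {
            this.lastRenewedMs = nowMs;
        }

        /** Record a successful renewal for this client only. */
        void onRenewalSuccess(long nowMs) {
            lastRenewedMs = nowMs;
        }

        /**
         * Record a failed renewal. The client is marked aborted only when a
         * full soft-limit period has passed since its own last success.
         * Returns true if this client is now aborted.
         */
        boolean onRenewalFailure(long nowMs) {
            if (nowMs - lastRenewedMs >= LEASE_SOFTLIMIT_PERIOD_MS) {
                aborted = true;
            }
            return aborted;
        }

        boolean isAborted() {
            return aborted;
        }
    }
}
```

With this shape, a renewer that serves several clients would call 
onRenewalFailure on each one separately instead of aborting them all at once.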

I will upload my patch in a moment. 
                
> Lease renewer tries forever even if renewal is not possible
> -----------------------------------------------------------
>
>                 Key: HDFS-3032
>                 URL: https://issues.apache.org/jira/browse/HDFS-3032
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.23.0, 0.24.0, 0.23.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 0.23.2, 0.23.3
>
>         Attachments: hdfs-3032.patch.txt
>
>
> When LeaseRenewer gets an IOException while attempting to renew for a client, 
> it retries after sleeping 500ms. If the exception is caused by a condition 
> that will never change, it keeps talking to the name node until the DFSClient 
> object is closed or aborted.  With the FileSystem cache, a DFSClient can stay 
> alive for a very long time. We've seen cases in which node managers and 
> long-lived jobs flooded the name node with this type of call.
> The current proposal is to abort the client when a RemoteException is caught 
> during renewal. LeaseRenewer already aborts all clients when it sees a 
> SocketTimeoutException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
