[ https://issues.apache.org/jira/browse/HDFS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219672#comment-13219672 ]
Kihwal Lee commented on HDFS-3032:
----------------------------------
> The current proposal is to abort the client when RemoteException is caught
> during renewal.
DFSClient.abort() calls abort() on all output streams and sets clientRunning
to false. The LeaseRenewer thread then stops trying to renew leases for that
client, and the thread returns once the client list becomes empty.
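The abort path described above can be pictured with a small stand-alone
sketch. This is not the actual DFSClient or LeaseRenewer code: the class
LeaseRenewerSketch, its nested Client, and the methods open(), renewOnce()
and clientCount() are hypothetical names chosen for illustration. Only the
behavior (abort clears the streams and the clientRunning flag, and the
renewer drops stopped clients and exits when none remain) follows the comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; not the real org.apache.hadoop.hdfs classes. */
public class LeaseRenewerSketch {

    /** Hypothetical stand-in for a DFSClient with open output streams. */
    static class Client {
        private volatile boolean clientRunning = true;
        private final List<String> openStreams = new ArrayList<>();

        void open(String path) { openStreams.add(path); }

        /** Abort every output stream and mark the client as stopped. */
        void abort() {
            clientRunning = false;
            openStreams.clear();
        }

        boolean isRunning() { return clientRunning; }
        int openStreamCount() { return openStreams.size(); }
    }

    private final List<Client> clients = new ArrayList<>();

    void addClient(Client c) { clients.add(c); }
    int clientCount() { return clients.size(); }

    /**
     * One pass of the renewal loop: drop clients that are no longer running.
     * Returns false when the client list is empty, meaning the renewer
     * thread should return.
     */
    boolean renewOnce() {
        clients.removeIf(c -> !c.isRunning());
        return !clients.isEmpty();
    }
}
```

In this model, aborting the last remaining client makes renewOnce() return
false, which corresponds to the renewer thread returning.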
One fatal case we observed involved an expired token. The token had expired
and couldn't be renewed due to a different issue. As a result, every further
RPC call got InvalidToken back from the name node, and LeaseRenewer kept
retrying the renewal indefinitely.
Among the exceptions wrapped in RemoteException, SafeModeException might be
okay for retry. All other exceptions from the name node seem fatal for lease
renewal.
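A minimal sketch of that classification follows, assuming a simplified
stand-in for RemoteException. The names RenewalFailurePolicy, RemoteFailure,
Action and decide() are hypothetical and not part of Hadoop. The stand-in
exposes the wrapped exception's class name through getClassName(), modeled
loosely on the real RemoteException. Continuing to retry on a plain
IOException is also an assumption, taken from the retry behavior in the
issue description.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Illustrative policy only; not the actual LeaseRenewer implementation. */
public class RenewalFailurePolicy {

    /** Hypothetical stand-in for an exception relayed from the name node. */
    static class RemoteFailure extends IOException {
        private final String className;

        RemoteFailure(String className, String message) {
            super(message);
            this.className = className;
        }

        /** Fully qualified name of the exception thrown on the server side. */
        String getClassName() { return className; }
    }

    enum Action { RETRY, ABORT }

    /** Decide what the renewer should do after a failed renewal attempt. */
    static Action decide(IOException e) {
        // The issue states the renewer already aborts on socket timeouts.
        if (e instanceof SocketTimeoutException) {
            return Action.ABORT;
        }
        if (e instanceof RemoteFailure) {
            String wrapped = ((RemoteFailure) e).getClassName();
            // Safe mode is presumably temporary, so retrying may be acceptable;
            // any other server-side exception is treated as fatal.
            return wrapped.endsWith("SafeModeException") ? Action.RETRY : Action.ABORT;
        }
        // Assumption: other local IOExceptions keep the sleep-and-retry path.
        return Action.RETRY;
    }
}
```

Under this policy, an InvalidToken relayed from the name node maps to ABORT,
so the expired-token case above would end after the first failed renewal
instead of looping.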
> Lease renewer tries forever even if renewal is not possible
> -----------------------------------------------------------
>
> Key: HDFS-3032
> URL: https://issues.apache.org/jira/browse/HDFS-3032
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client
> Affects Versions: 0.23.0, 0.24.0, 0.23.1
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Fix For: 0.24.0, 0.23.2, 0.23.3
>
>
> When LeaseRenewer gets an IOException while attempting to renew for a client,
> it retries after sleeping 500ms. If the exception is caused by a condition
> that will never change, it keeps talking to the name node until the DFSClient
> object is closed or aborted. With the FileSystem cache, a DFSClient can stay
> alive for a very long time. We've seen cases in which node managers and
> long-lived jobs flooded the name node with this type of call.
> The current proposal is to abort the client when RemoteException is caught
> during renewal. LeaseRenewer already aborts all clients when it sees a
> SocketTimeoutException.