DFS client should try to renew its lease if it gets a lease expiration exception
when it adds a block to a file
------------------------------------------------------------------------------------------------------------
Key: HADOOP-563
URL: http://issues.apache.org/jira/browse/HADOOP-563
Project: Hadoop
Issue Type: Bug
Reporter: Runping Qi
In the current DFS client implementation, there is one thread responsible for
renewing leases. If for whatever reason that thread runs behind, the lease may
expire, and the client then gets a lease expiration exception when writing a
block. The consequence is devastating: the client can no longer write to the
file, and all the partial results up to that point are gone. This is especially
costly for map/reduce jobs where a reducer may take hours or even days to sort
the intermediate results before the actual reducing work can start.
The problem would be solved if the flush method of the DFS client could renew
the lease on demand. That is, it should try to renew the lease when it catches
a lease expiration exception. That way, even under heavy load when the
lease-renewing thread runs behind, the reducer task (or whatever task uses that
client) can proceed. That would be a huge saving in some cases (where sorting
the intermediate results takes a long time to finish). We can set a limit on
the number of retries, and may even make it configurable (or changeable at
runtime).
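
Below is a minimal sketch of the kind of retry loop this would mean in the
client's block-allocation path. The names used here (the Namenode interface,
LeaseExpiredException, maxLeaseRetries) are placeholders for illustration only,
not the actual classes or RPC signatures in the Hadoop code.

{code}
import java.io.IOException;

class LeaseRetrySketch {

  // Stand-in for the namenode RPC interface; not the real ClientProtocol.
  interface Namenode {
    String addBlock(String src, String client) throws IOException;
    void renewLease(String client) throws IOException;
  }

  // Placeholder for whatever exception the namenode throws on an expired lease.
  static class LeaseExpiredException extends IOException {}

  private final Namenode namenode;
  private final String clientName;
  private final int maxLeaseRetries;   // could be read from the job/site config

  LeaseRetrySketch(Namenode namenode, String clientName, int maxLeaseRetries) {
    this.namenode = namenode;
    this.clientName = clientName;
    this.maxLeaseRetries = maxLeaseRetries;
  }

  // Ask the namenode for the next block of the file. If the lease has expired,
  // renew it on demand and retry, up to maxLeaseRetries times, instead of
  // failing the whole write and losing the partial results.
  String addBlockWithLeaseRetry(String src) throws IOException {
    int attempts = 0;
    while (true) {
      try {
        return namenode.addBlock(src, clientName);
      } catch (LeaseExpiredException e) {
        if (++attempts > maxLeaseRetries) {
          throw e;                            // retries exhausted; give up
        }
        namenode.renewLease(clientName);      // renew the lease and try again
      }
    }
  }
}
{code}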
The namenode can use a different, much longer expiration time than the current
one-minute lease expiration for cleaning up abandoned, unclosed files.
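
As an illustration of this split between the client-side renewal period and the
namenode-side cleanup expiration, the constants below are purely hypothetical;
the actual periods and names would have to be decided (and possibly made
configurable).

{code}
class LeaseLimitsSketch {
  // The client renewal thread should keep renewing well inside this short
  // period (roughly the current one-minute lease expiration).
  static final long CLIENT_LEASE_PERIOD_MS = 60L * 1000;

  // The namenode would only reclaim an abandoned, unclosed file after this
  // much longer period, leaving a slow client room to renew on demand.
  static final long ABANDONED_FILE_EXPIRATION_MS = 60L * 60 * 1000;  // e.g. 1 hour
}
{code}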