[ http://issues.apache.org/jira/browse/HADOOP-563?page=all ]
dhruba borthakur updated HADOOP-563:
------------------------------------

    Attachment: softhardlease.patch

A client's lease goes stale if the client fails to renew it for 1 minute. After that period, if a different client requests a lease for the same file, the resources associated with the original lease are reclaimed and the new lease request is satisfied.

A client's lease transitions from stale to expired after a 1 hour period. At that time, all resources associated with that lease are reclaimed.

> DFS client should try to re-new lease if it gets a lease expiration exception
> when it adds a block to a file
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-563
>                 URL: http://issues.apache.org/jira/browse/HADOOP-563
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Runping Qi
>         Assigned To: dhruba borthakur
>         Attachments: softhardlease.patch
>
>
> In the current DFS client implementation, there is one thread responsible for
> renewing leases. If, for whatever reason, that thread runs behind, the lease
> may expire. That causes the client to get a lease expiration exception
> when writing a block. The consequence is devastating: the client
> can no longer write to the file, and all the partial results up to that point
> are gone! This is especially costly for some map reduce jobs where a reducer
> may take hours or even days to sort the intermediate results before the
> actual reducing work can start.
> The problem will be solved if the flush method of the DFS client can renew the
> lease on demand. That is, it should try to renew the lease when it catches a
> lease expiration exception. That way, even under heavy load, when the lease
> renewing thread runs behind, the reducer task (or whatever task uses that
> client) can proceed. That will be a huge saving in some cases (where sorting
> intermediate results takes a long time to finish).
> We can set a limit on the number of retries, and may even make it
> configurable (or changeable at runtime).
> The namenode can use a different expiration time, much higher than the
> current 1 minute lease expiration time, for cleaning up abandoned
> unclosed files.
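
For illustration, the stale and expired lease limits described in the comment at the top of this message could be modeled roughly as below. This is a minimal sketch, not the code in softhardlease.patch. The names `LeaseSketch`, `requestLease`, `renew`, `reclaimExpired` and `holderOf` are hypothetical, and two behaviours are assumptions of the sketch rather than statements from the comment: both limits are measured from the last successful renewal, and the original holder may still renew a stale lease that no other client has claimed.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the two lease limits; not the code in softhardlease.patch. */
class LeaseSketch {
    // Limits quoted in the comment: stale after 1 minute, expired after 1 hour.
    static final long SOFT_LIMIT_MS = 60_000L;
    static final long HARD_LIMIT_MS = 60L * 60_000L;

    enum State { ACTIVE, STALE, EXPIRED }

    static final class Lease {
        final String holder;
        long lastRenewedMs;

        Lease(String holder, long nowMs) {
            this.holder = holder;
            this.lastRenewedMs = nowMs;
        }

        // Assumption: both limits are measured from the last successful renewal.
        State stateAt(long nowMs) {
            long idleMs = nowMs - lastRenewedMs;
            if (idleMs >= HARD_LIMIT_MS) return State.EXPIRED;
            if (idleMs >= SOFT_LIMIT_MS) return State.STALE;
            return State.ACTIVE;
        }
    }

    // Hypothetical stand-in for the namenode's lease table: file path -> lease.
    private final Map<String, Lease> leases = new HashMap<>();

    /** Renews the holder's lease; fails if it is missing, taken over, or expired. */
    boolean renew(String path, String holder, long nowMs) {
        Lease current = leases.get(path);
        if (current == null || !current.holder.equals(holder)
                || current.stateAt(nowMs) == State.EXPIRED) {
            return false;
        }
        current.lastRenewedMs = nowMs;
        return true;
    }

    /** Grants the lease on path to holder unless another client holds a live one. */
    boolean requestLease(String path, String holder, long nowMs) {
        if (renew(path, holder, nowMs)) {
            return true; // same holder, lease not yet expired: treat as a renewal
        }
        Lease current = leases.get(path);
        if (current != null && current.stateAt(nowMs) == State.ACTIVE) {
            return false; // a different client holds a live lease
        }
        // No lease, or a stale/expired one: reclaim it and grant a fresh lease.
        leases.put(path, new Lease(holder, nowMs));
        return true;
    }

    /** Drops every lease past the hard limit; returns how many were reclaimed. */
    int reclaimExpired(long nowMs) {
        int before = leases.size();
        leases.values().removeIf(l -> l.stateAt(nowMs) == State.EXPIRED);
        return before - leases.size();
    }

    /** Returns the current holder of the lease on path, or null if there is none. */
    String holderOf(String path) {
        Lease l = leases.get(path);
        return l == null ? null : l.holder;
    }

    public static void main(String[] args) {
        LeaseSketch table = new LeaseSketch();
        table.requestLease("/out/part-0", "clientA", 0L);
        // 30 s later clientA's lease is still live, so clientB is refused.
        System.out.println(table.requestLease("/out/part-0", "clientB", 30_000L));  // false
        // 2 min later the lease is stale, so clientB takes it over.
        System.out.println(table.requestLease("/out/part-0", "clientB", 120_000L)); // true
    }
}
```

In this model a renewal thread that runs a few minutes late costs the writer nothing as long as no other client asks for the same file, which is the situation the issue description is concerned with. Passing the clock in as `nowMs` is only there to keep the sketch deterministic.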