[ http://issues.apache.org/jira/browse/HADOOP-563?page=all ]
Doug Cutting updated HADOOP-563:
--------------------------------

       Status: Resolved  (was: Patch Available)
Fix Version/s: 0.8.0
   Resolution: Fixed

I just committed this. Thanks, Dhruba!

> DFS client should try to re-new lease if it gets a lease expiration exception when it adds a block to a file
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-563
>                 URL: http://issues.apache.org/jira/browse/HADOOP-563
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Runping Qi
>         Assigned To: dhruba borthakur
>             Fix For: 0.8.0
>
>         Attachments: softhardlease.patch
>
>
> In the current DFS client implementation, there is one thread responsible for
> renewing leases. If, for whatever reason, that thread falls behind, the lease
> may expire. That causes the client to get a lease expiration exception
> when writing a block. The consequence is devastating: the client
> can no longer write to the file, and all the partial results up to that point
> are gone! This is especially costly for some map reduce jobs, where a reducer
> may take hours or even days to sort the intermediate results before the
> actual reducing work can start.
> The problem will be solved if the flush method of the DFS client can renew the
> lease on demand. That is, it should try to renew the lease when it catches a
> lease expiration exception. That way, even under heavy load, when the lease
> renewing thread falls behind, the reducer task (or whatever task uses that
> client) can proceed. That will be a huge saving in some cases (where sorting
> intermediate results takes a long time to finish). We can set a limit on the
> number of retries, and may even make it configurable (or changeable at
> runtime).
> The namenode can use a different expiration time that is much higher than the
> current 1 minute lease expiration time for cleaning up the abandoned
> unclosed files.
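The renew-on-demand idea in the description can be sketched as a small retry wrapper around block allocation. This is a minimal, self-contained illustration under stated assumptions, not the committed softhardlease.patch and not the real DFSClient API: `NameNodeStub`, `addBlock`, `renewLease`, `LeaseExpiredException`, and `maxRetries` are hypothetical names chosen for the example.

```java
import java.io.IOException;

/**
 * Hypothetical sketch: renew the lease on demand and retry block allocation
 * when the namenode reports an expired lease. All types and method names here
 * are illustrative stand-ins, not the actual Hadoop DFS client classes.
 */
public class LeaseRetrySketch {

    /** Hypothetical stand-in for the namenode's "lease expired" error. */
    public static class LeaseExpiredException extends IOException {
        public LeaseExpiredException(String msg) { super(msg); }
    }

    /** Hypothetical subset of the client-to-namenode protocol. */
    public interface NameNodeStub {
        long addBlock(String src, String clientName) throws IOException;
        void renewLease(String clientName) throws IOException;
    }

    /**
     * Calls addBlock; if the lease has expired, renews it immediately instead of
     * waiting for the background renewer thread, then retries. Gives up after
     * maxRetries renewals (the issue suggests making this limit configurable).
     */
    public static long addBlockWithLeaseRecovery(NameNodeStub nn, String src,
                                                 String clientName, int maxRetries)
            throws IOException {
        int retries = 0;
        while (true) {
            try {
                return nn.addBlock(src, clientName);
            } catch (LeaseExpiredException e) {
                if (retries >= maxRetries) {
                    throw e; // retry budget exhausted: surface the original error
                }
                retries++;
                nn.renewLease(clientName); // renew on demand, then loop to retry
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake namenode whose lease has lapsed until renewLease is called once.
        final boolean[] leaseValid = {false};
        final int[] renewals = {0};
        NameNodeStub nn = new NameNodeStub() {
            public long addBlock(String src, String clientName) throws IOException {
                if (!leaseValid[0]) {
                    throw new LeaseExpiredException("lease expired for " + clientName);
                }
                return 42L;
            }
            public void renewLease(String clientName) {
                renewals[0]++;
                leaseValid[0] = true;
            }
        };
        long blockId = addBlockWithLeaseRecovery(nn, "/out/part-00000", "client-1", 3);
        System.out.println("block=" + blockId + " renewals=" + renewals[0]);
    }
}
```

Running `main` prints `block=42 renewals=1`: the first call fails, one on-demand renewal succeeds, and the retry goes through. Note that retrying only helps while the namenode has not yet reclaimed the file, which is why the description also proposes a much longer expiration time for cleaning up abandoned unclosed files.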