[ http://issues.apache.org/jira/browse/HADOOP-563?page=all ]

dhruba borthakur updated HADOOP-563:
------------------------------------

    Status: Patch Available  (was: Open)

A client's lease goes stale if the client fails to renew it for 1 minute. After 
that period, if a different client requests a lease for the same file, the 
resources associated with the original lease are reclaimed and the new 
lease request is satisfied.
A client's lease transitions from stale to expired after a 1 hour period. At 
that time all resources associated with that lease are reclaimed.
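
The two limits described above can be pictured as a small state machine. The 
following is only a sketch and is not taken from softhardlease.patch: the class 
and method names are hypothetical, and it assumes that both periods are 
measured from the last successful renewal and that a lease changes state 
exactly when a limit is reached.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the stale/expired lease states; not the patch code.
final class LeaseSketch {
    enum State { ACTIVE, STALE, EXPIRED }

    // Limits taken from the comment: 1 minute until stale, 1 hour until expired.
    static final long SOFT_LIMIT_MS = TimeUnit.MINUTES.toMillis(1);
    static final long HARD_LIMIT_MS = TimeUnit.HOURS.toMillis(1);

    private long lastRenewedMs;

    LeaseSketch(long nowMs) {
        this.lastRenewedMs = nowMs;
    }

    // A successful renewal resets the clock for both limits.
    void renew(long nowMs) {
        lastRenewedMs = nowMs;
    }

    // Assumption: both limits are measured from the last renewal.
    State stateAt(long nowMs) {
        long idleMs = nowMs - lastRenewedMs;
        if (idleMs >= HARD_LIMIT_MS) {
            return State.EXPIRED;
        }
        if (idleMs >= SOFT_LIMIT_MS) {
            return State.STALE;
        }
        return State.ACTIVE;
    }

    // A different client may take over the file once the lease is stale or expired.
    boolean mayBePreemptedAt(long nowMs) {
        return stateAt(nowMs) != State.ACTIVE;
    }
}
```

In this sketch a stale lease is reclaimed only when another client asks for 
the same file, while an expired lease would be reclaimed unconditionally.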

> DFS client should try to re-new lease if it gets a lease expiration exception 
> when it adds a block to a file
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-563
>                 URL: http://issues.apache.org/jira/browse/HADOOP-563
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Runping Qi
>         Assigned To: dhruba borthakur
>         Attachments: softhardlease.patch
>
>
> In the current DFS client implementation, there is one thread responsible for 
> renewing leases. If, for whatever reason, that thread runs behind, the lease 
> may expire. The client then gets a lease expiration exception 
> when writing a block. The consequence is severe: the client 
> can no longer write to the file, and all the partial results up to that point 
> are gone! This is especially costly for some map reduce jobs where a reducer 
> may take hours or even days to sort the intermediate results before the 
> actual reducing work can start.
> The problem would be solved if the flush method of the DFS client could renew 
> the lease on demand. That is, it should try to renew the lease when it catches 
> a lease expiration exception. That way, even under heavy load when the lease 
> renewing thread runs behind, the reducer task (or whatever task uses that 
> client) can proceed. That will be a huge saving in some cases (where sorting 
> intermediate results takes a long time to finish). We can set a limit on the 
> number of retries, and may even make it configurable (or changeable at 
> runtime). 
> The namenode can use a different expiration time that is much higher than the 
> current 1 minute lease expiration time for cleaning up the abandoned 
> unclosed files.
>  
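
The renew-on-demand behaviour proposed in the description could look roughly 
like the sketch below. It is not the DFS client code: the exception class, the 
two interfaces, and the method name are hypothetical stand-ins, and the retry 
limit is passed as a plain parameter rather than read from configuration.

```java
import java.io.IOException;

// Hypothetical sketch of "renew the lease and retry" with a bounded retry count.
final class RenewOnExpirySketch {

    // Stand-in for the lease expiration exception seen by the client.
    static class LeaseExpiredSketchException extends IOException {
        LeaseExpiredSketchException(String message) {
            super(message);
        }
    }

    // Stand-in for the operation that adds a block to the file.
    interface BlockWriter {
        void addBlock() throws IOException;
    }

    // Stand-in for the call that renews the client's lease.
    interface LeaseRenewer {
        void renewLease() throws IOException;
    }

    // Tries to add a block; on a lease expiration it renews the lease and
    // retries, at most maxRetries times. Returns the number of renewals made.
    static int addBlockWithRenewal(BlockWriter writer, LeaseRenewer renewer,
                                   int maxRetries) throws IOException {
        int renewals = 0;
        while (true) {
            try {
                writer.addBlock();
                return renewals;
            } catch (LeaseExpiredSketchException e) {
                if (renewals >= maxRetries) {
                    throw e; // retry limit reached: surface the original failure
                }
                renewer.renewLease();
                renewals++;
            }
        }
    }
}
```

Other kinds of IOException are deliberately not retried here, so only the 
lease expiration case gets the extra attempts.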

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
