[ 
https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077886#comment-17077886
 ] 

Tao Yang commented on HDFS-14575:
---------------------------------

Thanks [~weichiu] for the review and suggestion.
You are right: we can prevent that failure if each lease renewer has only one 
daemon thread, and I think the approach you gave works fine for this issue. I'm 
planning to add a stopping state to the LeaseRenewer. It would be set when the 
LeaseRenewer is exiting (in the synchronized block of LeaseRenewer#run) and 
checked at the beginning of LeaseRenewer#put (a synchronized method; if 
stopping is true, throw an exception and then handle it in 
DFSClient#beginFileLease as you suggested). This way we can guarantee that 
the lease renewer a client has been put into won't exit unexpectedly (since 
emptyTime is updated to Long.MAX_VALUE at the end of LeaseRenewer#put). I hope 
to hear your thoughts.
The UT can simulate the race condition, but it is hard to use it to verify the 
fix. Any suggestions about this?
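
A minimal sketch of the stopping-state idea described above, for illustration 
only. The classes Renewer and Factory, the exception 
RenewerStoppingException, and the helper tryStop are hypothetical stand-ins 
and are not the actual Hadoop LeaseRenewer or DFSClient code. Time is passed 
in explicitly so the exit check can be driven without sleeping.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; hypothetical stand-ins, not the Hadoop classes. */
class LeaseRenewerStoppingSketch {

    /** Hypothetical exception thrown when a renewer is already exiting. */
    static class RenewerStoppingException extends Exception {
        RenewerStoppingException(String message) {
            super(message);
        }
    }

    /** Simplified renewer: tracks clients, emptyTime, and the stopping flag. */
    static class Renewer {
        private final Set<String> clients = new HashSet<>();
        private long emptyTime = Long.MAX_VALUE;
        private boolean stopping = false;

        /** Mirrors the proposed check at the beginning of LeaseRenewer#put. */
        synchronized void put(String client) throws RenewerStoppingException {
            if (stopping) {
                throw new RenewerStoppingException("renewer is stopping");
            }
            clients.add(client);
            emptyTime = Long.MAX_VALUE; // a renewer with clients never expires
        }

        /** Removes a client and records when the renewer became empty. */
        synchronized void remove(String client, long nowMs) {
            clients.remove(client);
            if (clients.isEmpty()) {
                emptyTime = nowMs;
            }
        }

        /** Stands in for the synchronized exit check in LeaseRenewer#run. */
        synchronized boolean tryStop(long nowMs, long gracePeriodMs) {
            if (clients.isEmpty() && emptyTime != Long.MAX_VALUE
                    && nowMs - emptyTime >= gracePeriodMs) {
                stopping = true;
            }
            return stopping;
        }

        synchronized boolean hasClient(String client) {
            return clients.contains(client);
        }
    }

    /** Simplified factory holding at most one renewer. */
    static class Factory {
        private Renewer renewer;

        synchronized Renewer get() {
            if (renewer == null) {
                renewer = new Renewer();
            }
            return renewer;
        }

        synchronized void remove(Renewer stale) {
            if (renewer == stale) {
                renewer = null;
            }
        }
    }

    /** Stands in for DFSClient#beginFileLease: retry with a fresh renewer. */
    static Renewer beginFileLease(Factory factory, String client) {
        while (true) {
            Renewer candidate = factory.get();
            try {
                candidate.put(client);
                return candidate;
            } catch (RenewerStoppingException e) {
                factory.remove(candidate); // drop the exiting renewer, retry
            }
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        Renewer first = beginFileLease(factory, "client-1");
        first.remove("client-1", 0L);
        first.tryStop(60_000L, 60_000L); // grace period elapsed: now stopping
        Renewer second = beginFileLease(factory, "client-1");
        System.out.println("new renewer used: " + (second != first));
    }
}
```

Because put and tryStop are both synchronized on the same renewer, a client 
either registers before the exit decision (and resets emptyTime, so the 
renewer keeps running) or sees stopping and is routed to a fresh renewer.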

> LeaseRenewer#daemon threads leak in DFSClient
> ---------------------------------------------
>
>                 Key: HDFS-14575
>                 URL: https://issues.apache.org/jira/browse/HDFS-14575
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch
>
>
> Currently, a LeaseRenewer (and its daemon thread) without clients should be 
> terminated after a grace period, which defaults to 60 seconds. A race 
> condition may occur when a new request arrives just after the LeaseRenewer 
> has expired.
>  To reproduce this race condition:
>  # Client#1 creates File#1: this creates LeaseRenewer#1 and starts the 
> Daemon#1 thread. After a few seconds, File#1 is closed, so there are no 
> clients in LeaseRenewer#1 now.
>  # 60 seconds (the grace period) later, LeaseRenewer#1 has just expired but 
> the Daemon#1 thread is still asleep. Client#1 creates File#2, leading to the 
> creation of Daemon#2.
>  # Daemon#1 wakes up and then exits; after that, LeaseRenewer#1 is removed 
> from the factory.
>  # File#2 is closed after a few seconds, and LeaseRenewer#2 is created since 
> the client can’t get a renewer from the factory.
> The Daemon#2 thread leaks from now on, since Client#1 in it can never be 
> removed and it won't have a chance to stop.
> To solve this problem, IIUC, a simple way is to make sure that all clients 
> are cleared when a LeaseRenewer is removed from the factory. Please feel 
> free to give your suggestions. Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
