[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077575#comment-17077575 ]

Wei-Chiu Chuang commented on HDFS-14575:
----------------------------------------

Thanks for reporting the issue & the patch.

The explanation makes perfect sense to me. The regression test also looks good 
to me.

However, I feel the fix itself is not ideal.

{quote}
Client#1 creates File#1: this creates LeaseRenewer#1 and starts the Daemon#1
thread. After a few seconds, File#1 is closed, and there are no clients in
LeaseRenewer#1 now.
60 seconds (grace period) later, LeaseRenewer#1 has just expired but the Daemon#1
thread is still sleeping when Client#1 creates File#2, leading to the creation
of Daemon#2.
Daemon#1 wakes up and exits; after that, LeaseRenewer#1 is removed from the
factory.
File#2 is closed after a few seconds, and LeaseRenewer#2 is created since the
client can’t get a renewer from the factory.
{quote}
Given the event sequence in the description, Daemon#2 is created but exits right
away. As a result, the lease on File#2 will not be renewed normally, and the
NameNode will then treat it as a failure and close the file.
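To make the quoted sequence easier to follow, here is a hypothetical, single-threaded replay of it. {{Renewer}}, {{Factory}}, the key {{nn1:user}}, and the id-based exit check are simplified stand-ins assumed for illustration, not the actual {{LeaseRenewer}} code; time is passed in explicitly instead of sleeping. It only shows the point where Client#1 ends up registered in a renewer that the factory no longer returns, and it does not try to model what Daemon#2 does afterwards.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical single-threaded replay of the quoted event sequence.
 * Renewer and Factory are simplified stand-ins, not Hadoop's classes.
 */
public class RenewerRaceReplay {

    static final long GRACE_PERIOD_MS = 60_000L;

    static class Renewer {
        final Set<String> clients = new HashSet<>();
        int currentDaemonId = 0;          // id of the newest daemon started for this renewer
        long emptyTime = Long.MAX_VALUE;  // Long.MAX_VALUE means "has clients"

        boolean isExpired(long now) {
            return emptyTime != Long.MAX_VALUE && now - emptyTime > GRACE_PERIOD_MS;
        }

        /** Assumed put(): an expired renewer gets a new daemon id even if the old daemon still sleeps. */
        void put(String client, long now) {
            if (currentDaemonId == 0 || isExpired(now)) {
                currentDaemonId++;
            }
            clients.add(client);
            emptyTime = Long.MAX_VALUE;
        }

        /** Removes the client; once empty, the grace period starts at 'now'. */
        void closeClient(String client, long now) {
            clients.remove(client);
            if (clients.isEmpty()) {
                emptyTime = now;
            }
        }

        /** A waking daemon exits if it is no longer the newest one or the renewer expired. */
        boolean daemonShouldExit(int daemonId, long now) {
            return daemonId != currentDaemonId || isExpired(now);
        }
    }

    static class Factory {
        private final Map<String, Renewer> renewers = new HashMap<>();

        /** Returns the registered renewer, creating a new one if none is registered. */
        Renewer get(String key) {
            return renewers.computeIfAbsent(key, k -> new Renewer());
        }

        /** Removes r only if it is still the renewer registered under key. */
        void remove(String key, Renewer r) {
            renewers.remove(key, r);
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        String key = "nn1:user";

        // 1. Client#1 creates File#1 (LeaseRenewer#1, Daemon#1); File#1 is closed at t=5s.
        Renewer r1 = factory.get(key);
        r1.put("client1", 0L);
        int daemon1 = r1.currentDaemonId;
        r1.closeClient("client1", 5_000L);

        // 2. At t=66s r1 has expired but Daemon#1 still sleeps; File#2 starts Daemon#2.
        factory.get(key).put("client1", 66_000L);

        // 3. Daemon#1 wakes, sees it is stale, exits, and removes r1 from the factory.
        if (r1.daemonShouldExit(daemon1, 67_000L)) {
            factory.remove(key, r1);
        }

        // 4. File#2 is closed: the lookup now creates LeaseRenewer#2.
        Renewer r2 = factory.get(key);
        r2.closeClient("client1", 70_000L);

        System.out.println("newest daemon id on r1: " + r1.currentDaemonId);
        System.out.println("r1 still registered: " + (factory.get(key) == r1));
        System.out.println("client1 still in r1: " + r1.clients.contains("client1"));
    }
}
```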

IMO each lease renewer should be associated with only one daemon
variable/thread.
I wonder if {{LeaseRenewer#put()}} should throw an exception when the {{daemon}}
variable is already set, and then {{DFSClient#beginFileLease()}} could remove
the renewer and create a new instance.
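A rough sketch of this idea, under one reading of it: the case where {{put()}} would start a second daemon is when the renewer has expired while its daemon is still set, so {{put()}} throws there instead, and the caller discards that renewer and retries once with a fresh one. All names here ({{DaemonAlreadySetException}}, {{Renewer}}, {{Factory}}, {{beginFileLease}}, the {{expired}} flag) are hypothetical stand-ins, not existing Hadoop APIs, and no real threads are started.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of "one daemon per renewer": put() refuses to replace an
 * existing daemon, and the caller swaps in a fresh renewer instead.
 */
public class OneDaemonPerRenewer {

    /** Hypothetical exception; the proposal does not name one. */
    static class DaemonAlreadySetException extends RuntimeException {
        DaemonAlreadySetException(String message) {
            super(message);
        }
    }

    private static final AtomicInteger NEXT_DAEMON_ID = new AtomicInteger();

    static class Renewer {
        final Set<String> clients = new HashSet<>();
        Integer daemonId = null;   // assigned at most once per renewer
        boolean expired = false;   // stand-in for isRenewerExpired()

        synchronized void put(String client) {
            if (daemonId == null) {
                // "Start" the only daemon this renewer will ever get (modeled as an id).
                daemonId = NEXT_DAEMON_ID.incrementAndGet();
            } else if (expired) {
                // Previously a second daemon would be started here; refuse instead.
                throw new DaemonAlreadySetException("daemon " + daemonId + " is already set");
            }
            clients.add(client);
        }
    }

    static class Factory {
        private final Map<String, Renewer> renewers = new HashMap<>();

        synchronized Renewer get(String key) {
            return renewers.computeIfAbsent(key, k -> new Renewer());
        }

        /** Removes r only if it is still the renewer registered under key. */
        synchronized void remove(String key, Renewer r) {
            renewers.remove(key, r);
        }
    }

    /** Hypothetical beginFileLease(): on conflict, drop the stale renewer and retry once. */
    static Renewer beginFileLease(Factory factory, String key, String client) {
        Renewer r = factory.get(key);
        try {
            r.put(client);
            return r;
        } catch (DaemonAlreadySetException e) {
            factory.remove(key, r);
            Renewer fresh = factory.get(key); // new instance, daemonId still null
            fresh.put(client);
            return fresh;
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        Renewer first = beginFileLease(factory, "nn1:user", "client1");
        first.clients.remove("client1");
        first.expired = true; // grace period elapsed while its daemon still sleeps
        Renewer second = beginFileLease(factory, "nn1:user", "client1");
        System.out.println("fresh renewer: " + (first != second));
        System.out.println("daemon ids: " + first.daemonId + ", " + second.daemonId);
    }
}
```

In this sequence the stale renewer has already expired with no clients, so its sleeping daemon would simply exit when it wakes up, while the fresh renewer gets exactly one daemon of its own.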

> LeaseRenewer#daemon threads leak in DFSClient
> ---------------------------------------------
>
>                 Key: HDFS-14575
>                 URL: https://issues.apache.org/jira/browse/HDFS-14575
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch
>
>
> Currently, a LeaseRenewer (and its daemon thread) without clients should be
> terminated after a grace period, which defaults to 60 seconds. A race
> condition may happen when a new request comes in just after the LeaseRenewer
> has expired.
>  Reproduce this race condition:
>  # Client#1 creates File#1: this creates LeaseRenewer#1 and starts the Daemon#1
> thread. After a few seconds, File#1 is closed, and there are no clients in
> LeaseRenewer#1 now.
>  # 60 seconds (grace period) later, LeaseRenewer#1 has just expired but the
> Daemon#1 thread is still sleeping when Client#1 creates File#2, leading to the
> creation of Daemon#2.
>  # Daemon#1 wakes up and exits; after that, LeaseRenewer#1 is removed from the
> factory.
>  # File#2 is closed after a few seconds, and LeaseRenewer#2 is created since the
> client can’t get a renewer from the factory.
> The Daemon#2 thread leaks from then on, since Client#1 can never be removed
> from LeaseRenewer#1, so the thread never gets a chance to stop.
> IIUIC, a simple way to solve this problem is to make sure that all clients
> are cleared when a LeaseRenewer is removed from the factory. Please feel free
> to give your suggestions. Thanks!
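For comparison, the approach proposed in the description (clear all clients when a renewer is removed from the factory) can be sketched the same way. This is a hypothetical model: {{daemonTick()}} is an assumed, simplified version of one iteration of a daemon loop that starts the grace period once the client set is empty and exits after it elapses, the clock advances in simulated one-second steps, and none of these names are Hadoop APIs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of "clear all clients when a renewer is removed from the factory". */
public class ClearClientsOnRemove {

    static final long GRACE_PERIOD_MS = 60_000L;

    static class Renewer {
        final Set<String> clients = new HashSet<>();
        long emptyTime = Long.MAX_VALUE; // Long.MAX_VALUE means "not yet empty"

        synchronized boolean isExpired(long now) {
            return emptyTime != Long.MAX_VALUE && now - emptyTime > GRACE_PERIOD_MS;
        }

        /** Assumed single iteration of a daemon loop; returns false when the daemon should exit. */
        synchronized boolean daemonTick(long now) {
            if (isExpired(now)) {
                return false;
            }
            if (clients.isEmpty() && emptyTime == Long.MAX_VALUE) {
                emptyTime = now; // no clients left: start the grace period
            }
            return true;
        }

        synchronized void clearClients() {
            clients.clear();
        }
    }

    static class Factory {
        private final Map<String, Renewer> renewers = new HashMap<>();
        final boolean clearOnRemove;

        Factory(boolean clearOnRemove) {
            this.clearOnRemove = clearOnRemove;
        }

        synchronized Renewer get(String key) {
            return renewers.computeIfAbsent(key, k -> new Renewer());
        }

        synchronized void remove(String key, Renewer r) {
            if (renewers.remove(key, r) && clearOnRemove) {
                // Proposed fix: an unreachable renewer keeps no clients, so a daemon
                // still attached to it can reach the empty state and expire.
                r.clearClients();
            }
        }
    }

    /** Returns true if the daemon still attached to the removed renewer eventually exits. */
    static boolean staleDaemonExits(boolean clearOnRemove) {
        Factory factory = new Factory(clearOnRemove);
        Renewer stale = factory.get("nn1:user");
        stale.clients.add("client1");       // Client#1 re-registered for File#2
        factory.remove("nn1:user", stale);  // Daemon#1 exits and removes the renewer
        // Simulate Daemon#2 on the removed renewer, ticking once per simulated second.
        for (long now = 0; now <= 10 * GRACE_PERIOD_MS; now += 1_000L) {
            if (!stale.daemonTick(now)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("without fix, stale daemon exits: " + staleDaemonExits(false));
        System.out.println("with fix, stale daemon exits: " + staleDaemonExits(true));
    }
}
```

Without clearing, the client left in the removed renewer keeps the grace period from ever starting, which matches the leak described above; with clearing, the attached daemon expires one grace period after removal.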



--
This message was sent by Atlassian Jira
(v8.3.4#803005)