[
https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077575#comment-17077575
]
Wei-Chiu Chuang commented on HDFS-14575:
----------------------------------------
Thanks for reporting the issue & the patch.
The explanation makes perfect sense to me. The regression test also looks good
to me.
However, I feel the fix itself is not ideal.
{quote}
Client#1 creates File#1: creates LeaseRenewer#1 and starts the Daemon#1 thread;
after a few seconds, File#1 is closed, and there are no clients in
LeaseRenewer#1 now.
60 seconds (grace period) later, LeaseRenewer#1 has just expired but the
Daemon#1 thread is still asleep; Client#1 creates File#2, leading to the
creation of Daemon#2.
Daemon#1 wakes up and exits; after that, LeaseRenewer#1 is removed from the
factory.
File#2 is closed after a few seconds; LeaseRenewer#2 is created since it can’t
get a renewer from the factory.
{quote}
Considering the event sequence in the description, Daemon#2 is created but
exits right away. File#2 will not be renewed normally as a result, and the
namenode will then consider it a failure and close the file.
IMO each lease renewer should only be associated with one daemon
variable/thread.
I wonder if {{LeaseRenewer#put()}} should throw an exception when the variable
daemon is already set, and then inside {{DFSClient#beginFileLease()}}, remove
the renewer and create a new instance.
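To make the suggestion concrete, here is a rough, single-threaded sketch of the idea. It is a toy model, not the actual HDFS code: every class, field, and method name below is hypothetical, and it assumes the exception is raised only when the renewer has already expired while its daemon is still set (a renewer that is still live keeps accepting new files as usual).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the suggestion; all names are hypothetical, not HDFS code. */
class LeaseRenewerSketch {

  /** Stand-in for a lease renewer that may own at most one daemon. */
  static final class Renewer {
    final Set<String> files = new HashSet<>();
    boolean daemonSet; // models the daemon field being non-null
    boolean expired;   // models the grace period having elapsed

    /** Registers a file; refuses to start a second daemon once expired. */
    synchronized void put(String file) {
      if (daemonSet && expired) {
        throw new IllegalStateException("expired renewer already has a daemon");
      }
      daemonSet = true; // first use: this is where the single daemon would start
      files.add(file);
    }
  }

  /** Stand-in for the factory that caches one renewer per client. */
  static final class Factory {
    private final Map<String, Renewer> renewers = new HashMap<>();

    synchronized Renewer get(String client) {
      return renewers.computeIfAbsent(client, c -> new Renewer());
    }

    synchronized void remove(String client, Renewer r) {
      renewers.remove(client, r); // removes only if r is still the cached instance
    }
  }

  /** Caller side: on rejection, drop the stale renewer and use a new instance. */
  static Renewer beginFileLease(Factory factory, String client, String file) {
    Renewer r = factory.get(client);
    try {
      r.put(file);
      return r;
    } catch (IllegalStateException e) {
      factory.remove(client, r);
      Renewer fresh = factory.get(client);
      fresh.put(file);
      return fresh;
    }
  }

  public static void main(String[] args) {
    Factory factory = new Factory();
    Renewer first = beginFileLease(factory, "client-1", "file-1");
    first.files.clear();  // File#1 is closed
    first.expired = true; // grace period elapsed, old daemon still asleep
    Renewer second = beginFileLease(factory, "client-1", "file-2");
    System.out.println("new renewer created: " + (first != second));
  }
}
```

In this model File#2 ends up on a fresh renewer with its own single daemon, so the old renewer never gets a second daemon attached to it. A real change would also have to consider locking between the factory and the renewer, which this sketch leaves out.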
> LeaseRenewer#daemon threads leak in DFSClient
> ---------------------------------------------
>
> Key: HDFS-14575
> URL: https://issues.apache.org/jira/browse/HDFS-14575
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch
>
>
> Currently a LeaseRenewer (and its daemon thread) without clients should be
> terminated after a grace period, which defaults to 60 seconds. A race
> condition may happen when a new request arrives just after the LeaseRenewer
> has expired.
> Reproduce this race condition:
> # Client#1 creates File#1: creates LeaseRenewer#1 and starts the Daemon#1
> thread; after a few seconds, File#1 is closed, and there are no clients in
> LeaseRenewer#1 now.
> # 60 seconds (grace period) later, LeaseRenewer#1 has just expired but the
> Daemon#1 thread is still asleep; Client#1 creates File#2, leading to the
> creation of Daemon#2.
> # Daemon#1 wakes up and exits; after that, LeaseRenewer#1 is removed from
> the factory.
> # File#2 is closed after a few seconds; LeaseRenewer#2 is created since it
> can’t get a renewer from the factory.
> Daemon#2 thread leaks from now on, since Client#1 in it can never be removed
> and it won't have a chance to stop.
> To solve this problem, IIUC, a simple way is to make sure that all clients
> are cleared when the LeaseRenewer is removed from the factory. Please feel
> free to give your suggestions. Thanks!