[
https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252859#comment-16252859
]
Jiandan Yang edited comment on HDFS-12754 at 11/15/17 2:35 AM:
----------------------------------------------------------------
[~xiaochen] Thank you for reviewing.
@ The fix here is to close the output streams out of the lease renewer lock
I think that is not quite right. The fix is that {{LeaseRenewer#run}} no longer
holds the {{LeaseRenewer}} object lock and a {{DFSOutputStream}} object lock at
the same time: the call to {{dfsClient.closeAllFilesBeingWritten}} is moved out
of the synchronized block. {{LeaseRenewer#run}} now acquires the
{{LeaseRenewer}} object lock and releases it, and only then acquires each
{{DFSOutputStream}} object lock and releases it.
{code:java}
synchronized (this) {
  DFSClientFaultInjector.get().sleepWhenRenewLeaseTimeout();
  dfsclientsCopy = new ArrayList<>(dfsclients);
  dfsclients.clear();
  // Expire the current LeaseRenewer thread.
  emptyTime = 0;
  Factory.INSTANCE.remove(LeaseRenewer.this);
}
// The LeaseRenewer lock is released before any stream is closed.
for (DFSClient dfsClient : dfsclientsCopy) {
  dfsClient.closeAllFilesBeingWritten(true);
}
{code}
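To make the lock ordering concrete, here is a minimal standalone sketch of the
same copy-then-release pattern. It is not HDFS code: {{Renewer}}, {{Stream}},
{{expireAll}} and {{demo}} are hypothetical stand-ins I made up for
illustration, and the sketch assumes only that closing a stream takes the stream
lock and then calls back into the renewer, as closeFile() does in the issue
description.

```java
import java.util.ArrayList;
import java.util.List;

class LockOrderingSketch {

  /** Hypothetical stand-in for DFSOutputStream. */
  static class Stream {
    private final Renewer renewer;
    private boolean closed = false;

    Stream(Renewer renewer) {
      this.renewer = renewer;
    }

    // Takes the stream lock first, then the renewer lock inside remove().
    synchronized void close() {
      closed = true;
      renewer.remove(this);
    }
  }

  /** Hypothetical stand-in for LeaseRenewer. */
  static class Renewer {
    private final List<Stream> streams = new ArrayList<>();

    synchronized void add(Stream s) { streams.add(s); }
    synchronized void remove(Stream s) { streams.remove(s); }
    synchronized int size() { return streams.size(); }

    void expireAll() {
      List<Stream> copy;
      synchronized (this) {
        // Only snapshot and clear while holding the renewer lock.
        copy = new ArrayList<>(streams);
        streams.clear();
      }
      // Renewer lock is released here, so close() can take the stream lock
      // and then the renewer lock in the same order as the client thread.
      for (Stream s : copy) {
        s.close();
      }
    }
  }

  /** Runs a closing client and an expiring renewer concurrently. */
  static int demo() {
    Renewer renewer = new Renewer();
    List<Stream> all = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      Stream s = new Stream(renewer);
      renewer.add(s);
      all.add(s);
    }
    Thread client = new Thread(() -> {
      for (Stream s : all) {
        s.close();
      }
    });
    Thread expirer = new Thread(renewer::expireAll);
    client.start();
    expirer.start();
    try {
      client.join(5000);
      expirer.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
    if (client.isAlive() || expirer.isAlive()) {
      throw new IllegalStateException("threads did not finish");
    }
    return renewer.size();
  }

  public static void main(String[] args) {
    System.out.println("streams left: " + demo());
  }
}
```

Both threads take the locks in the order stream, then renewer, so no cycle can
form. If {{expireAll}} called {{close()}} inside its synchronized block, the
order on that thread would be renewer, then stream, which is the inversion the
patch removes.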
> Lease renewal can hit a deadlock
> ---------------------------------
>
> Key: HDFS-12754
> URL: https://issues.apache.org/jira/browse/HDFS-12754
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.8.1
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: HDFS-12754.001.patch, HDFS-12754.002.patch,
> HDFS-12754.003.patch, HDFS-12754.004.patch, HDFS-12754.005.patch,
> HDFS-12754.006.patch, HDFS-12754.007.patch
>
>
> The client and the renewer can hit a deadlock during a close operation since
> closeFile() reaches back to DFSClient#removeFileBeingWritten. This is
> possible if the client calls close while the renewer is renewing a lease.