[ https://issues.apache.org/jira/browse/MAPREDUCE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706161#comment-13706161 ]
Siddharth Seth commented on MAPREDUCE-5364:
-------------------------------------------
bq. Looking at the code, I don't see a deadlock possibility. While a call to
setTimerForTokenRenewal requires a lock on DelegationTokenRenewer.class, I
don't see any method holding a lock on DelegationTokenRenewer.class requiring a
lock on delegationTokens or cancelled flag. Am I missing something here?
You're right. I was somehow considering removeDelegationTokenRenewalForJob to
be a synchronized method. Sorry about that.
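The lock-ordering argument quoted above can be checked mechanically. A monitor deadlock needs a cycle in the graph whose edge A -> B means "some code path acquires B while already holding A". The following is a minimal sketch using a hypothetical LockOrderGraph helper (not part of Hadoop). The lock names and edges in main() are illustrative assumptions drawn from this thread and from the stacks in the description, not read from the source.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: nodes are lock names; an edge A -> B means some code path
// acquires B while holding A. A monitor deadlock requires a cycle in this graph.
class LockOrderGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    void addEdge(String held, String acquired) {
        edges.computeIfAbsent(held, k -> new HashSet<>()).add(acquired);
        edges.computeIfAbsent(acquired, k -> new HashSet<>());
    }

    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : edges.keySet()) {
            if (visit(node, done, onPath)) return true;
        }
        return false;
    }

    // Depth-first search: reaching a node already on the current path closes a cycle.
    private boolean visit(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;
        if (done.contains(node)) return false;
        onPath.add(node);
        for (String next : edges.get(node)) {
            if (visit(next, done, onPath)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        // Assumed edges for the quoted argument: locks may be taken *before* the
        // class lock, but nothing holding the class lock takes another lock.
        LockOrderGraph asQuoted = new LockOrderGraph();
        asQuoted.addEdge("delegationTokens", "timerTask");
        asQuoted.addEdge("delegationTokens", "renewalClass");
        asQuoted.addEdge("timerTask", "renewalClass");
        System.out.println(asQuoted.hasCycle());   // false

        // Edges implied by the two stacks in the description.
        LockOrderGraph asReported = new LockOrderGraph();
        asReported.addEdge("delegationTokens", "timerTask"); // removal path -> cancel()
        asReported.addEdge("timerTask", "delegationTokens"); // run() error path -> removeFailed...
        System.out.println(asReported.hasCycle()); // true
    }
}
```

This only encodes the static reasoning. At runtime, a JVM thread dump reports actual held and awaited monitors instead.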
This could be fixed via the original jira (MAPREDUCE-4860) or a new jira. The
main issue in this jira was the deadlock, and that is already fixed. An extra
renewal just leads to an additional exception message in the logs, correct? Or
is it more severe than that (other than the failed unit test)?
Comments on the patch itself.
The previous patch is likely better. One concern with the current patch:
'cancelled' is associated with the current RenewalTimerTask. If
removeDelegationTokenRenewalForJob calls cancel() while a token renewal is in
progress, the call effectively has no effect, since a new RenewalTimerTask,
with its own unset 'cancelled' flag, will be scheduled afterwards. This may not
be an issue, since the reference to the DelegationTokenToRenew object will be
removed from the list of delegationTokens. Since renew has been moved into
DelegationTokenToRenew, I'd prefer having the cancel (or intent to cancel)
associated with that class as well.
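The following is a minimal sketch of the suggestion above, under simplified assumptions. The hypothetical TokenToRenew class stands in for DelegationTokenToRenew and owns both renew() and an AtomicBoolean intent-to-cancel flag. Each hypothetical TokenRenewalTask checks that shared flag instead of a per-task 'cancelled' field. Because the flag belongs to the token and not to whichever task happens to be scheduled, a cancel that arrives during a renewal also stops the task that renewal would schedule next. The Timer wiring, the interval, and all class names are illustrative, not the branch-1 code.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical task: each run renews once and schedules a fresh task for the next renewal.
class TokenRenewalTask extends TimerTask {
    private final TokenToRenew dttr;
    private final Timer timer;
    private final long intervalMs;

    TokenRenewalTask(TokenToRenew dttr, Timer timer, long intervalMs) {
        this.dttr = dttr;
        this.timer = timer;
        this.intervalMs = intervalMs;
    }

    @Override
    public void run() {
        if (dttr.isCancelled()) return;  // cancelled while this task was queued
        dttr.renew();
        if (dttr.isCancelled()) return;  // cancel() raced with renew(): do not reschedule
        TokenRenewalTask next = new TokenRenewalTask(dttr, timer, intervalMs);
        // Schedule before publishing: a cancel() that slips in between still sets
        // the shared flag, so `next` returns immediately when it fires.
        timer.schedule(next, intervalMs);
        dttr.setTask(next);
    }

    public static void main(String[] args) {
        Timer timer = new Timer(true);                    // daemon thread, JVM can exit
        TokenToRenew dttr = new TokenToRenew();
        new TokenRenewalTask(dttr, timer, 60_000).run();  // renews, schedules next run
        dttr.cancel();                                    // cancels that scheduled run
        new TokenRenewalTask(dttr, timer, 60_000).run();  // stale task: no renewal
        System.out.println(dttr.renewalCount());          // 1
        timer.cancel();
    }
}

// Hypothetical stand-in for DelegationTokenToRenew: owns renew() and the cancel intent.
class TokenToRenew {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final AtomicInteger renewals = new AtomicInteger();
    private volatile TimerTask currentTask;

    void renew() { renewals.incrementAndGet(); }  // placeholder for the real renewal
    int renewalCount() { return renewals.get(); }
    boolean isCancelled() { return cancelled.get(); }
    void setTask(TimerTask t) { currentTask = t; }

    /** Records the intent to cancel, then cancels whichever task is scheduled now. */
    void cancel() {
        cancelled.set(true);
        TimerTask t = currentTask;
        if (t != null) t.cancel();
    }
}
```

With the flag on the token, run() needs no synchronized modifier. A thread calling cancel() therefore never waits on a monitor held for the whole of run().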
> Deadlock between RenewalTimerTask methods cancel() and run()
> ------------------------------------------------------------
>
> Key: MAPREDUCE-5364
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5364
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Fix For: 1.2.1
>
> Attachments: mr-5364-1.patch, mr-5364-addendum-1.patch,
> mr-5364-addendum-2.patch
>
>
> MAPREDUCE-4860 introduced a field {{cancelled}} in
> {{RenewalTimerTask}} to fix the race where {{DelegationTokenRenewal}}
> attempts to renew a token even after the job is removed. However, the patch
> also makes {{run()}} and {{cancel()}} synchronized methods, leading to a
> potential deadlock involving the error path (catch block) of {{run()}}.
> The deadlocked stacks are below:
> {noformat}
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.cancel() @bci=0, line=240 (Interpreted frame)
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeDelegationTokenRenewalForJob(org.apache.hadoop.mapreduce.JobID) @bci=109, line=319 (Interpreted frame)
> {noformat}
> {noformat}
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeFailedDelegationToken(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=62, line=297 (Interpreted frame)
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.access$300(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=1, line=47 (Interpreted frame)
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.run() @bci=148, line=234 (Interpreted frame)
> {noformat}
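The two stacks are consistent with a classic lock-order cycle. removeDelegationTokenRenewalForJob presumably holds the delegationTokens lock while waiting for the task monitor in the synchronized cancel(). The synchronized run() holds that task monitor while its error path waits for the delegationTokens lock in removeFailedDelegationToken. One common way to break such a cycle is to unlink entries under the list lock and call cancel() only after releasing it. The following minimal sketch models that approach with a hypothetical TokenRegistry class. It illustrates the technique and is not a reconstruction of mr-5364-1.patch or the addenda.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TimerTask;

// Hypothetical, simplified registry modelled on the stacks above; names are illustrative.
class TokenRegistry {
    private final List<Task> delegationTokens = new ArrayList<>();

    // Mirrors the MAPREDUCE-4860 shape: run() and cancel() synchronize on the task.
    class Task extends TimerTask {
        final String jobId;
        volatile boolean cancelled = false;

        Task(String jobId) { this.jobId = jobId; }

        @Override
        public synchronized void run() {
            if (cancelled) return;
            // Error path: holding the task monitor, take the list lock (task -> list).
            removeFailed(this);
        }

        @Override
        public synchronized boolean cancel() {
            cancelled = true;
            return super.cancel();
        }
    }

    Task add(String jobId) {
        Task t = new Task(jobId);
        synchronized (delegationTokens) { delegationTokens.add(t); }
        return t;
    }

    void removeFailed(Task t) {
        synchronized (delegationTokens) { delegationTokens.remove(t); }
    }

    /** Unlinks a job's tasks under the list lock, then cancels them after releasing it. */
    List<Task> removeForJob(String jobId) {
        List<Task> removed = new ArrayList<>();
        synchronized (delegationTokens) {
            for (Iterator<Task> it = delegationTokens.iterator(); it.hasNext(); ) {
                Task t = it.next();
                if (t.jobId.equals(jobId)) { removed.add(t); it.remove(); }
            }
        }
        // No list -> task edge: cancel() runs only once the list lock is free.
        for (Task t : removed) t.cancel();
        return removed;
    }

    int size() { synchronized (delegationTokens) { return delegationTokens.size(); } }

    public static void main(String[] args) {
        TokenRegistry r = new TokenRegistry();
        Task a = r.add("job_1");
        r.add("job_2");
        r.removeForJob("job_1");
        System.out.println(a.cancelled + " " + r.size()); // true 1
    }
}
```

The task still takes the list lock while holding its own monitor. However, no thread now holds the list lock while waiting for a task monitor, so the two stacks above can no longer block each other.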
--
This message is automatically generated by JIRA.