[ https://issues.apache.org/jira/browse/MAPREDUCE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701068#comment-13701068 ]
Siddharth Seth commented on MAPREDUCE-5364:
-------------------------------------------
The current patch (mr-5364-1.patch, which has been committed) looks OK, at least
in terms of getting rid of the deadlock. As Karthik pointed out, it doesn't
completely fix what MAPREDUCE-4860 was trying to fix.
The addendum patch can cause deadlocks on the call to
{{setTimerForTokenRenewal}}. Moving that call out of the synchronized block
would just allow an additional renewal to be scheduled after the token is
cancelled, so that doesn't help much either.
One option is a cancelled flag on the DelegationTokenToRenew structure itself:
set the intent to cancel before attempting to cancel the timer task, and check
the flag during renewal and before queuing another renewal. There are multiple
ways this could be fixed.
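A rough sketch of the flag idea, not taken from any patch. TokenToRenew,
Renewer and their methods are hypothetical stand-ins for
DelegationTokenToRenew and the renewal code, and the counter replaces the real
renew call. Nothing here is synchronized, so there is no lock for cancel and
run to fight over. A renewal that has already passed the flag check can still
complete after cancel returns.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CancelFlagSketch {

    /** Hypothetical stand-in for DelegationTokenToRenew. */
    static final class TokenToRenew {
        final AtomicBoolean cancelled = new AtomicBoolean(false); // intent to cancel
        final AtomicInteger renewals = new AtomicInteger();       // demo counter only
        volatile TimerTask timerTask;                             // last task queued
    }

    /** Hypothetical renewer; it takes no locks, the flag does the coordination. */
    static final class Renewer {
        private final Timer timer = new Timer("renewal-sketch", true); // daemon thread

        /** Check during renewal: skip the renewal if cancellation was requested. */
        boolean renewIfNotCancelled(TokenToRenew t) {
            if (t.cancelled.get()) {
                return false;
            }
            t.renewals.incrementAndGet(); // placeholder for the real renew call
            return true;
        }

        /** Check before queuing: returns false if nothing was queued. */
        boolean scheduleRenewal(final TokenToRenew t, final long delayMs) {
            if (t.cancelled.get()) {
                return false;
            }
            TimerTask task = new TimerTask() {
                @Override
                public void run() {
                    if (renewIfNotCancelled(t)) {
                        scheduleRenewal(t, delayMs);
                    }
                }
            };
            // Queue first, then publish. If a cancel slips in between, the task
            // may fire once, but it sees the flag and does nothing.
            timer.schedule(task, delayMs);
            t.timerTask = task;
            return true;
        }

        /** Set the intent to cancel before touching the timer task. */
        void cancelRenewal(TokenToRenew t) {
            t.cancelled.set(true);
            TimerTask task = t.timerTask;
            if (task != null) {
                task.cancel();
            }
        }
    }

    public static void main(String[] args) {
        TokenToRenew token = new TokenToRenew();
        Renewer renewer = new Renewer();
        System.out.println("queued before cancel: " + renewer.scheduleRenewal(token, 10_000L));
        renewer.cancelRenewal(token);
        System.out.println("queued after cancel: " + renewer.scheduleRenewal(token, 10_000L));
        System.out.println("renewed after cancel: " + renewer.renewIfNotCancelled(token));
    }
}
```

Because the flag is set before the timer task is cancelled, a run() that is
already past the timer's own cancellation check still stops at the flag, either
before renewing or before queuing the next renewal.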
> Deadlock between RenewalTimerTask methods cancel() and run()
> ------------------------------------------------------------
>
> Key: MAPREDUCE-5364
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5364
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Fix For: 1.2.1
>
> Attachments: mr-5364-1.patch, mr-5364-addendum-1.patch
>
>
> MAPREDUCE-4860 introduced a local variable {{cancelled}} in
> {{RenewalTimerTask}} to fix the race where {{DelegationTokenRenewal}}
> attempts to renew a token even after the job is removed. However, the patch
> also makes {{run()}} and {{cancel()}} synchronized methods, leading to a
> potential deadlock against {{run()}}'s catch block (the error path).
> The deadlock stacks are below:
> {noformat}
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.cancel() @bci=0, line=240 (Interpreted frame)
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeDelegationTokenRenewalForJob(org.apache.hadoop.mapreduce.JobID) @bci=109, line=319 (Interpreted frame)
> {noformat}
> {noformat}
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeFailedDelegationToken(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=62, line=297 (Interpreted frame)
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.access$300(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=1, line=47 (Interpreted frame)
> - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.run() @bci=148, line=234 (Interpreted frame)
> {noformat}