[
https://issues.apache.org/jira/browse/HADOOP-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997554#comment-13997554
]
Daryn Sharp commented on HADOOP-10523:
--------------------------------------
I see. In your scenarios, I'd say the user shouldn't be canceling tokens that
have been submitted with a job unless they are trying to prematurely abort the
job. I know that Oozie tokens aren't cancelled, which is unfortunate. I think
last year I posted a patch that would cancel after all jobs using the tokens
completed, but it ran into roadblocks. I need to look up and revisit that JIRA.
As for the two suggested approaches, I'm not sure how they would be
implemented, if I understand them correctly. For #1, the RM can't really test
the validity/existence of a token without issuing a renew or cancel and
catching the exception. For #2, the RM still won't know that the token was
externally cancelled, and the issuing service, such as the NN, must cache
cancelled tokens and periodically clean the cache. Due to the complexity, I'd
be reluctant to endorse the approach. I'd also be reluctant to not return
errors to a client, i.e. to return a "token already cancelled" result instead
of a "token doesn't exist" exception.
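To make the cost of #2 concrete, here is a rough sketch of the extra state the
issuing service would have to carry. All names are hypothetical and nothing
here is an existing Hadoop API. It also assumes an entry can be dropped once
the token's maxDate has passed, since the token would have expired by then
anyway.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: not part of Hadoop.
class CancelledTokenCache {
    /** What a lookup can say about a token that is no longer in the live token store. */
    enum Status { ALREADY_CANCELLED, UNKNOWN }

    // Token sequence number -> the token's maxDate in epoch millis (assumed retention bound).
    private final Map<Integer, Long> cancelled = new ConcurrentHashMap<>();

    /** Remember that a token was cancelled so a later cancel can be told apart from "never existed". */
    void recordCancel(int sequenceNumber, long maxDateMillis) {
        cancelled.put(sequenceNumber, maxDateMillis);
    }

    Status lookup(int sequenceNumber) {
        return cancelled.containsKey(sequenceNumber) ? Status.ALREADY_CANCELLED : Status.UNKNOWN;
    }

    /** The periodic cleanup: drop entries whose maxDate has passed. Returns how many were removed. */
    int purgeExpired(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<Integer, Long>> it = cancelled.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() <= nowMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return cancelled.size();
    }

    public static void main(String[] args) {
        CancelledTokenCache cache = new CancelledTokenCache();
        cache.recordCancel(1, 1398130205921L);
        System.out.println(cache.lookup(1));
        System.out.println(cache.lookup(2));
    }
}
```

Even in this reduced form the service needs a second store, a retention rule
and a scheduled purge, and the RM still learns nothing until it issues the
cancel itself.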
I think the better solution is for users to not cancel tokens. Tokens are
supposed to be an "invisible" implementation detail of job submission and thus
should not require user manipulation. I'd suggest modifying the RM to either swallow
the cancel error on job completion, or to simply emit a single line in the log
instead of a backtrace.
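A minimal sketch of that suggestion, under stated assumptions: the
`TokenService` interface, the `InvalidTokenException` class (a local stand-in
for Hadoop's `SecretManager.InvalidToken`) and the in-memory `LOG` list are all
hypothetical and only illustrate the control flow. This is not the RM's actual
code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: not the RM's actual cancellation path.
class QuietTokenCancel {
    /** Local stand-in for Hadoop's SecretManager.InvalidToken. */
    static class InvalidTokenException extends IOException {
        InvalidTokenException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical abstraction over "ask the issuing service to cancel this token". */
    interface TokenService {
        void cancel(String tokenId) throws IOException;
    }

    /** Stand-in for a real logger so the example has no external dependencies. */
    static final List<String> LOG = new ArrayList<>();

    /**
     * Cancel a token when its job completes. A token that is already gone is
     * reported as a single log line with no stack trace; other I/O failures
     * are still logged as warnings. Returns true only if the cancel succeeded.
     */
    static boolean cancelOnJobCompletion(TokenService service, String tokenId) {
        try {
            service.cancel(tokenId);
            return true;
        } catch (InvalidTokenException e) {
            LOG.add("INFO token " + tokenId + " already cancelled or expired: " + e.getMessage());
            return false;
        } catch (IOException e) {
            LOG.add("WARN failed to cancel token " + tokenId + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulate a service that no longer knows the token.
        TokenService alreadyCancelled = id -> {
            throw new InvalidTokenException("Token not found");
        };
        cancelOnJobCompletion(alreadyCancelled, "sequenceNumber=1");
        LOG.forEach(System.out::println);
    }
}
```

The issuing service still throws as before, so clients that cancel explicitly
keep getting the error. Only the caller doing the automatic cancel decides to
treat it as benign.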
> Hadoop services (such as RM, NN and JHS) throw confusing exception during
> token auto-cancelation
> -------------------------------------------------------------------------------------------------
>
> Key: HADOOP-10523
> URL: https://issues.apache.org/jira/browse/HADOOP-10523
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Affects Versions: 2.3.0
> Reporter: Mohammad Kamrul Islam
> Assignee: Mohammad Kamrul Islam
> Fix For: 2.5.0
>
> Attachments: HADOOP-10523.1.patch
>
>
> When a user explicitly cancels a token, the system (such as RM, NN and JHS)
> also periodically tries to cancel the same token. During the second cancel
> (originated by RM/NN/JHS), Hadoop processes write the following
> error/exception to the log file. Although the exception is harmless, it
> creates a lot of confusion and causes developers to spend a lot of time
> investigating.
> This JIRA is to check that the token is available/not cancelled before
> attempting to cancel it, and to replace this exception with a
> proper warning message.
> {noformat}
> 2014-04-15 01:41:14,686 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Token cancelation requested for identifier:: owner=<FULL_PRINCIPAL>.linkedin.com@REALM, renewer=yarn, realUser=, issueDate=1397525405921, maxDate=1398130205921, sequenceNumber=1, masterKeyId=2
> 2014-04-15 01:41:14,688 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:yarn/HOST@<REALM> (auth:KERBEROS) cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Token not found
> 2014-04-15 01:41:14,689 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 10020, call org.apache.hadoop.mapreduce.v2.api.HSClientProtocolPB.cancelDelegationToken from 172.20.128.42:2783 Call#37759 Retry#0: error: org.apache.hadoop.security.token.SecretManager$InvalidToken: Token not found
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Token not found
> at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.cancelToken(AbstractDelegationTokenSecretManager.java:436)
> at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.cancelDelegationToken(HistoryClientService.java:400)
> at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.cancelDelegationToken(MRClientProtocolPBServiceImpl.java:286)
> at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:301)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> {noformat}