[ 
https://issues.apache.org/jira/browse/HADOOP-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997554#comment-13997554
 ] 

Daryn Sharp commented on HADOOP-10523:
--------------------------------------

I see.  In your scenarios, I'd say the user shouldn't be canceling tokens that 
have been submitted with a job unless they are trying to prematurely abort the 
job.  I know that Oozie tokens aren't cancelled, which is unfortunate.  I think 
last year I posted a patch that would cancel tokens after all jobs using them 
completed, but it ran into roadblocks.  I need to look up and revisit that JIRA.

For the two suggested approaches, I'm not sure how they would be implemented, 
if I understand them correctly.  For #1, the RM can't really test the 
validity/existence of a token w/o issuing a renew or cancel and catching the 
exception.  For #2, the RM still won't know that the token was externally 
cancelled, and the issuing service like the NN must cache cancelled tokens and 
periodically clean the cache.  Due to the complexity, I'd be reluctant to 
endorse the approach.  I'd also be reluctant to not return errors to a client, 
i.e. to return a "token already cancelled" result instead of a "token doesn't 
exist" exception.

I think the better solution is for users to not cancel tokens.  Tokens are 
supposed to be an "invisible" implementation detail of job submission and thus 
not require user manipulation.  I'd suggest modifying the RM to either swallow 
the cancel error on job completion, or to simply emit a single line in the log 
instead of a backtrace.
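
As a rough illustration of the second option (one log line instead of a 
backtrace), here is a minimal sketch.  It does not use the real Hadoop 
classes: InvalidTokenException, TokenService and QuietTokenCanceller are all 
hypothetical stand-ins invented for this example, and the actual RM code path 
will look different.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for SecretManager.InvalidToken; not the Hadoop class. */
class InvalidTokenException extends Exception {
    InvalidTokenException(String message) {
        super(message);
    }
}

/** Hypothetical stand-in for a token-issuing service such as the NN or JHS. */
class TokenService {
    private final Set<String> liveTokens = new HashSet<>();

    void issue(String tokenId) {
        liveTokens.add(tokenId);
    }

    /** Cancels a token, or throws if it is unknown (e.g. already cancelled). */
    void cancel(String tokenId) throws InvalidTokenException {
        if (!liveTokens.remove(tokenId)) {
            throw new InvalidTokenException("Token not found");
        }
    }
}

/** Sketch of a cancel-on-job-completion path that avoids the backtrace. */
class QuietTokenCanceller {
    /** Collected log lines; a real service would use its logger instead. */
    final List<String> logLines = new ArrayList<>();

    /**
     * Returns true if this call cancelled the token, false if the token was
     * already gone (for example because the user cancelled it first).
     */
    boolean cancelOnJobCompletion(TokenService service, String tokenId) {
        try {
            service.cancel(tokenId);
            return true;
        } catch (InvalidTokenException e) {
            // Emit a single line with the message only, no stack trace.
            logLines.add("Token " + tokenId
                + " could not be cancelled: " + e.getMessage());
            return false;
        }
    }
}
```

With this shape, a second cancel of the same token returns false and adds 
exactly one log line.  Note that it only quiets the caller's side; the 
issuing service would still log its own error for the failed call unless it 
is changed as well.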

> Hadoop services (such as RM, NN and JHS) throw confusing exception during 
> token auto-cancelation 
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10523
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>             Fix For: 2.5.0
>
>         Attachments: HADOOP-10523.1.patch
>
>
> When a user explicitly cancels the token, the system (such as RM, NN and JHS) 
> also periodically tries to cancel the same token. During the second cancel 
> (originated by RM/NN/JHS), Hadoop processes throw the following 
> error/exception in the log file. Although the exception is harmless, it 
> creates a lot of confusion and causes developers to spend a lot of time 
> investigating.
> This JIRA is to check whether the token is still available (not cancelled) 
> before attempting to cancel it, and to replace this exception with a 
> proper warning message.
> {noformat}
> 2014-04-15 01:41:14,686 INFO 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>  Token cancelation requested for identifier:: 
> owner=<FULL_PRINCIPAL>.linkedin.com@REALM, renewer=yarn, realUser=, 
> issueDate=1397525405921, maxDate=1398130205921, sequenceNumber=1, 
> masterKeyId=2
> 2014-04-15 01:41:14,688 WARN org.apache.hadoop.security.UserGroupInformation: 
> PriviledgedActionException as:yarn/HOST@<REALM> (auth:KERBEROS) 
> cause:org.apache.hadoop.security.token.SecretManager$InvalidToken: Token not 
> found
> 2014-04-15 01:41:14,689 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 7 on 10020, call 
> org.apache.hadoop.mapreduce.v2.api.HSClientProtocolPB.cancelDelegationToken 
> from 172.20.128.42:2783 Call#37759 Retry#0: error: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Token not found
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Token not found
>         at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.cancelToken(AbstractDelegationTokenSecretManager.java:436)
>         at 
> org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.cancelDelegationToken(HistoryClientService.java:400)
>         at 
> org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.cancelDelegationToken(MRClientProtocolPBServiceImpl.java:286)
>         at 
> org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:301)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
> {noformat}



