[ https://issues.apache.org/jira/browse/HADOOP-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834590#comment-16834590 ]

Steve Loughran commented on HADOOP-16298:
-----------------------------------------

thanks for this. 

* removed all "fix" labels; those track when fixes actually go in.
* set target as 3.3, with the option of backporting to 3.2. Changes are 
unlikely to go back much earlier.

It'd be good if you could publish this doc as a GitHub PR where we could 
comment on the text; PDFs on JIRAs aren't ideal for review.

My own experience of tokens and long-lived services is [documented 
elsewhere|https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/yarn.html].
 You should also look at applications like Spark to see how they renew their 
tokens. If you have already done so, please document it, both to show you've 
done that bit of homework and to compare their approach with your proposal.

Looking at the text

* {{AbstractDelegationTokenIdentifier}} has a maxDate field; this sets the end 
of a token's lifespan. This is precisely how Spark's DT renewal mechanism knows 
when to renew tokens (see the sketch after this list).
* There's another renewal mechanism to consider: client-side upload of new DTs 
via some RPC mechanism. Even if you think it is flawed, I'd like to see it 
covered, with a description of why you think it isn't suitable.
* There are more places than just IPC where we need those tokens: HBase, the 
Hive metastore, and the KMS key management service are three examples, while 
the S3A and ABFS token integrations both use delegation tokens for REST auth. 
Any design should also be able to work with SPNEGO auth.
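
On that first point, here's a minimal sketch of how a renewer can read maxDate 
off a token's identifier to work out when replacements are due; the class name 
and the 75% safety margin are mine, purely for illustration:

{code:java}
import java.io.IOException;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

public class TokenExpiryCalculator {
  /**
   * Scan a job's tokens and work out when the next refresh is due,
   * at 75% of each token's lifetime. The ratio is an arbitrary safety
   * margin, not anything Hadoop or Spark mandates.
   */
  public static long nextRefreshTime(Iterable<Token<?>> tokens)
      throws IOException {
    long next = Long.MAX_VALUE;
    for (Token<?> t : tokens) {
      TokenIdentifier id = t.decodeIdentifier();
      if (id instanceof AbstractDelegationTokenIdentifier) {
        AbstractDelegationTokenIdentifier dt =
            (AbstractDelegationTokenIdentifier) id;
        long issued = dt.getIssueDate();
        long maxDate = dt.getMaxDate();
        // refresh at 75% of the token's remaining lifetime
        next = Math.min(next, issued + (long) ((maxDate - issued) * 0.75));
      }
    }
    return next;
  }
}
{code}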

UGI scares us. Really scares us. It's a critical piece of the security 
infrastructure and we are always reluctant to make changes to it due to the 
risk of unintentionally weakening the security mechanism. Nobody is going to 
rush to add features and the number of people who will be willing to review the 
changes will be very low. That doesn't mean it can't be improved, just that we 
are always nervous.


Given that all subclasses of {{AbstractDelegationTokenIdentifier}} do have an 
expiry date, I don't think we need the on-demand mechanism; something which 
calculates the expiry time of tokens and then reloads them as needed would 
suffice. This is roughly what the Spark DT renewer does, though it actually 
uses its keytab to re-request tokens from the services and pass them to 
workers.
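
A minimal sketch of that reload half, assuming an external agent (e.g. a 
Kubernetes sidecar) drops a fresh token file somewhere; the path, class name, 
and fixed polling interval here are all made up for illustration, not a 
proposed API:

{code:java}
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class TokenReloader {
  public static void main(String[] args) {
    // Poll a token file written by an external agent (the path is
    // invented for this sketch) and merge any tokens found into the
    // current UGI. A real version would schedule off the computed
    // expiry time rather than a fixed interval.
    ScheduledExecutorService pool =
        Executors.newSingleThreadScheduledExecutor();
    pool.scheduleWithFixedDelay(() -> {
      try {
        Credentials fresh = Credentials.readTokenStorageFile(
            new File("/var/run/secrets/hadoop/tokens"), new Configuration());
        UserGroupInformation.getCurrentUser().addCredentials(fresh);
      } catch (Exception e) {
        // keep going; the next tick will retry (a real renewer would log)
      }
    }, 0, 30, TimeUnit.MINUTES);
  }
}
{code}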

That said, I see the potential. 





> Manage/Renew delegation tokens for externally scheduled jobs
> ------------------------------------------------------------
>
>                 Key: HADOOP-16298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16298
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 2.7.3, 2.9.0, 3.2.0, 3.3.0
>            Reporter: Pankaj Deshpande
>            Priority: Major
>         Attachments: Proposal for changes to UGI for managing_renewing 
> externally managed delegation tokens.pdf
>
>
> * Presently when jobs are run in the Hadoop ecosystem, the implicit 
> assumption is that YARN will be used as a scheduling agent with access to 
> appropriate keytabs for renewal of Kerberos tickets and delegation tokens. 
>  * Jobs that interact with Kerberized Hadoop services such as 
> HBase/Hive/HDFS and use an external scheduler such as Kubernetes typically 
> do not have access to keytabs. In such cases, delegation tokens are a 
> logical choice for interacting with a Kerberized cluster. These tokens are 
> issued based on some external auth mechanism (such as Kube LDAP 
> authentication).


