[ https://issues.apache.org/jira/browse/HADOOP-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834590#comment-16834590 ]
Steve Loughran commented on HADOOP-16298:
-----------------------------------------
Thanks for this.
* Removed all "fix" labels - those are for tracking when fixes actually go in.
* Set the target version to 3.3, with the option of backporting to 3.2. The
changes are not likely to go back much earlier.
It'd be good if you could publish this doc as a GitHub PR where we could
comment on the text; PDFs on JIRAs aren't ideal for review.
My own experience of tokens and long-lived services is [documented
elsewhere|https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/yarn.html].
You should also look at applications like Spark to see how they renew their
tokens. If you have already done so, please document it, both to show you've
done that bit of homework and to compare their approach with your proposal.
Looking at the text:
* {{AbstractDelegationTokenIdentifier}} has a {{maxDate}} field; this sets the
end of the token's lifespan. It is precisely how Spark's DT renewal mechanism
knows when to renew tokens (see the sketch after this list).
* There's another renewal mechanism to consider: client-side upload of new DTs
via some RPC mechanism. Even if you think it is flawed, I'd like to see it
covered, along with a description of why you think it isn't suitable; the
reload sketch further down shows the client-side half of this.
* There are more places than just IPC where we need those tokens: HBase, the
Hive metastore, and the KMS are three examples, while the S3A and ABFS token
support both use tokens for REST auth. Any design should also be able to work
with SPNEGO auth.
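On the first point, here's a minimal sketch of reading {{maxDate}} back off the
tokens in a {{Credentials}} set. The class and method names are invented for
illustration, but {{Credentials.getAllTokens()}}, {{Token.decodeIdentifier()}}
and {{AbstractDelegationTokenIdentifier.getMaxDate()}} are the real Hadoop APIs:
{code:java}
import java.io.IOException;

import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

/**
 * Sketch: find the earliest maxDate across all delegation tokens in a
 * credential set. A renewal/reload scheduler would wake up somewhat
 * before this time.
 */
public final class TokenExpiryCheck {

  private TokenExpiryCheck() {
  }

  /** @return the earliest maxDate in milliseconds, or Long.MAX_VALUE if none. */
  public static long earliestExpiry(Credentials credentials) throws IOException {
    long earliest = Long.MAX_VALUE;
    for (Token<? extends TokenIdentifier> token : credentials.getAllTokens()) {
      TokenIdentifier id = token.decodeIdentifier();
      if (id instanceof AbstractDelegationTokenIdentifier) {
        // maxDate is the absolute end of the token's lifespan; renewals can
        // extend the current expiry up to, but never past, this date.
        earliest = Math.min(earliest,
            ((AbstractDelegationTokenIdentifier) id).getMaxDate());
      }
    }
    return earliest;
  }
}
{code}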
UGI scares us. Really scares us. It's a critical piece of the security
infrastructure, and we are always reluctant to make changes to it because of
the risk of unintentionally weakening the security mechanism. Nobody is going
to rush to add features, and the number of people willing to review the changes
will be very low. That doesn't mean it can't be improved, just that we are
always nervous.
Given that all subclasses of {{AbstractDelegationTokenIdentifier}} do have an
expiry date, I don't think we need the on-demand mechanism, just something
which calculates the expiry time of the tokens and then reloads them as
needed. This is roughly what the Spark DT renewer does, though that actually
uses its keytab to re-request tokens from the services for passing to workers.
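As a hedged sketch of that reload half, assuming an external agent periodically
writes refreshed tokens to a file in the standard Hadoop token-storage format
(the class name is invented; {{Credentials.readTokenStorageFile()}} and
{{UserGroupInformation.addCredentials()}} are real APIs):
{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

/**
 * Sketch: reload delegation tokens which an external agent has written to a
 * file in Hadoop token-storage format, and attach them to the current UGI.
 */
public final class TokenReloader {

  private TokenReloader() {
  }

  public static void reload(File tokenFile, Configuration conf)
      throws IOException {
    // Parse the token file; this is the same format as the file named by
    // HADOOP_TOKEN_FILE_LOCATION.
    Credentials fresh = Credentials.readTokenStorageFile(tokenFile, conf);
    // Merge into the current user's credentials; tokens with the same alias
    // overwrite the expiring ones.
    UserGroupInformation.getCurrentUser().addCredentials(fresh);
  }
}
{code}
A long-lived process would wrap this in a scheduler thread keyed off the
earliest {{maxDate}} computed in the sketch above.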
That said, I see the potential.
> Manage/Renew delegation tokens for externally scheduled jobs
> ------------------------------------------------------------
>
> Key: HADOOP-16298
> URL: https://issues.apache.org/jira/browse/HADOOP-16298
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 2.7.3, 2.9.0, 3.2.0, 3.3.0
> Reporter: Pankaj Deshpande
> Priority: Major
> Attachments: Proposal for changes to UGI for managing_renewing
> externally managed delegation tokens.pdf
>
>
> * Presently, when jobs are run in the Hadoop ecosystem, the implicit
> assumption is that YARN will be used as the scheduling agent, with access to
> the appropriate keytabs for renewal of Kerberos tickets and delegation tokens.
> * Jobs that interact with kerberized Hadoop services such as HBase/Hive/HDFS
> and use an external scheduler such as Kubernetes typically do not have
> access to keytabs. In such cases, delegation tokens are a logical choice for
> interacting with a kerberized cluster. These tokens are issued based on some
> external auth mechanism (such as Kube LDAP authentication).