[
https://issues.apache.org/jira/browse/HADOOP-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611307#comment-17611307
]
Clay B. commented on HADOOP-16298:
----------------------------------
*History:*
* This work began in 2019
* This work, which as-is requires manually renewing delegation tokens, has been
ported to Hadoop 3.x, with some unit tests written, as in the PR from March 9th.
* I have had teams run this code a fair amount in a Kubernetes environment
running Flink jobs as a long-running HBase 1.x client without issue
* The current code requires invasive changes to all Hadoop client frameworks –
to hook into authentication failures and reload credentials on-demand. *This
has proved a terrible support experience.*
* Further, the actual design of having client frameworks hook into
authentication failures does not meaningfully support threading models like
HBase 2.x's. *Proactive refresh of tokens is needed.*
* To Steve's awesome feedback that was never addressed:
** Steve, thank you so much for +Hadoop and Kerberos: The Madness beyond the
Gate+; I've loved learning from it! I see your proposal of "Client-side push of
renewed Delegation Tokens" there. While I've seen some well-orchestrated
operations and Hadoop application development teams, for our use-case I would
expect we need to stick close to "standard"/vendor Hadoop application design
patterns.
To me, running a small server in a client would be a challenge for services
deployed on Kubernetes clusters, as we would need the injector to run in the
namespace or set up ingress rules for this. Further, since the use-cases I see
are written by hundreds of application teams who are largely unaware of the
infrastructure, I would appreciate being able to abstract them away from
knowing about authentication and rely on relatively standard Hadoop
documentation and training rather than a more custom injection process.
(Currently, in the deployment model we have been using, keytabs and any other
long-term authentication credentials are kept out of application team control
and outside the K8s namespace.)
Lastly, the client injecting credentials seems a lot like it would require the
same UGI changes in the end as the client re-reading credentials off disk?
** Pankaj, when he started this work, looked at {{maxDate}}, but see my comments
about HBase (which is our main use-case) below. I think this can be reconciled,
but it seems like that should be a later addition (e.g. try to keep the changes
as small and deterministic as possible first).
** While HBase and Hive (I think it only has WebHCat tokens) are something I can
test with, I'm still a bit out of my depth with KMS, S3A and ABFS tokens. Thank
you for those pointers; maybe you have some ideas on unit tests I could
extend to verify them?
I do not understand how SPNEGO auth would be involved here. The intention
is that the client is started and run solely with Delegation Tokens and never
does any Kerberos exchanges. Are you thinking of a case where one is doing
RESTful operations from the application to a SPNEGO-authenticated endpoint?
(If so, could you offer an example you would envision?)
*Next Steps:*
_To no longer require changes in client frameworks, I plan to implement a
proactive refresh, as is done today for Kerberos tickets._
* I am now looking to craft a renewal thread process in UGI under
{{spawnAutoRenewalThreadForUserCreds}} following the pattern there for Kerberos
to proactively refresh tokens. This proactive refresh would remove the need
for Hadoop client frameworks to be changed at all.
* The renewal thread approach will require only one API call in application
client code (versus client frameworks) to use the feature. Nicely, it would be
done through the same UGI entry points required for long-running Kerberos
clients.
** At first, I will implement the thread running on a fixed interval rather
than querying token expiration.
** A challenge for my use-case in automatically determining refresh timing is
that HBase delegation tokens extend {{TokenIdentifier}} and have the field
{{expirationDate}}. Meanwhile, Hadoop delegation tokens extend
{{AbstractDelegationTokenIdentifier}}, which offers a {{maxDate}} field.
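
To illustrate the fixed-interval idea above, here is a minimal sketch using only the JDK. It is not the UGI change itself: the class name {{FixedIntervalTokenRefresher}}, its methods, and the notion of a single token file on disk are all hypothetical, chosen only to show a background thread re-reading credentials on a timer and keeping the previous ones if a read fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch (not Hadoop code): re-reads a credentials file on a
 * fixed interval and publishes the latest bytes to readers.
 */
final class FixedIntervalTokenRefresher implements AutoCloseable {
    private final Path tokenFile;
    private final AtomicReference<byte[]> current = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "token-refresher");
            t.setDaemon(true); // do not keep the JVM alive just for refreshes
            return t;
        });

    FixedIntervalTokenRefresher(Path tokenFile) throws IOException {
        this.tokenFile = tokenFile;
        // Fail fast if there are no credentials at startup.
        current.set(Files.readAllBytes(tokenFile));
    }

    /** The single call an application would make to opt in. */
    void start(long interval, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::refreshOnce, interval, interval, unit);
    }

    /** One refresh pass; returns true if credentials were re-read. */
    boolean refreshOnce() {
        try {
            current.set(Files.readAllBytes(tokenFile));
            return true;
        } catch (IOException | RuntimeException e) {
            // Keep the previous credentials; the next tick retries.
            // Swallowing here also keeps the scheduled task alive.
            return false;
        }
    }

    /** Returns a defensive copy of the most recently loaded credentials. */
    byte[] currentToken() {
        return current.get().clone();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

An application would construct the refresher once and call {{start(...)}} with an interval comfortably shorter than the token lifetime. Moving from a fixed interval to expiry-driven timing would need a common way to read the expiry from both kinds of token identifier, which is the challenge noted in the last bullet.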
> Manage/Renew delegation tokens for externally scheduled jobs
> ------------------------------------------------------------
>
> Key: HADOOP-16298
> URL: https://issues.apache.org/jira/browse/HADOOP-16298
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Affects Versions: 3.3.0
> Reporter: Pankaj Deshpande
> Assignee: Clay B.
> Priority: Major
> Labels: pull-request-available
> Attachments: Proposal for changes to UGI for managing_renewing
> externally managed delegation tokens.pdf
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> * Presently when jobs are run in the Hadoop ecosystem, the implicit
> assumption is that YARN will be used as a scheduling agent with access to
> appropriate keytabs for renewal of kerberos tickets and delegation tokens.
> * Jobs that interact with kerberized hadoop services such as hbase/hive/hdfs
> and use an external scheduler such as Kubernetes, typically do not have
> access to keytabs. In such cases, delegation tokens are a logical choice for
> interacting with a kerberized cluster. These tokens are issued based on some
> external auth mechanism (such as Kube LDAP authentication).