[ 
https://issues.apache.org/jira/browse/HADOOP-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611307#comment-17611307
 ] 

Clay B. commented on HADOOP-16298:
----------------------------------

*History:*
 * This work began in 2019
 * This work, which as is requires manually renewing delegation tokens, has been 
ported to Hadoop 3.x, with some unit tests written, as in the PR from March 9th.
 * I have had teams run this code a fair amount in a Kubernetes environment 
running Flink jobs as a long-running HBase 1.x client without issue
 * The current code requires invasive changes to all Hadoop client frameworks – 
to hook into authentication failures and reload credentials on-demand. *This 
has proved a terrible support experience.*
 * Further, the actual design of having client frameworks hook into 
authentication failures does not meaningfully support threading models like 
HBase 2.x's. *Proactive refresh of tokens is needed.*
 * To Steve's awesome feedback that was never addressed:

 ** Steve, thank you so much for +Hadoop and Kerberos: The Madness beyond the 
Gate+; I've loved learning from it! I see your proposal of "Client-side push of 
renewed Delegation Tokens" there. While I've seen some well-orchestrated 
operations and Hadoop application development teams, for our use-case I would 
expect we need to stick close to "standard"/vendor Hadoop application design 
patterns.
To me, running a small server in a client would be a challenge for services 
deployed on Kubernetes clusters, as we would need the injector to run in the 
namespace or set up ingress rules for it. Further, the use-cases I see are 
written by hundreds of application teams, most of whom know little about the 
infrastructure, so I would appreciate being able to abstract them away from 
authentication and rely on relatively standard Hadoop documentation and 
training rather than a more custom injection process. (Currently, in the 
deployment model we have been using, keytabs and any other long-term 
authentication credentials are kept out of application team control and 
outside the K8s namespace.)
Lastly, the client injecting credentials seems a lot like it would require the 
same UGI changes in the end as the client re-reading credentials off disk?
 ** When Pankaj started this work he looked at {{maxDate}}, but see my comments 
about HBase (which is our main use-case) below. I think this can be reconciled, 
but that seems like it should be a later addition (i.e. try to keep the changes 
as small and deterministic as possible first).
 ** While HBase and Hive (I think it only has WebHCat tokens) are something I 
can test with, I'm still a bit out of my depth with KMS, S3A and ABFS tokens. 
Thank you for those pointers; maybe you have some ideas on unit tests I could 
extend to verify them?
I do not understand how SPNEGO auth would be involved here. The intention is 
that the client is started and run solely with Delegation Tokens and never 
does any Kerberos exchanges. Are you thinking of an application doing RESTful 
operations against a SPNEGO-authenticated endpoint? (If so, could you offer an 
example of what you would envision?)

*Next Steps:*
_To no longer require changes in client frameworks, I plan to implement a 
proactive refresh, as is done today for Kerberos tickets._
 * I am now looking to craft a renewal thread in UGI alongside 
{{spawnAutoRenewalThreadForUserCreds}}, following the pattern there for Kerberos, 
to proactively refresh tokens. This proactive refresh would remove the need 
for Hadoop client frameworks to be changed at all.
 * The renewal thread approach will require only one API call in application 
client code (versus client frameworks) to use the feature. Nicely, it would be 
done through the same UGI entry points required for long-running Kerberos 
clients.
 ** At first, I will implement the thread to run on a fixed interval rather 
than querying token expiration.
 ** A challenge for my use-case in automatically determining refresh timing is 
that HBase delegation tokens extend {{TokenIdentifier}} and have the field 
{{expirationDate}}. Meanwhile, Hadoop delegation tokens extend 
{{AbstractDelegationTokenIdentifier}}, which offers a {{maxDate}} field.
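
A minimal sketch of the fixed-interval refresh idea above, using only the JDK so it can be run standalone. The class {{FixedIntervalTokenRefresher}}, its methods and the {{installer}} callback are hypothetical names for illustration and are not part of Hadoop or the PR. In a real client the callback would presumably load the file with {{Credentials.readTokenStorageFile}} and hand the result to {{UserGroupInformation#addCredentials}}; that wiring is an assumption and is only indicated in comments.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: re-reads a token file on a fixed interval. Not Hadoop code. */
final class FixedIntervalTokenRefresher implements AutoCloseable {
    private final Path tokenFile;
    // Assumption: in Hadoop this callback would add the freshly read tokens to the
    // current UGI (e.g. via UserGroupInformation#addCredentials).
    private final Consumer<byte[]> installer;
    private final ScheduledExecutorService scheduler;

    FixedIntervalTokenRefresher(Path tokenFile, Consumer<byte[]> installer) {
        this.tokenFile = tokenFile;
        this.installer = installer;
        // Daemon thread, so the refresher never keeps the JVM alive on its own.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "token-refresher");
            t.setDaemon(true);
            return t;
        });
    }

    /** Performs one refresh. On failure the previously installed tokens stay in place. */
    boolean refreshOnce() {
        try {
            byte[] tokens = Files.readAllBytes(tokenFile);
            installer.accept(tokens);
            return true;
        } catch (IOException | RuntimeException e) {
            // Swallowed on purpose: an exception escaping a scheduled task would
            // cancel all later runs, and the next tick may well succeed.
            return false;
        }
    }

    /** Starts refreshing on a fixed delay; token expiration is not consulted. */
    void start(long interval, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::refreshOnce, interval, interval, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("tokens", ".bin");
        Files.write(file, new byte[] {1, 2, 3});
        try (FixedIntervalTokenRefresher refresher = new FixedIntervalTokenRefresher(
                file, t -> System.out.println("installed " + t.length + " bytes"))) {
            refresher.start(100, TimeUnit.MILLISECONDS);
            Thread.sleep(350);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running on a fixed delay sidesteps the {{expirationDate}} versus {{maxDate}} mismatch for now: the interval only has to be comfortably shorter than the shortest token lifetime in use, at the cost of some redundant reads.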

> Manage/Renew delegation tokens for externally scheduled jobs
> ------------------------------------------------------------
>
>                 Key: HADOOP-16298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16298
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 3.3.0
>            Reporter: Pankaj Deshpande
>            Assignee: Clay B.
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Proposal for changes to UGI for managing_renewing 
> externally managed delegation tokens.pdf
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Presently, when jobs are run in the Hadoop ecosystem, the implicit 
> assumption is that YARN will be used as a scheduling agent with access to 
> appropriate keytabs for renewal of Kerberos tickets and delegation tokens. 
>  * Jobs that interact with Kerberized Hadoop services such as HBase/Hive/HDFS 
> and use an external scheduler such as Kubernetes typically do not have 
> access to keytabs. In such cases, delegation tokens are a logical choice for 
> interacting with a Kerberized cluster. These tokens are issued based on some 
> external auth mechanism (such as Kube LDAP authentication).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
