[
https://issues.apache.org/jira/browse/FLINK-28291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jiulong.zhu updated FLINK-28291:
--------------------------------
Attachment: FLINK-28291.0001.patch
> Add kerberos delegation token renewer feature instead of logged from keytab
> individually
> ----------------------------------------------------------------------------------------
>
> Key: FLINK-28291
> URL: https://issues.apache.org/jira/browse/FLINK-28291
> Project: Flink
> Issue Type: New Feature
> Components: Deployment / YARN
> Affects Versions: 1.13.5
> Reporter: jiulong.zhu
> Priority: Minor
> Fix For: 1.13.5
>
> Attachments: FLINK-28291.0001.patch
>
>
> h2. 1. Design
> Lifecycle of the delegation token in the RM:
> # The container starts with the DT provided by the client.
> # Enable the delegation token renewer by:
> ## setting {{security.kerberos.token.renew.enabled}} to true (default: false), and
> ## specifying {{security.kerberos.login.keytab}} and
> {{security.kerberos.login.principal}}.
> # When the delegation token renewer is enabled, the renewer thread re-obtains
> tokens from the DelegationTokenProvider (currently only
> HadoopFSDelegationTokenProvider). The renewer thread then broadcasts the new
> tokens to the RM locally and to all JMs and TMs via RPCGateway.
> # The RM process adds the new tokens to its context via UserGroupInformation.
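A minimal sketch of the enablement rule described in step 2, assuming the configuration is available as a plain string map. The class and method names are hypothetical and are not taken from the attached patch; only the three configuration keys come from the issue text.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of FLINK-28291.0001.patch. */
final class RenewerSwitch {
    // Configuration keys as named in the issue description.
    static final String ENABLED = "security.kerberos.token.renew.enabled";
    static final String KEYTAB = "security.kerberos.login.keytab";
    static final String PRINCIPAL = "security.kerberos.login.principal";

    /** The renewer runs only if the flag is true AND keytab and principal are both set. */
    static boolean isRenewerEnabled(Map<String, String> conf) {
        // The flag defaults to false when it is absent.
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED, "false"));
        return enabled && isSet(conf.get(KEYTAB)) && isSet(conf.get(PRINCIPAL));
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```

With only the flag set, `isRenewerEnabled` returns false in this sketch, because re-obtaining tokens needs a keytab login.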
> Lifecycle of the delegation token in the JM / TM:
> # The TaskManager starts with the keytab stored in remote HDFS.
> # On successful registration, the JM / TM receives the RM's current tokens,
> carried by {{JobMasterRegistrationSuccess}} / {{TaskExecutorRegistrationSuccess}}.
> # The JM / TM process adds the new tokens to its context via UserGroupInformation.
> Retrieving the ResourceManager leader through HAService is too heavyweight
> and unnecessary, so the DelegationTokenManager is instantiated by the
> ResourceManager. DelegationToken can therefore hold a direct reference to the
> ResourceManager instead of the RM RPCGateway or a self gateway.
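The token flow in the design above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `Manager` and `Worker` are hypothetical stand-ins for the RM-side token manager and for a JM / TM, tokens are plain strings, and direct method calls replace the RPC and UserGroupInformation steps described in the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of the token flow; these are not the patch's classes. */
final class TokenFlowSketch {

    /** Stands in for a JM or TM. */
    static final class Worker {
        final String name;
        String currentToken; // the real flow would add tokens via UserGroupInformation

        Worker(String name) {
            this.name = name;
        }

        void receiveToken(String token) {
            this.currentToken = token;
        }
    }

    /** Stands in for the RM-side manager that owns the current token. */
    static final class Manager {
        private final List<Worker> workers = new ArrayList<>();
        private String currentToken;

        Manager(String initialToken) {
            this.currentToken = initialToken;
        }

        /** Registration hands the current token to the newly registered worker. */
        synchronized String register(Worker worker) {
            workers.add(worker);
            worker.receiveToken(currentToken);
            return currentToken;
        }

        /** Called by the renewer thread after it re-obtains a token. */
        synchronized void onTokenRenewed(String newToken) {
            currentToken = newToken;
            for (Worker worker : workers) {
                worker.receiveToken(newToken); // broadcast to every registered worker
            }
        }
    }
}
```

A worker that registers after a renewal gets the newest token in its registration response, so it does not need to wait for the next broadcast.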
> h2. 2. Test
> # No local JUnit test. Building a JUnit environment that includes a KDC and
> local Hadoop is too heavy.
> # Cluster test
> step 1: Specify a krb5.conf with short token lifetimes (ticket_lifetime,
> renew_lifetime) when submitting the Flink application.
> ```
> flink run .... -yD security.kerberos.token.renew.enabled=true \
>   -yD security.kerberos.krb5-conf.path=/home/work/krb5.conf \
>   -yD security.kerberos.login.use-ticket-cache=false ...
> ```
> step 2: Watch the token identifier change in the logs and its synchronization
> between the RM and the workers.
> >>
> In RM / JM log,
> 2022-06-28 15:13:03,509 INFO org.apache.flink.runtime.util.HadoopUtils [] - New token (HDFS_DELEGATION_TOKEN token 52101 for work on ha-hdfs:newfyyy) created in KerberosDelegationToken, and next schedule delay is 64799880 ms.
> 2022-06-28 15:13:03,529 INFO org.apache.flink.runtime.util.HadoopUtils [] - Updating delegation tokens for current user.
> 2022-06-28 15:13:04,729 INFO org.apache.flink.runtime.util.HadoopUtils [] - JobMaster receives new token (HDFS_DELEGATION_TOKEN token 52101 for work on ha-hdfs:newfyyy) from RM.
> …
> 2022-06-29 09:13:03,732 INFO org.apache.flink.runtime.util.HadoopUtils [] - New token (HDFS_DELEGATION_TOKEN token 52310 for work on ha-hdfs:newfyyy) created in KerberosDelegationToken, and next schedule delay is 64800045 ms.
> 2022-06-29 09:13:03,805 INFO org.apache.flink.runtime.util.HadoopUtils [] - Updating delegation tokens for current user.
> 2022-06-29 09:13:03,806 INFO org.apache.flink.runtime.util.HadoopUtils [] - JobMaster receives new token (HDFS_DELEGATION_TOKEN token 52310 for work on ha-hdfs:newfyyy) from RM.
> >>
> In TM log,
> 2022-06-28 15:13:17,983 INFO org.apache.flink.runtime.util.HadoopUtils [] - TaskManager receives new token (HDFS_DELEGATION_TOKEN token 52101 for work on ha-hdfs:newfyyy) from RM.
> 2022-06-28 15:13:18,016 INFO org.apache.flink.runtime.util.HadoopUtils [] - Updating delegation tokens for current user.
> …
> 2022-06-29 09:13:03,809 INFO org.apache.flink.runtime.util.HadoopUtils [] - TaskManager receives new token (HDFS_DELEGATION_TOKEN token 52310 for work on ha-hdfs:newfyyy) from RM.
> 2022-06-29 09:13:03,836 INFO org.apache.flink.runtime.util.HadoopUtils [] - Updating delegation tokens for current user.
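The logged schedule delays (64799880 ms and 64800045 ms) are both close to 64,800,000 ms, which is 18 hours, and the two "New token" entries are about 18 hours apart. That would be consistent with scheduling the next renewal at 0.75 of a 24-hour token lifetime, but the issue states neither the ratio nor the lifetime, so both are assumptions in the sketch below. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical arithmetic sketch; the 0.75 ratio and 24 h lifetime are assumptions. */
final class RenewalDelaySketch {

    /** Delay before the next renewal: a fixed fraction of the token's remaining lifetime. */
    static long nextDelayMillis(long nowMillis, long expiryMillis, double ratio) {
        long remaining = Math.max(0L, expiryMillis - nowMillis); // never negative
        return (long) (remaining * ratio);
    }

    /** Gap between two log timestamps, in milliseconds. */
    static long gapMillis(LocalDateTime first, LocalDateTime second) {
        return Duration.between(first, second).toMillis();
    }
}
```

Calling `nextDelayMillis(0L, Duration.ofHours(24).toMillis(), 0.75)` gives 64,800,000 ms. The gap between the two "New token" timestamps in the RM / JM log, computed with `gapMillis`, is 64,800,223 ms.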
--
This message was sent by Atlassian Jira
(v8.20.10#820010)