[ https://issues.apache.org/jira/browse/FLINK-39274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ruiliang updated FLINK-39274:
-----------------------------
Description:
According to the documentation, the Kerberos configuration does not distinguish
between the JM and the TM.
flink-conf.yaml
{code:java}
security.kerberos.login.keytab=xx.keytab
security.kerberos.login.principal=xx_principal{code}
launch_container.sh
{code:java}
# The AM has clearly issued the delegation token here.
export HADOOP_TOKEN_FILE_LOCATION="/data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/container_tokens"
..
# But the keytab file is still downloaded here.
export _REMOTE_KEYTAB_PATH="hdfs://xx/user/hiidoagent/.flink/application_1773803886076_15646/hiidoagent.keytab"
export HADOOP_USER_NAME="[email protected]"
export _LOCAL_KEYTAB_PATH="krb5.keytab"
export _KEYTAB_PRINCIPAL="hiidoagent"{code}
TM log
{code:java}
2026-03-18 17:49:23,394 INFO org.apache.flink.runtime.state.changelog.StateChangelogStorageLoader [] - StateChangelogStorageLoader initialized with shortcut names {memory,filesystem}.
2026-03-18 17:49:23,441 INFO org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider [] - Attempting to login to KDC using principal: hiidoagent keytab: /data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/krb5.keytab
2026-03-18 17:49:23,717 INFO org.apache.hadoop.security.UserGroupInformation [] - Login successful for user hiidoagent using keytab file /data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/krb5.keytab
2026-03-18 17:49:23,717 INFO org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider [] - Successfully logged into KDC
2026-03-18 17:49:23,719 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - Starting TGT renewal task
2026-03-18 17:49:23,719 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - TGT renewal task started and reoccur in 60000 ms
2026-03-18 17:49:23,719 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - Hadoop user set to [email protected] (auth:KERBEROS)
2026-03-18 17:49:23,720 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - Kerberos security is enabled.
2026-03-18 17:49:23,720 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - Kerberos credentials are valid.
2026-03-18 17:49:23,726 INFO org.apache.flink.runtime.security.modules.JaasModule [] - Jaas file will be created as /data1/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/jaas-7581660068545285667.conf.
...
2026-03-18 17:49:25,228 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2026-03-18 17:49:25,229 INFO org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Loading delegation token receivers
2026-03-18 17:49:25,232 INFO org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Delegation token receiver hadoopfs loaded and initialized
2026-03-18 17:49:25,233 INFO org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Delegation token receiver hbase loaded and initialized {code}
Code:
[https://github.com/apache/flink/blob/6fc5c97ec3a89975ee44b1b084efc8fbc25c73ee/flink-yarn/src/main/java/org/apache/flink/yarn/YarnTaskExecutorRunner.java#L132]
Looking at the source code, there is no configuration option or conditional
logic around this step. The keytab login should be configurable rather than
hard-coded.
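To illustrate the kind of switch being asked for, here is a minimal sketch in plain Java with no Flink or Hadoop dependencies. It is not the actual YarnTaskExecutorRunner code. The option name security.kerberos.taskmanager.skip-keytab-login and the class TmKerberosLoginPolicy are hypothetical; the environment variable names are the ones shown in launch_container.sh above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: decides whether a TM should perform a keytab login against the KDC.
class TmKerberosLoginPolicy {

    // Hypothetical option name; this is NOT an existing Flink configuration key.
    static final String SKIP_TM_KEYTAB_LOGIN =
            "security.kerberos.taskmanager.skip-keytab-login";

    static boolean isSet(String value) {
        return value != null && !value.isEmpty();
    }

    // conf: Flink configuration as key/value pairs; env: container environment.
    static boolean shouldLoginWithKeytab(Map<String, String> conf, Map<String, String> env) {
        boolean skipRequested =
                Boolean.parseBoolean(conf.getOrDefault(SKIP_TM_KEYTAB_LOGIN, "false"));
        // Set by YARN when the AM has shipped delegation tokens to the container.
        boolean hasTokens = isSet(env.get("HADOOP_TOKEN_FILE_LOCATION"));
        // Set in launch_container.sh when a keytab was localized for the container.
        boolean hasKeytab = isSet(env.get("_LOCAL_KEYTAB_PATH"));

        // Without a keytab there is nothing to log in with.
        if (!hasKeytab) {
            return false;
        }
        // Skip the KDC login only when the user asked for it AND tokens are present;
        // otherwise keep today's behavior and log in with the keytab.
        return !(skipRequested && hasTokens);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SKIP_TM_KEYTAB_LOGIN, "true");

        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_TOKEN_FILE_LOCATION", "container_tokens");
        env.put("_LOCAL_KEYTAB_PATH", "krb5.keytab");

        System.out.println("keytab login on TM: " + shouldLoginWithKeytab(conf, env));
    }
}
```

The default keeps the current behavior, so existing deployments would be unaffected unless the (hypothetical) option is enabled.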
KDC
Number of concurrent KDC logins = number of Flink apps * number of containers
per app.
With a large number of short-lived Flink jobs, this puts severe pressure on the
KDC. The KDC becomes very slow, which affects the overall security and
stability of the cluster.
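As a worked example of the estimate above, the following sketch multiplies the two factors, reading the formula as apps times containers per app. The figures 200 and 50 are made-up illustration values, not measurements from the affected cluster, and the class name KdcLoginEstimate is hypothetical.

```java
// Sketch only: rough estimate of keytab logins that hit the KDC at job startup.
class KdcLoginEstimate {

    // Every container of every app performs one keytab login when it starts.
    static long loginsAtStartup(long flinkApps, long containersPerApp) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(flinkApps, containersPerApp);
    }

    public static void main(String[] args) {
        long apps = 200;             // illustrative value
        long containersPerApp = 50;  // illustrative value
        System.out.println("estimated KDC logins: " + loginsAtStartup(apps, containersPerApp));
    }
}
```

If the TMs relied on the delegation tokens issued by the AM instead, the count would drop to roughly one login per app.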
was:
According to the documentation, the Kerberos configuration does not distinguish
between the AM and the TM.
flink-conf.yaml
{code:java}
security.kerberos.login.keytab=xx.keytab
security.kerberos.login.principal=xx_principal{code}
launch_container.sh
{code:java}
# The AM has clearly issued the delegation token here.
export HADOOP_TOKEN_FILE_LOCATION="/data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/container_tokens"
..
# But the keytab file is still downloaded here.
export _REMOTE_KEYTAB_PATH="hdfs://xx/user/hiidoagent/.flink/application_1773803886076_15646/hiidoagent.keytab"
export HADOOP_USER_NAME="[email protected]"
export _LOCAL_KEYTAB_PATH="krb5.keytab"
export _KEYTAB_PRINCIPAL="hiidoagent"{code}
TM log
{code:java}
2026-03-18 17:49:23,394 INFO org.apache.flink.runtime.state.changelog.StateChangelogStorageLoader [] - StateChangelogStorageLoader initialized with shortcut names {memory,filesystem}.
2026-03-18 17:49:23,441 INFO org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider [] - Attempting to login to KDC using principal: hiidoagent keytab: /data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/krb5.keytab
2026-03-18 17:49:23,717 INFO org.apache.hadoop.security.UserGroupInformation [] - Login successful for user hiidoagent using keytab file /data2/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/container_e268_1773803886076_15646_01_000003/krb5.keytab
2026-03-18 17:49:23,717 INFO org.apache.flink.runtime.security.token.hadoop.KerberosLoginProvider [] - Successfully logged into KDC
2026-03-18 17:49:23,719 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - Starting TGT renewal task
2026-03-18 17:49:23,719 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - TGT renewal task started and reoccur in 60000 ms
2026-03-18 17:49:23,719 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - Hadoop user set to [email protected] (auth:KERBEROS)
2026-03-18 17:49:23,720 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - Kerberos security is enabled.
2026-03-18 17:49:23,720 INFO org.apache.flink.runtime.security.modules.HadoopModule [] - Kerberos credentials are valid.
2026-03-18 17:49:23,726 INFO org.apache.flink.runtime.security.modules.JaasModule [] - Jaas file will be created as /data1/hadoop/yarn/local/usercache/hiidoagent/appcache/application_1773803886076_15646/jaas-7581660068545285667.conf.
...
2026-03-18 17:49:25,228 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2026-03-18 17:49:25,229 INFO org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Loading delegation token receivers
2026-03-18 17:49:25,232 INFO org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Delegation token receiver hadoopfs loaded and initialized
2026-03-18 17:49:25,233 INFO org.apache.flink.runtime.security.token.DelegationTokenReceiverRepository [] - Delegation token receiver hbase loaded and initialized {code}
Code:
[https://github.com/apache/flink/blob/6fc5c97ec3a89975ee44b1b084efc8fbc25c73ee/flink-yarn/src/main/java/org/apache/flink/yarn/YarnTaskExecutorRunner.java#L132]
Looking at the source code, there is no configuration option or conditional
logic around this step. The keytab login should be configurable rather than
hard-coded.
KDC
Number of concurrent KDC logins = number of Flink apps * number of containers
per app.
With a large number of short-lived Flink jobs, this puts severe pressure on the
KDC. The KDC becomes very slow, which affects the overall security and
stability of the cluster.
> The TM cannot bypass the KDC login, yet the token issued by the AM is never
> actually used.
> ---------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-39274
> URL: https://issues.apache.org/jira/browse/FLINK-39274
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.17.2
> Environment: flink on yarn
> Reporter: ruiliang
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)