[ 
https://issues.apache.org/jira/browse/HDFS-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035298#comment-18035298
 ] 

Kanaka Kumar Avvaru commented on HDFS-17849:
--------------------------------------------

Arun attempted a fix in PR [https://github.com/apache/hadoop/pull/8054] 
and verified it on our 3.4.1 cluster. Could you please review the PR, 
[~zhangxiping] [~hexiaoqiao] [~goiri]?

Note on the per-owner metrics: we did not assess the impact, as we could not 
find any JMX entries for delegation token counts in the NN JMX metrics. If you 
see any impact, please specify how to observe it.

> Namenode crashed while cleaning up Expired Delegation tokens
> ------------------------------------------------------------
>
>                 Key: HDFS-17849
>                 URL: https://issues.apache.org/jira/browse/HDFS-17849
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.4.1
>            Reporter: Kanaka Kumar Avvaru
>            Priority: Major
>
> We are seeing the NN crash during token cleanup after updating the Kerberos 
> auth rules to pick up a new realm configuration in place of the existing one.
>  
> Here is the stack trace:
> {noformat}
> 2025-08-11 02:28:06,448 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(856)) - ExpiredTokenRemover thread received unexpected exception
> java.lang.IllegalArgumentException: Illegal principal name spark/<hostname>@<old_realm>: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to spark/<hostname>@<old_realm>
>         at org.apache.hadoop.security.User.<init>(User.java:51)
>         at org.apache.hadoop.security.User.<init>(User.java:43)
>         at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1458)
>         at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1441)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.getUser(AbstractDelegationTokenIdentifier.java:80)
>         at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier.getUser(DelegationTokenIdentifier.java:81)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.getTokenRealOwner(AbstractDelegationTokenSecretManager.java:914)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeTokenForOwnerStats(AbstractDelegationTokenSecretManager.java:936)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:773)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:71)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:846)
>         at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to spark/<hostname>@<old_realm>
>         at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:429)
>         at org.apache.hadoop.security.User.<init>(User.java:48)
>         ... 11 more
> 2025-08-11 02:28:06,450 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(537)) - ==> JVMShutdownHook.run()
> {noformat}
>  
> HDFS-17138 attempted to avoid the crash during token logging, but the 
> getTokenRealOwner call that updates the token owner stats now fails in 3.4.1.
>  
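
The failure mode in the stack trace can be illustrated with a small standalone sketch. This is not the change made in the PR and it does not use any Hadoop classes; the class ExpiredTokenCleanupSketch, its methods, the realm names, and the "<unresolved>" fallback are all hypothetical. It only shows the general idea that a principal which no longer matches any rule could be handled per token, so that one bad owner does not terminate the whole cleanup pass.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpiredTokenCleanupSketch {

    /**
     * Hypothetical stand-in for principal-to-short-name mapping: only
     * principals in knownRealm are accepted, anything else is rejected
     * with an exception, similar in spirit to "No rules applied to ...".
     */
    static String shortName(String principal, String knownRealm) {
        int at = principal.lastIndexOf('@');
        if (at < 0 || !principal.substring(at + 1).equals(knownRealm)) {
            throw new IllegalArgumentException(
                "Illegal principal name " + principal
                    + ": No rules applied to " + principal);
        }
        String local = principal.substring(0, at);
        int slash = local.indexOf('/');
        // Drop the host component, e.g. "hdfs/host2" becomes "hdfs".
        return slash < 0 ? local : local.substring(0, slash);
    }

    /**
     * Resolves the owner used for stats bookkeeping. Returns a fallback
     * label (an assumption of this sketch) instead of propagating the
     * exception to the caller.
     */
    static String ownerForStats(String principal, String knownRealm) {
        try {
            return shortName(principal, knownRealm);
        } catch (IllegalArgumentException e) {
            return "<unresolved>";
        }
    }

    /**
     * Processes every expired token owner. A principal that cannot be
     * mapped does not stop the loop; the remaining tokens are still handled.
     */
    static List<String> removeExpired(List<String> expiredOwners, String knownRealm) {
        List<String> resolved = new ArrayList<>();
        for (String principal : expiredOwners) {
            resolved.add(ownerForStats(principal, knownRealm));
        }
        return resolved;
    }
}
```

Whether a fallback label, skipping the stats update, or some other handling is appropriate for the real per-owner metrics is a question for the PR review; the sketch takes no position on that.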



