[ https://issues.apache.org/jira/browse/HDFS-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036480#comment-18036480 ]

ASF GitHub Bot commented on HDFS-17849:
---------------------------------------

surendralilhore commented on code in PR #8054:
URL: https://github.com/apache/hadoop/pull/8054#discussion_r2506337576


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java:
##########
@@ -858,7 +858,11 @@ private void removeExpiredToken() throws IOException {
         long renewDate = entry.getValue().getRenewDate();
         if (renewDate < now) {
           expiredTokens.add(entry.getKey());
-          removeTokenForOwnerStats(entry.getKey());
+          try {
+            removeTokenForOwnerStats(entry.getKey());

Review Comment:
   @arunreddyav, I have a question, similar to what @Hexiaoqiao mentioned: how are we handling the cleanup of tokenOwnerStats in exception cases? Could you check whether the following idea makes sense?
   We need to ensure that tokenOwnerStats is cleaned up even when an exception occurs. To do this, we should try to obtain the real user from the token identifier, like so:
   
   ```
             try {
               removeTokenForOwnerStats(entry.getKey());
             } catch (IllegalArgumentException e) {
               removeTokenForOwnerStats(entry.getKey().getRealUser().toString());
               LOG.warn("Ignoring the exception in removeTokenForOwnerStats while removing expired "
                   + "delegation tokens from the cache; proceeding with removal", e);
             }
   ```
   
   Let's introduce a new `removeTokenForOwnerStats(String)` method, for example:
   
   ```
     private void removeTokenForOwnerStats(TokenIdent id) {
       String realOwner = getTokenRealOwner(id);
       removeTokenForOwnerStats(realOwner);
     }
   
     private void removeTokenForOwnerStats(String realOwner) {
       if (tokenOwnerStats.containsKey(realOwner)) {
         // unlikely to be less than 1, but just in case
         if (tokenOwnerStats.get(realOwner) <= 1) {
           tokenOwnerStats.remove(realOwner);
         } else {
           tokenOwnerStats.put(realOwner, tokenOwnerStats.get(realOwner) - 1);
         }
         }
       }
     }
   ```
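   
   For reference, the `IllegalArgumentException` being caught here originates from `UserGroupInformation.createRemoteUser` (see the stack trace in the Jira description below). A minimal, self-contained sketch of that failure path, assuming a hypothetical auth_to_local rule set that only covers the new realm (the class name, rule string, and principal are illustrative, not from the patch):
   
   ```
   import org.apache.hadoop.security.UserGroupInformation;
   import org.apache.hadoop.security.authentication.util.KerberosName;
   
   // Hypothetical, standalone reproduction of the NoMatchingRule failure path.
   public class OldRealmPrincipalRepro {
     public static void main(String[] args) {
       // Assumption: auth_to_local rules regenerated for NEW.REALM only, with no
       // DEFAULT rule, so principals from the old realm no longer resolve.
       KerberosName.setRules("RULE:[2:$1@$0](.*@NEW\\.REALM)s/@.*//");
       try {
         // Same chain as AbstractDelegationTokenIdentifier#getUser:
         // createRemoteUser -> new User(...) -> KerberosName#getShortName.
         UserGroupInformation.createRemoteUser("spark/host.example.com@OLD.REALM");
       } catch (IllegalArgumentException e) {
         // NoMatchingRule surfaces wrapped in IllegalArgumentException, which is
         // why the suggestion above catches IllegalArgumentException.
         System.out.println("Caught: " + e.getMessage());
       }
     }
   }
   ```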
   
   





> Namenode crashed while cleaning up Expired Delegation tokens
> ------------------------------------------------------------
>
>                 Key: HDFS-17849
>                 URL: https://issues.apache.org/jira/browse/HDFS-17849
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.4.1
>            Reporter: Kanaka Kumar Avvaru
>            Priority: Major
>              Labels: pull-request-available
>
> We are facing an NN crash during token cleanup after updating the Kerberos 
> auth_to_local rules to pick up the new realm configuration from the existing one.
>  
> Here is the stack trace
> {noformat}
> 2025-08-11 02:28:06,448 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(856)) - ExpiredTokenRemover thread received unexpected exception
> java.lang.IllegalArgumentException: Illegal principal name spark/<hostname>@<old_realm>: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to spark/<hostname>@<old_realm>
>         at org.apache.hadoop.security.User.<init>(User.java:51)
>         at org.apache.hadoop.security.User.<init>(User.java:43)
>         at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1458)
>         at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1441)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.getUser(AbstractDelegationTokenIdentifier.java:80)
>         at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier.getUser(DelegationTokenIdentifier.java:81)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.getTokenRealOwner(AbstractDelegationTokenSecretManager.java:914)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeTokenForOwnerStats(AbstractDelegationTokenSecretManager.java:936)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:773)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:71)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:846)
>         at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to spark/<hostname>@<old_realm>
>         at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:429)
>         at org.apache.hadoop.security.User.<init>(User.java:48)
>         ... 11 more
> 2025-08-11 02:28:06,450 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(537)) - ==> JVMShutdownHook.run(){noformat}
>  
>  HDFS-17138 attempted to avoid the crash during token logging, but getTokenRealOwner, which is used to update the token owner stats, now fails in 3.4.1.
>  


