[ 
https://issues.apache.org/jira/browse/HDDS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-13234:
-----------------------------------
    Summary: Expired secret key can abort leader OM startup  (was: Expired 
delegation tokens can abort leader OM startup)

> Expired secret key can abort leader OM startup
> ----------------------------------------------
>
>                 Key: HDDS-13234
>                 URL: https://issues.apache.org/jira/browse/HDDS-13234
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>
> Found a bug where an expired secret key can abort leader OM startup.
> First, the leader OM crashed due to RATIS-1873.
> Then the leader OM tried to restart but failed:
> {noformat}
> 2025-06-06 06:59:44,499 ERROR [main]-org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
> java.lang.NullPointerException
>         at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.addPersistedDelegationToken(OzoneDelegationTokenSecretManager.java:575)
>         at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.loadTokenSecretState(OzoneDelegationTokenSecretManager.java:560)
>         at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.<init>(OzoneDelegationTokenSecretManager.java:112)
>         at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager$Builder.build(OzoneDelegationTokenSecretManager.java:131)
>         at org.apache.hadoop.ozone.om.OzoneManager.createDelegationTokenSecretManager(OzoneManager.java:1055)
>         at org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:831)
>         at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:674)
>         at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:759)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
>         at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
>         at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
>         at picocli.CommandLine.access$1300(CommandLine.java:145)
>         at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
>         at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
>         at picocli.CommandLine.execute(CommandLine.java:2078)
>         at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:103)
>         at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:94)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
> 2025-06-06 06:59:44,503 INFO [shutdown-hook-0]-org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG: 
> {noformat}
> What happened:
> 1. OM loads delegation tokens at startup.
> 2. While loading delegation tokens, OM asks SCM for the secret keys 
> associated with those tokens.
> 3. If a secret key has already expired, SCM has removed it, so OM's request 
> returns null, which results in a NullPointerException that aborts OM startup.
> What we should do:
> 1. If the secret key has expired, ignore the delegation token so that OM 
> startup can proceed.
> 2. The OM delegation token secret manager will remove the delegation token 
> later because it has expired.
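
The proposed behavior can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Ozone code or patch: TokenLoadSketch, PersistedToken, SecretKeyLookup, and loadTokens are hypothetical names made up for this example, and the lookup interface merely stands in for the inquiry OM sends to SCM. The only point it shows is the null check: a token whose secret key can no longer be found is skipped instead of being dereferenced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-ins for illustration only; these are NOT the real Ozone classes.
final class TokenLoadSketch {

    /** A persisted delegation token that records which secret key signed it. */
    static final class PersistedToken {
        final String owner;
        final UUID secretKeyId;

        PersistedToken(String owner, UUID secretKeyId) {
            this.owner = owner;
            this.secretKeyId = secretKeyId;
        }
    }

    /** Stand-in for the OM-to-SCM inquiry; returns null when the key was removed. */
    interface SecretKeyLookup {
        byte[] getSecretKey(UUID id);
    }

    /** Loads persisted tokens, skipping any whose secret key is no longer available. */
    static List<PersistedToken> loadTokens(List<PersistedToken> persisted,
                                           SecretKeyLookup lookup) {
        List<PersistedToken> loaded = new ArrayList<>();
        for (PersistedToken token : persisted) {
            byte[] key = lookup.getSecretKey(token.secretKeyId);
            if (key == null) {
                // The key expired and was removed: skip the token rather than
                // dereference null, so that startup can proceed.
                System.out.println("Skipping token for " + token.owner
                    + ": secret key " + token.secretKeyId + " not found");
                continue;
            }
            loaded.add(token);
        }
        return loaded;
    }

    public static void main(String[] args) {
        UUID liveKey = UUID.randomUUID();
        UUID expiredKey = UUID.randomUUID();

        // Only the live key is still known; the expired one has been removed.
        Map<UUID, byte[]> knownKeys = new HashMap<>();
        knownKeys.put(liveKey, new byte[] {1, 2, 3});

        List<PersistedToken> persisted = Arrays.asList(
            new PersistedToken("alice", liveKey),
            new PersistedToken("bob", expiredKey));

        List<PersistedToken> loaded = loadTokens(persisted, knownKeys::get);
        System.out.println("Loaded " + loaded.size() + " of "
            + persisted.size() + " tokens");
    }
}
```

In this sketch the skipped token is simply left alone at load time, which matches step 2 of the proposal: the regular expiry cleanup is expected to remove it later.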



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
