[
https://issues.apache.org/jira/browse/HDDS-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai updated HDDS-13234:
------------------------------------
Fix Version/s: 2.1.0
> Expired secret key can abort leader OM startup
> ----------------------------------------------
>
> Key: HDDS-13234
> URL: https://issues.apache.org/jira/browse/HDDS-13234
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Critical
> Labels: pull-request-available
> Fix For: 2.1.0
>
>
> An expired secret key can abort leader OM startup.
> First, the leader OM crashed due to RATIS-1873.
> Then the leader OM tried to start but failed:
> {noformat}
> 2025-06-06 06:59:44,499 ERROR [main]-org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
> java.lang.NullPointerException
> at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.addPersistedDelegationToken(OzoneDelegationTokenSecretManager.java:575)
> at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.loadTokenSecretState(OzoneDelegationTokenSecretManager.java:560)
> at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.<init>(OzoneDelegationTokenSecretManager.java:112)
> at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager$Builder.build(OzoneDelegationTokenSecretManager.java:131)
> at org.apache.hadoop.ozone.om.OzoneManager.createDelegationTokenSecretManager(OzoneManager.java:1055)
> at org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:831)
> at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:674)
> at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:759)
> at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
> at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
> at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
> at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
> at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
> at picocli.CommandLine.access$1300(CommandLine.java:145)
> at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
> at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
> at picocli.CommandLine.execute(CommandLine.java:2078)
> at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:103)
> at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:94)
> at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
> 2025-06-06 06:59:44,503 INFO [shutdown-hook-0]-org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:
> {noformat}
> What happened was:
> 1. OM loads delegation tokens at startup.
> 2. While loading delegation tokens, OM asks SCM for the secret keys
> associated with those tokens.
> 3. If a secret key has already expired, SCM has removed it, so OM's request
> returns null, which results in a NullPointerException that aborts OM startup.
> What we should do:
> 1. If the secret key has expired, ignore the delegation token so that OM
> startup can proceed.
> 2. The OM delegation token secret manager will remove the delegation token
> later because it has expired.
> This bug appears to be a regression caused by HDDS-8829 in the corner case
> described above.
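
The proposed handling can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Ozone patch: the class `ExpiredKeyTokenLoader`, the `PersistedToken` type, the `SecretKeyLookup` interface, and the `loadTokens` method are all hypothetical stand-ins. The only behavior taken from the description above is that a lookup for an expired (removed) secret key returns null, and that such a token should be skipped rather than dereferenced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExpiredKeyTokenLoader {

    /** Hypothetical stand-in for a persisted delegation token. */
    static final class PersistedToken {
        final String tokenId;
        final String secretKeyId;

        PersistedToken(String tokenId, String secretKeyId) {
            this.tokenId = tokenId;
            this.secretKeyId = secretKeyId;
        }
    }

    /**
     * Hypothetical stand-in for the secret key inquiry to SCM.
     * Assumed to return null when the key has expired and been removed.
     */
    interface SecretKeyLookup {
        byte[] get(String secretKeyId);
    }

    /**
     * Loads persisted tokens, skipping any whose secret key is no longer
     * available instead of failing with a NullPointerException.
     */
    static List<String> loadTokens(List<PersistedToken> persisted,
                                   SecretKeyLookup lookup) {
        List<String> loaded = new ArrayList<>();
        for (PersistedToken token : persisted) {
            byte[] key = lookup.get(token.secretKeyId);
            if (key == null) {
                // Key expired: ignore the token so startup can proceed.
                // The token itself is expected to be cleaned up later.
                System.err.println("Skipping token " + token.tokenId
                    + ": secret key " + token.secretKeyId + " not found");
                continue;
            }
            loaded.add(token.tokenId);
        }
        return loaded;
    }

    public static void main(String[] args) {
        Map<String, byte[]> keys = new HashMap<>();
        keys.put("key-live", new byte[] {1, 2, 3});

        List<PersistedToken> persisted = Arrays.asList(
            new PersistedToken("token-a", "key-live"),
            new PersistedToken("token-b", "key-expired"));

        // Prints [token-a] on stdout; token-b is skipped with a warning.
        System.out.println(loadTokens(persisted, keys::get));
    }
}
```

The null check sits at the point where the key is first used, so one stale token cannot abort the whole load. Whether the real fix logs, counts, or silently drops such tokens is not stated in the description above.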