Wei-Chiu Chuang created HDDS-13234:
--------------------------------------
Summary: Expired delegation tokens can abort leader OM startup
Key: HDDS-13234
URL: https://issues.apache.org/jira/browse/HDDS-13234
Project: Apache Ozone
Issue Type: Bug
Reporter: Wei-Chiu Chuang
Found a bug where expired delegation tokens can abort leader OM startup.
First, the leader OM crashed due to RATIS-1873.
Then, when the leader OM tried to restart, startup failed with:
{noformat}
2025-06-06 06:59:44,499 ERROR [main]-org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
java.lang.NullPointerException
    at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.addPersistedDelegationToken(OzoneDelegationTokenSecretManager.java:575)
    at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.loadTokenSecretState(OzoneDelegationTokenSecretManager.java:560)
    at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager.<init>(OzoneDelegationTokenSecretManager.java:112)
    at org.apache.hadoop.ozone.security.OzoneDelegationTokenSecretManager$Builder.build(OzoneDelegationTokenSecretManager.java:131)
    at org.apache.hadoop.ozone.om.OzoneManager.createDelegationTokenSecretManager(OzoneManager.java:1055)
    at org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:831)
    at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:674)
    at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:759)
    at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
    at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
    at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
    at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:103)
    at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:94)
    at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
2025-06-06 06:59:44,503 INFO [shutdown-hook-0]-org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:
{noformat}
What happened:
1. OM loads the persisted delegation tokens during startup.
2. While loading the delegation tokens, OM asks SCM for the secret keys
associated with them.
3. If a secret key has already expired, SCM has removed it, so OM's request
returns null, which results in a NullPointerException that aborts OM startup.
What we should do:
1. If the secret key has expired, ignore the delegation token so that OM
startup can proceed.
2. The OM delegation token secret manager will remove the delegation token
later because it has expired.
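The proposed handling could look roughly like the sketch below. It is not the actual Ozone code: TokenLoadSketch, PersistedToken, SecretKeyLookup and loadTokens are hypothetical names made up for illustration, and the lookup is only assumed to return null for a key that is no longer available, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TokenLoadSketch {
    /** Hypothetical stand-in for a persisted delegation token. */
    static final class PersistedToken {
        final String id;
        final String secretKeyId;

        PersistedToken(String id, String secretKeyId) {
            this.id = id;
            this.secretKeyId = secretKeyId;
        }
    }

    /** Hypothetical stand-in for the secret key lookup; assumed to return null for a removed key. */
    interface SecretKeyLookup {
        byte[] getSecretKey(String secretKeyId);
    }

    /**
     * Loads tokens whose secret key is still available. Tokens whose key
     * lookup returns null are recorded in 'skipped' and ignored instead of
     * being dereferenced, so loading continues.
     */
    static Map<String, byte[]> loadTokens(List<PersistedToken> persisted,
                                          SecretKeyLookup lookup,
                                          List<String> skipped) {
        Map<String, byte[]> loaded = new HashMap<>();
        for (PersistedToken token : persisted) {
            byte[] key = lookup.getSecretKey(token.secretKeyId);
            if (key == null) {
                // Secret key is gone: skip the token rather than fail startup.
                skipped.add(token.id);
                continue;
            }
            loaded.put(token.id, key);
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Only "key-1" is still known; "key-expired" simulates a removed key.
        Map<String, byte[]> availableKeys = new HashMap<>();
        availableKeys.put("key-1", new byte[] {1, 2, 3});

        List<PersistedToken> persisted = Arrays.asList(
            new PersistedToken("token-a", "key-1"),
            new PersistedToken("token-b", "key-expired"));

        List<String> skipped = new ArrayList<>();
        Map<String, byte[]> loaded = loadTokens(persisted, availableKeys::get, skipped);
        System.out.println("loaded=" + loaded.keySet() + " skipped=" + skipped);
    }
}
```

In this sketch the skipped tokens are only recorded, not deleted, which matches item 2: the secret manager's own expiry handling is left to remove them later.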