Pratyush Bhatt created HDDS-10626:
-------------------------------------
Summary: [LeaseRecovery] OM shuts down with "SecretKey client must have been initialized already"
Key: HDDS-10626
URL: https://issues.apache.org/jira/browse/HDDS-10626
Project: Apache Ozone
Issue Type: Bug
Components: OM
Reporter: Pratyush Bhatt
In a scenario where I am running lease recovery on multiple files during a
rolling restart, the OM fails once the Ozone Managers (OMs) have been
restarted:
{code:java}
2024-03-31 09:47:01,866 ERROR [om72-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: RecoverLease
traceID: ""
clientId: "client-433C04E5C8CC"
userInfo {
userName: "[email protected]"
remoteAddress: "10.64.62.57"
hostName: "vb1307.halxg.cloudera.com"
}
version: 3
layoutVersion {
version: 6
}
RecoverLeaseRequest {
volumeName: "hsyncvol"
bucketName: "hsyncbuck"
keyName: "hsync/File_24.txt"
force: false
}
failed with exception
java.lang.NullPointerException: SecretKey client must have been initialized already.
	at java.util.Objects.requireNonNull(Objects.java:228)
	at org.apache.hadoop.hdds.security.symmetric.DefaultSecretKeySignerClient.getCurrentSecretKey(DefaultSecretKeySignerClient.java:70)
	at org.apache.hadoop.hdds.security.token.ShortLivedTokenSecretManager.createPassword(ShortLivedTokenSecretManager.java:47)
	at org.apache.hadoop.hdds.security.token.OzoneBlockTokenSecretManager.generateToken(OzoneBlockTokenSecretManager.java:70)
	at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.updateBlockInfo(OMRecoverLeaseRequest.java:281)
	at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.doWork(OMRecoverLeaseRequest.java:264)
	at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.validateAndUpdateCache(OMRecoverLeaseRequest.java:156)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
	at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
	at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
	at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
	at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}
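The top frames of the trace show the failure mechanism: getCurrentSecretKey guards its cached key with Objects.requireNonNull, so block-token generation for a RecoverLease request that is applied before the secret key client has a key throws this NullPointerException. Because it happens inside the Ratis apply-transaction thread, OzoneManagerStateMachine terminates the OM with exit status 1. The sketch below reproduces only that pattern under stated assumptions: SketchSecretKeySignerClient, start() and applyRecoverLease() are hypothetical stand-ins, not Ozone classes, and the idea that the key cache is still empty right after an OM restart is inferred from the error message, not confirmed from the Ozone source.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the failure; NOT the real Ozone
// DefaultSecretKeySignerClient or OMRecoverLeaseRequest.
public class SecretKeyInitRace {

    static final class SketchSecretKeySignerClient {
        // Assumption: the current key is only cached once start() has run
        // (for example, after the keys have been fetched from SCM).
        private final AtomicReference<String> currentKey = new AtomicReference<>();

        void start(String initialKey) {
            currentKey.set(initialKey);
        }

        String getCurrentSecretKey() {
            // Same guard as the top frame of the stack trace: fails fast with an
            // NPE if the key has not been populated yet.
            return Objects.requireNonNull(currentKey.get(),
                "SecretKey client must have been initialized already.");
        }
    }

    // Hypothetical stand-in for the block-token generation done while
    // applying a RecoverLease transaction.
    static String applyRecoverLease(SketchSecretKeySignerClient client) {
        return "token-signed-with-" + client.getCurrentSecretKey();
    }

    public static void main(String[] args) {
        SketchSecretKeySignerClient client = new SketchSecretKeySignerClient();
        try {
            // Transaction applied before the client is initialized.
            applyRecoverLease(client);
            System.out.println("unexpected: no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
        // Once initialized, the same request succeeds.
        client.start("key-1");
        System.out.println(applyRecoverLease(client));
    }
}
```

Running main prints the NPE message first and then "token-signed-with-key-1", which matches the report: the same request only fails when it is applied before initialization finishes, as can happen during a rolling restart.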
I have seen this 2-3 times; this time I was able to reproduce it when lease
recovery was running during the rolling restart (RR) phase.
cc: [~ashishk] [~weichiu]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)