Pratyush Bhatt created HDDS-10626:
-------------------------------------

             Summary: [LeaseRecovery] OM shuts down with "SecretKey client must have been initialized already"
                 Key: HDDS-10626
                 URL: https://issues.apache.org/jira/browse/HDDS-10626
             Project: Apache Ozone
          Issue Type: Bug
          Components: OM
            Reporter: Pratyush Bhatt


I was running lease recovery on multiple files during a rolling restart. 
After the Ozone Managers (OMs) restarted, an OM terminated with the 
following error:
{code:java}
2024-03-31 09:47:01,866 ERROR [om72-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: RecoverLease
traceID: ""
clientId: "client-433C04E5C8CC"
userInfo {
  userName: "[email protected]"
  remoteAddress: "10.64.62.57"
  hostName: "vb1307.halxg.cloudera.com"
}
version: 3
layoutVersion {
  version: 6
}
RecoverLeaseRequest {
  volumeName: "hsyncvol"
  bucketName: "hsyncbuck"
  keyName: "hsync/File_24.txt"
  force: false
}
 failed with exception
java.lang.NullPointerException: SecretKey client must have been initialized already.
        at java.util.Objects.requireNonNull(Objects.java:228)
        at org.apache.hadoop.hdds.security.symmetric.DefaultSecretKeySignerClient.getCurrentSecretKey(DefaultSecretKeySignerClient.java:70)
        at org.apache.hadoop.hdds.security.token.ShortLivedTokenSecretManager.createPassword(ShortLivedTokenSecretManager.java:47)
        at org.apache.hadoop.hdds.security.token.OzoneBlockTokenSecretManager.generateToken(OzoneBlockTokenSecretManager.java:70)
        at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.updateBlockInfo(OMRecoverLeaseRequest.java:281)
        at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.doWork(OMRecoverLeaseRequest.java:264)
        at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.validateAndUpdateCache(OMRecoverLeaseRequest.java:156)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
        at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
        at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
        at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
        at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748) {code}
I have seen this 2-3 times. This time I was able to reproduce it while lease 
recovery was running during the rolling restart (RR) phase.
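
For readers of the trace: the NPE is raised by Objects.requireNonNull inside 
getCurrentSecretKey, which would be consistent with the key being read before 
the client had been started on the restarted OM. The sketch below only 
illustrates that general pattern. LazyKeyClient, start and tryGetKey are 
hypothetical names made up for illustration, not Ozone classes or methods, and 
the actual root cause has not been confirmed.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

public class SecretKeyInitSketch {

    /** Hypothetical stand-in for a signer client; NOT the Ozone class. */
    static final class LazyKeyClient {
        // Stays null until start() is called, e.g. while a service is still coming up.
        private final AtomicReference<String> currentKey = new AtomicReference<>();

        /** Assumed initialization step that makes a key available. */
        void start(String key) {
            currentKey.set(key);
        }

        /** Throws NullPointerException with the given message if start() has not run yet. */
        String getCurrentKey() {
            return Objects.requireNonNull(currentKey.get(),
                "SecretKey client must have been initialized already.");
        }
    }

    /** Returns the key, or a description of the NPE if the client is not started. */
    static String tryGetKey(LazyKeyClient client) {
        try {
            return client.getCurrentKey();
        } catch (NullPointerException e) {
            return "NPE: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        LazyKeyClient client = new LazyKeyClient();
        // A request applied before initialization hits the null check.
        System.out.println(tryGetKey(client));
        client.start("key-1");
        // After initialization the same call succeeds.
        System.out.println(tryGetKey(client));
    }
}
```

In the sketch the caller catches the exception. In the log above, the exception 
reached the state machine's apply thread and the OM terminated with exit 
status 1.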

cc: [~ashishk] [~weichiu] 


