[
https://issues.apache.org/jira/browse/HDDS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang resolved HDDS-10626.
------------------------------------
Fix Version/s: HDDS-7593
Resolution: Fixed
> [LeaseRecovery] OM shuts down with "SecretKey client must have been
> initialized already"
> ----------------------------------------------------------------------------------------
>
> Key: HDDS-10626
> URL: https://issues.apache.org/jira/browse/HDDS-10626
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: OM
> Reporter: Pratyush Bhatt
> Assignee: Sammi Chen
> Priority: Blocker
> Labels: pull-request-available
> Fix For: HDDS-7593
>
>
> While running lease recovery on multiple files during a rolling restart, the
> Ozone Manager (OM) terminates abruptly after the OMs are restarted.
> {code:java}
> 2024-03-31 09:47:01,866 ERROR [om72-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with exit status 1: Request cmdType: RecoverLease
> traceID: ""
> clientId: "client-433C04E5C8CC"
> userInfo {
>   userName: "hdfs@XYZ"
>   remoteAddress: "xx.yy.ww.zz"
>   hostName: "vb1307.xyz.com"
> }
> version: 3
> layoutVersion {
>   version: 6
> }
> RecoverLeaseRequest {
>   volumeName: "hsyncvol"
>   bucketName: "hsyncbuck"
>   keyName: "hsync/File_24.txt"
>   force: false
> }
> failed with exception
> java.lang.NullPointerException: SecretKey client must have been initialized already.
>     at java.util.Objects.requireNonNull(Objects.java:228)
>     at org.apache.hadoop.hdds.security.symmetric.DefaultSecretKeySignerClient.getCurrentSecretKey(DefaultSecretKeySignerClient.java:70)
>     at org.apache.hadoop.hdds.security.token.ShortLivedTokenSecretManager.createPassword(ShortLivedTokenSecretManager.java:47)
>     at org.apache.hadoop.hdds.security.token.OzoneBlockTokenSecretManager.generateToken(OzoneBlockTokenSecretManager.java:70)
>     at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.updateBlockInfo(OMRecoverLeaseRequest.java:281)
>     at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.doWork(OMRecoverLeaseRequest.java:264)
>     at org.apache.hadoop.ozone.om.request.file.OMRecoverLeaseRequest.validateAndUpdateCache(OMRecoverLeaseRequest.java:156)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
>     at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
>     at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> I have seen this 2-3 times; this time I was able to reproduce it while lease
> recovery was running during the rolling restart (RR) phase.
> cc: [~ashishk] [~weichiu]
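
For readers following the trace: the top frames show Objects.requireNonNull throwing inside getCurrentSecretKey while a block token is being generated. The sketch below illustrates that general failure pattern only. It is not the Ozone implementation; the class SecretKeyClientSketch, the nested SignerClient, its start() method, and the "key-1" value are all hypothetical names chosen for illustration, and it assumes the key reference is populated only once the client has been started.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

public class SecretKeyClientSketch {

  /** Hypothetical stand-in for a signer client whose key is loaded on start(). */
  static final class SignerClient {
    // Stays null until start() has run (assumption made for this sketch).
    private final AtomicReference<String> currentKey = new AtomicReference<>();

    void start() {
      currentKey.set("key-1");
    }

    String getCurrentSecretKey() {
      // Throws NullPointerException with this message if called before start().
      return Objects.requireNonNull(currentKey.get(),
          "SecretKey client must have been initialized already.");
    }
  }

  /** Mimics a request path that needs the current key to sign a token. */
  static String tryGenerateToken(SignerClient client) {
    try {
      return "token-signed-with-" + client.getCurrentSecretKey();
    } catch (NullPointerException e) {
      // In the report this exception is not caught and the OM exits;
      // here it is caught only so that the message can be shown.
      return "NPE: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    SignerClient client = new SignerClient();
    // Request applied before the client is started: fails.
    System.out.println(tryGenerateToken(client));
    client.start();
    // Same request after the client is started: succeeds.
    System.out.println(tryGenerateToken(client));
  }
}
```

The point of the sketch is the ordering: a request that needs the key and is applied before the key client has been started fails with the same message as in the log above.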