[
https://issues.apache.org/jira/browse/HDDS-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742421#comment-17742421
]
Sammi Chen edited comment on HDDS-8983 at 7/12/23 1:51 PM:
-----------------------------------------------------------
[~adoroszlai] Thanks for reporting this. It's caused by a timing issue.
scm4.org received the rotationCommit request before it had saved the new sub CA
serial id by calling handler.setSubCACertId(newSubCACertId). The ratis request
executed in the state machine runs concurrently with the executor in
RootCARotationManager, which was still running the SubCARotationPrepareTask at
that moment. As the stack trace below shows, rotationCommit then reached
SCMStorageConfig.setScmCertSerialId and failed with a NullPointerException.
{code:java}
scm4.org_1 | 2023-07-06 12:07:18,841 [RootCARotationManager-Inactive] INFO security.RootCARotationManager: SubCARotationPrepareTask[rootCertId = 4] - rotation prepare ack sent out, new scm certificate 765013863887
scm4.org_1 | 2023-07-06 12:07:18,843 [97b42920-03c3-4ab8-89de-cf43ff4c90ea@group-EC1C2E4E7C1F-StateMachineUpdater] ERROR statemachine.StateMachine: Terminating with exit status 1: null
scm4.org_1 | java.lang.NullPointerException
scm4.org_1 | at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
scm4.org_1 | at java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
scm4.org_1 | at java.base/java.util.Properties.put(Properties.java:1340)
scm4.org_1 | at java.base/java.util.Properties.setProperty(Properties.java:228)
scm4.org_1 | at org.apache.hadoop.ozone.common.StorageInfo.setProperty(StorageInfo.java:154)
scm4.org_1 | at org.apache.hadoop.hdds.scm.server.SCMStorageConfig.setScmCertSerialId(SCMStorageConfig.java:106)
scm4.org_1 | at org.apache.hadoop.hdds.scm.security.RootCARotationHandlerImpl.rotationCommit(RootCARotationHandlerImpl.java:153)
scm4.org_1 | at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
scm4.org_1 | at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
scm4.org_1 | at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
scm4.org_1 | at java.base/java.lang.reflect.Method.invoke(Method.java:566)
scm4.org_1 | at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:188)
scm4.org_1 | at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:148)
scm4.org_1 | at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1777)
scm4.org_1 | at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:242)
scm4.org_1 | at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:184)
scm4.org_1 | at java.base/java.lang.Thread.run(Thread.java:829)
scm4.org_1 | 2023-07-06 12:07:18,846 [RootCARotationManager-Inactive] INFO security.RootCARotationHandlerImpl: Scm sub CA new certificate is 765013863887
{code}
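To illustrate the race, here is a minimal, self-contained Java sketch. It is not the Ozone code and not necessarily the fix that will go into the pull request: the class RotationCommitSketch, its method names, and the property key "scmCertSerialId" are hypothetical stand-ins. It only shows that java.util.Properties.setProperty rejects a null value, which matches the top frames of the stack trace above when the commit step runs before the cert id has been saved, and one possible guard against that ordering.

```java
import java.util.Properties;

/** Hypothetical stand-in for the rotation handler; NOT the actual Ozone class. */
class RotationCommitSketch {
  // Stand-in for the storage config; the key name used below is an assumption.
  private final Properties storage = new Properties();

  // In the real system this is set by the executor thread running the
  // prepare task; it stays null until setSubCaCertId is called.
  private volatile String newSubCaCertId;

  void setSubCaCertId(String id) {
    newSubCaCertId = id;
  }

  /** Unguarded commit: throws NullPointerException if the id is still null. */
  void rotationCommitUnguarded() {
    storage.setProperty("scmCertSerialId", newSubCaCertId);
  }

  /** Guarded commit: refuses to persist until the prepare step saved the id. */
  boolean rotationCommitGuarded() {
    String id = newSubCaCertId; // read the volatile field once
    if (id == null) {
      return false; // prepare has not finished yet; caller may retry or wait
    }
    storage.setProperty("scmCertSerialId", id);
    return true;
  }

  String storedCertId() {
    return storage.getProperty("scmCertSerialId");
  }

  public static void main(String[] args) {
    RotationCommitSketch handler = new RotationCommitSketch();
    try {
      // Commit arrives before the id was saved: same failure mode as the log.
      handler.rotationCommitUnguarded();
    } catch (NullPointerException e) {
      System.out.println("commit before prepare finished: " + e);
    }
    System.out.println("guarded commit accepted: " + handler.rotationCommitGuarded());
    handler.setSubCaCertId("765013863887");
    System.out.println("guarded commit accepted: " + handler.rotationCommitGuarded());
    System.out.println("stored cert id: " + handler.storedCertId());
  }
}
```

Returning false is only one option; blocking the commit until the prepare task has stored the id, or saving the id before the prepare ack is sent, would address the same ordering problem.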
> Intermittent failure in test-root-ca-rotation.sh due to null certId
> -------------------------------------------------------------------
>
> Key: HDDS-8983
> URL: https://issues.apache.org/jira/browse/HDDS-8983
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: Security
> Affects Versions: 1.4.0
> Reporter: Attila Doroszlai
> Assignee: Sammi Chen
> Priority: Critical
> Labels: pull-request-available
> Attachments: acceptance-HA-secure.zip
>
>
> {code:title=https://github.com/apache/ozone/actions/runs/5474960903/jobs/9970829053}
> ozone admin cert info 4 succeed
> ==============================================================================
> Scm-Leader-Transfer :: Smoketest ozone cluster startup
> ==============================================================================
> Transfer Leadership jstack 7 > /home/runner/work/ozone/ozone/hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozonesecure-ha/result/ozonesecure-ha_datanode1_1_HddsDatanodeService.stack
> ...
> ERROR: Test execution of ozonesecure-ha/test-root-ca-rotation.sh is FAILED!!!!
> {code}
> * https://github.com/adoroszlai/ozone-build-results/tree/master/2023/07/06/24060/acceptance-HA-secure
> * https://github.com/adoroszlai/ozone-build-results/tree/master/2023/07/07/24068/acceptance-HA-secure
> CC [~pifta], [~sgal]