[ https://issues.apache.org/jira/browse/HDDS-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742421#comment-17742421 ]
Sammi Chen edited comment on HDDS-8983 at 7/12/23 1:12 PM:
-----------------------------------------------------------
[~adoroszlai] Thanks for reporting the issue. It is caused by a race
condition: scm4.org received the rotationCommit request before it had saved the
new sub CA serial ID by calling handler.setSubCACertId(newSubCACertId). The
Ratis request applied in the state machine runs concurrently with the executor
in RootCARotationManager, which was still running the SubCARotationPrepareTask
at that moment. As a result, rotationCommit passed a null cert ID to
SCMStorageConfig.setScmCertSerialId, which caused the NullPointerException in
the log below.
{code:java}
scm4.org_1 | 2023-07-06 12:07:18,841 [RootCARotationManager-Inactive] INFO security.RootCARotationManager: SubCARotationPrepareTask[rootCertId = 4] - rotation prepare ack sent out, new scm certificate 765013863887
scm4.org_1 | 2023-07-06 12:07:18,843 [97b42920-03c3-4ab8-89de-cf43ff4c90ea@group-EC1C2E4E7C1F-StateMachineUpdater] ERROR statemachine.StateMachine: Terminating with exit status 1: null
scm4.org_1 | java.lang.NullPointerException
scm4.org_1 | at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
scm4.org_1 | at java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
scm4.org_1 | at java.base/java.util.Properties.put(Properties.java:1340)
scm4.org_1 | at java.base/java.util.Properties.setProperty(Properties.java:228)
scm4.org_1 | at org.apache.hadoop.ozone.common.StorageInfo.setProperty(StorageInfo.java:154)
scm4.org_1 | at org.apache.hadoop.hdds.scm.server.SCMStorageConfig.setScmCertSerialId(SCMStorageConfig.java:106)
scm4.org_1 | at org.apache.hadoop.hdds.scm.security.RootCARotationHandlerImpl.rotationCommit(RootCARotationHandlerImpl.java:153)
scm4.org_1 | at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
scm4.org_1 | at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
scm4.org_1 | at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
scm4.org_1 | at java.base/java.lang.reflect.Method.invoke(Method.java:566)
scm4.org_1 | at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:188)
scm4.org_1 | at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:148)
scm4.org_1 | at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1777)
scm4.org_1 | at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:242)
scm4.org_1 | at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:184)
scm4.org_1 | at java.base/java.lang.Thread.run(Thread.java:829)
scm4.org_1 | 2023-07-06 12:07:18,846 [RootCARotationManager-Inactive] INFO security.RootCARotationHandlerImpl: Scm sub CA new certificate is 765013863887
{code}
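To illustrate the ordering problem, here is a minimal, self-contained sketch. It is not the Ozone code and not necessarily how the fix will look: the Handler class, its field names, and the property key "scmCertSerialId" are hypothetical stand-ins. It only shows that java.util.Properties rejects a null value, and one possible way to make the commit step wait until the prepare step has published the new cert ID.

```java
import java.util.Properties;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class RotationRaceSketch {

  /** Hypothetical stand-in for the rotation handler; not the Ozone class. */
  static class Handler {
    // Stands in for the storage config, which is backed by Properties.
    private final Properties storage = new Properties();
    // Completed by the prepare task once the new sub CA cert ID is known.
    private final CompletableFuture<String> newSubCACertId =
        new CompletableFuture<>();

    void setSubCACertId(String id) {
      newSubCACertId.complete(id);
    }

    // Unguarded commit: reads whatever is there, which may still be null.
    void commitUnguarded() {
      String id = newSubCACertId.getNow(null);
      // Properties.setProperty throws NullPointerException for a null value.
      storage.setProperty("scmCertSerialId", id);
    }

    // Guarded commit: waits (bounded) until the prepare step has set the ID.
    void commitGuarded(long timeoutMs) throws Exception {
      String id = newSubCACertId.get(timeoutMs, TimeUnit.MILLISECONDS);
      storage.setProperty("scmCertSerialId", id);
    }

    String storedId() {
      return storage.getProperty("scmCertSerialId");
    }
  }

  public static void main(String[] args) throws Exception {
    Handler handler = new Handler();

    // Commit arrives before the prepare task has saved the ID.
    try {
      handler.commitUnguarded();
    } catch (NullPointerException e) {
      System.out.println("unguarded commit failed: NPE");
    }

    // The prepare task finishes on another thread, as the executor would.
    Thread prepare = new Thread(() -> handler.setSubCACertId("765013863887"));
    prepare.start();
    handler.commitGuarded(5000);
    prepare.join();
    System.out.println("stored id: " + handler.storedId());
  }
}
```

Blocking the thread that applies Ratis transactions may not be acceptable in practice, so treat the bounded wait as one option only. Saving the ID before the prepare ack is sent out would be another way to avoid the same ordering problem.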
> Intermittent failure in test-root-ca-rotation.sh due to null certId
> -------------------------------------------------------------------
>
> Key: HDDS-8983
> URL: https://issues.apache.org/jira/browse/HDDS-8983
> Project: Apache Ozone
> Issue Type: Sub-task
> Components: Security
> Affects Versions: 1.4.0
> Reporter: Attila Doroszlai
> Assignee: Sammi Chen
> Priority: Critical
> Labels: pull-request-available
> Attachments: acceptance-HA-secure.zip
>
>
> {code:title=https://github.com/apache/ozone/actions/runs/5474960903/jobs/9970829053}
> ozone admin cert info 4 succeed
> ==============================================================================
> Scm-Leader-Transfer :: Smoketest ozone cluster startup
> ==============================================================================
> Transfer Leadership jstack 7 > /home/runner/work/ozone/ozone/hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozonesecure-ha/result/ozonesecure-ha_datanode1_1_HddsDatanodeService.stack
> ...
> ERROR: Test execution of ozonesecure-ha/test-root-ca-rotation.sh is FAILED!!!!
> {code}
> * https://github.com/adoroszlai/ozone-build-results/tree/master/2023/07/06/24060/acceptance-HA-secure
> * https://github.com/adoroszlai/ozone-build-results/tree/master/2023/07/07/24068/acceptance-HA-secure
> CC [~pifta], [~sgal]