[ 
https://issues.apache.org/jira/browse/HDDS-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742421#comment-17742421
 ] 

Sammi Chen edited comment on HDDS-8983 at 7/12/23 1:12 PM:
-----------------------------------------------------------

[~adoroszlai] Thanks for reporting the issue.  It's caused by a race 
condition: scm4.org received the rotationCommit request before it had saved 
the new sub CA serial id by calling handler.setSubCACertId(newSubCACertId).  
The Ratis request applied in the state machine runs concurrently with the 
executor in RootCARotationManager, which was still running the 
SubCARotationPrepareTask at that moment.  As a result, rotationCommit passed 
a null cert id to SCMStorageConfig.setScmCertSerialId and hit the 
NullPointerException shown in the log. 


{code:java}
scm4.org_1   | 2023-07-06 12:07:18,841 [RootCARotationManager-Inactive] INFO 
security.RootCARotationManager: SubCARotationPrepareTask[rootCertId = 4] - 
rotation prepare ack sent out, new scm certificate 765013863887

scm4.org_1   | 2023-07-06 12:07:18,843 
[97b42920-03c3-4ab8-89de-cf43ff4c90ea@group-EC1C2E4E7C1F-StateMachineUpdater] 
ERROR statemachine.StateMachine: Terminating with exit status 1: null
scm4.org_1   | java.lang.NullPointerException
scm4.org_1   |  at 
java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
scm4.org_1   |  at 
java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
scm4.org_1   |  at java.base/java.util.Properties.put(Properties.java:1340)
scm4.org_1   |  at 
java.base/java.util.Properties.setProperty(Properties.java:228)
scm4.org_1   |  at 
org.apache.hadoop.ozone.common.StorageInfo.setProperty(StorageInfo.java:154)
scm4.org_1   |  at 
org.apache.hadoop.hdds.scm.server.SCMStorageConfig.setScmCertSerialId(SCMStorageConfig.java:106)
scm4.org_1   |  at 
org.apache.hadoop.hdds.scm.security.RootCARotationHandlerImpl.rotationCommit(RootCARotationHandlerImpl.java:153)
scm4.org_1   |  at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
scm4.org_1   |  at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
scm4.org_1   |  at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
scm4.org_1   |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
scm4.org_1   |  at 
org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:188)
scm4.org_1   |  at 
org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:148)
scm4.org_1   |  at 
org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1777)
scm4.org_1   |  at 
org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:242)
scm4.org_1   |  at 
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:184)
scm4.org_1   |  at java.base/java.lang.Thread.run(Thread.java:829)
scm4.org_1   | 2023-07-06 12:07:18,846 [RootCARotationManager-Inactive] INFO 
security.RootCARotationHandlerImpl: Scm sub CA new certificate is 765013863887

{code}
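
To make the interleaving easier to follow, here is a minimal, self-contained 
sketch of the race and of one possible guard.  The class names, the property 
key and the timeout are hypothetical stand-ins, not the actual Ozone code, 
and the guard shown is only an illustration, not necessarily the fix that 
will be merged.  The one fact it relies on is that Properties.setProperty 
throws a NullPointerException for a null value, as in the stack trace.

```java
import java.util.Objects;
import java.util.Properties;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RotationRaceSketch {
    // Hypothetical property name, used only for this illustration.
    static final String KEY = "scmCertSerialId";

    /** Unguarded variant: commit reads a field that may still be null. */
    static final class UnguardedHandler {
        final Properties storage = new Properties();
        volatile String newSubCACertId; // null until the prepare task stores it

        void setSubCACertId(String id) {
            newSubCACertId = id;
        }

        void rotationCommit() {
            // Properties.setProperty throws NullPointerException for a null value.
            storage.setProperty(KEY, newSubCACertId);
        }
    }

    /** Guarded variant (an assumption): commit waits until the id is published. */
    static final class GuardedHandler {
        final Properties storage = new Properties();
        final CompletableFuture<String> newSubCACertId = new CompletableFuture<>();

        void setSubCACertId(String id) {
            newSubCACertId.complete(Objects.requireNonNull(id, "id"));
        }

        void rotationCommit(long timeoutMs) {
            try {
                String id = newSubCACertId.get(timeoutMs, TimeUnit.MILLISECONDS);
                storage.setProperty(KEY, id);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted waiting for cert id", e);
            } catch (ExecutionException | TimeoutException e) {
                throw new IllegalStateException("new sub CA cert id not set", e);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The commit arrives before setSubCACertId was called.
        UnguardedHandler unguarded = new UnguardedHandler();
        try {
            unguarded.rotationCommit();
        } catch (NullPointerException e) {
            System.out.println("unguarded commit failed: " + e);
        }

        // Same ordering, but the commit waits for the late prepare task.
        GuardedHandler guarded = new GuardedHandler();
        Thread prepareTask = new Thread(() -> {
            try {
                Thread.sleep(50); // simulate the prepare task finishing late
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            guarded.setSubCACertId("765013863887");
        });
        prepareTask.start();
        guarded.rotationCommit(1000);
        prepareTask.join();
        System.out.println("guarded commit stored: "
            + guarded.storage.getProperty(KEY));
    }
}
```

Blocking the state machine thread has its own cost, so an alternative worth 
considering is to save the new cert id before the prepare ack is sent out.  
In the log the ack is sent at 12:07:18,841 and the id is only recorded at 
12:07:18,846, which is the window the commit fell into.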



> Intermittent failure in test-root-ca-rotation.sh due to null certId
> -------------------------------------------------------------------
>
>                 Key: HDDS-8983
>                 URL: https://issues.apache.org/jira/browse/HDDS-8983
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: Security
>    Affects Versions: 1.4.0
>            Reporter: Attila Doroszlai
>            Assignee: Sammi Chen
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: acceptance-HA-secure.zip
>
>
> {code:title=https://github.com/apache/ozone/actions/runs/5474960903/jobs/9970829053}
> ozone admin cert info 4 succeed
> ==============================================================================
> Scm-Leader-Transfer :: Smoketest ozone cluster startup                        
> ==============================================================================
> Transfer Leadership                                                   jstack 
> 7 > 
> /home/runner/work/ozone/ozone/hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozonesecure-ha/result/ozonesecure-ha_datanode1_1_HddsDatanodeService.stack
> ...
> ERROR: Test execution of ozonesecure-ha/test-root-ca-rotation.sh is FAILED!!!!
> {code}
> * 
> https://github.com/adoroszlai/ozone-build-results/tree/master/2023/07/06/24060/acceptance-HA-secure
> * 
> https://github.com/adoroszlai/ozone-build-results/tree/master/2023/07/07/24068/acceptance-HA-secure
> CC [~pifta], [~sgal]


