[ 
https://issues.apache.org/jira/browse/HDDS-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742421#comment-17742421
 ] 

Sammi Chen edited comment on HDDS-8983 at 7/12/23 1:51 PM:
-----------------------------------------------------------

[~adoroszlai] Thanks for reporting this.  It's caused by a timing issue.  
scm4.org received the rotationCommit request before it had saved the new sub CA 
serial id by calling handler.setSubCACertId(newSubCACertId).  The Ratis request 
applied in the state machine runs concurrently with the executor in 
RootCARotationManager, and the executor was still running the 
SubCARotationPrepareTask at that moment, so rotationCommit read a null cert id. 


{code:java}
scm4.org_1   | 2023-07-06 12:07:18,841 [RootCARotationManager-Inactive] INFO security.RootCARotationManager: SubCARotationPrepareTask[rootCertId = 4] - rotation prepare ack sent out, new scm certificate 765013863887

scm4.org_1   | 2023-07-06 12:07:18,843 [97b42920-03c3-4ab8-89de-cf43ff4c90ea@group-EC1C2E4E7C1F-StateMachineUpdater] ERROR statemachine.StateMachine: Terminating with exit status 1: null
scm4.org_1   | java.lang.NullPointerException
scm4.org_1   |  at java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
scm4.org_1   |  at java.base/java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
scm4.org_1   |  at java.base/java.util.Properties.put(Properties.java:1340)
scm4.org_1   |  at java.base/java.util.Properties.setProperty(Properties.java:228)
scm4.org_1   |  at org.apache.hadoop.ozone.common.StorageInfo.setProperty(StorageInfo.java:154)
scm4.org_1   |  at org.apache.hadoop.hdds.scm.server.SCMStorageConfig.setScmCertSerialId(SCMStorageConfig.java:106)
scm4.org_1   |  at org.apache.hadoop.hdds.scm.security.RootCARotationHandlerImpl.rotationCommit(RootCARotationHandlerImpl.java:153)
scm4.org_1   |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
scm4.org_1   |  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
scm4.org_1   |  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
scm4.org_1   |  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
scm4.org_1   |  at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:188)
scm4.org_1   |  at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:148)
scm4.org_1   |  at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1777)
scm4.org_1   |  at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:242)
scm4.org_1   |  at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:184)
scm4.org_1   |  at java.base/java.lang.Thread.run(Thread.java:829)
scm4.org_1   | 2023-07-06 12:07:18,846 [RootCARotationManager-Inactive] INFO security.RootCARotationHandlerImpl: Scm sub CA new certificate is 765013863887

{code}
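
To illustrate the race, here is a minimal, self-contained sketch. It is not the Ozone code and not necessarily the fix for this issue: RotationRaceSketch, the CERT_ID_KEY property name and the timeout parameter are all hypothetical stand-ins. It shows why a null cert id ends in a NullPointerException (java.util.Properties rejects null values, as in the stack trace) and one possible guard, where the commit step waits for the prepare step to publish the id instead of reading it immediately.

```java
import java.util.Properties;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for the rotation handler; not RootCARotationHandlerImpl. */
public class RotationRaceSketch {
  // Assumed property name, used only for this illustration.
  private static final String CERT_ID_KEY = "scmCertSerialId";

  // Completed by the prepare step once the new sub CA cert id is known.
  private final CompletableFuture<String> newSubCACertId = new CompletableFuture<>();
  // Stand-in for the properties kept by the storage config.
  private final Properties storage = new Properties();

  /** Called from the executor thread (the prepare task in the report). */
  public void setSubCACertId(String certId) {
    newSubCACertId.complete(certId);
  }

  /**
   * Called from the state machine thread. Waits up to timeoutMillis for the
   * cert id; returns false instead of storing null if it never arrives.
   */
  public boolean rotationCommit(long timeoutMillis) {
    try {
      String certId = newSubCACertId.get(timeoutMillis, TimeUnit.MILLISECONDS);
      storage.setProperty(CERT_ID_KEY, certId);
      return true;
    } catch (TimeoutException | ExecutionException e) {
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public String storedCertId() {
    return storage.getProperty(CERT_ID_KEY);
  }

  /** Reproduces the failure mode: Properties.setProperty rejects a null value. */
  public static boolean nullValueThrows() {
    try {
      new Properties().setProperty(CERT_ID_KEY, null);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("null value throws NPE: " + nullValueThrows());

    RotationRaceSketch handler = new RotationRaceSketch();
    // The commit request arrives first, on a different thread, as in the log.
    Thread commit = new Thread(() -> handler.rotationCommit(5000));
    commit.start();
    Thread.sleep(50);
    // The prepare step publishes the id a little later.
    handler.setSubCACertId("765013863887");
    commit.join();
    System.out.println("stored cert id: " + handler.storedCertId());
  }
}
```

Blocking the thread that applies Ratis transactions may not be acceptable in practice, so treat the wait only as a way to make the required ordering explicit. A real fix could equally set the cert id before the prepare ack is sent.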





> Intermittent failure in test-root-ca-rotation.sh due to null certId
> -------------------------------------------------------------------
>
>                 Key: HDDS-8983
>                 URL: https://issues.apache.org/jira/browse/HDDS-8983
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: Security
>    Affects Versions: 1.4.0
>            Reporter: Attila Doroszlai
>            Assignee: Sammi Chen
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: acceptance-HA-secure.zip
>
>
> {code:title=https://github.com/apache/ozone/actions/runs/5474960903/jobs/9970829053}
> ozone admin cert info 4 succeed
> ==============================================================================
> Scm-Leader-Transfer :: Smoketest ozone cluster startup                        
> ==============================================================================
> Transfer Leadership                                                   jstack 7 > /home/runner/work/ozone/ozone/hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozonesecure-ha/result/ozonesecure-ha_datanode1_1_HddsDatanodeService.stack
> ...
> ERROR: Test execution of ozonesecure-ha/test-root-ca-rotation.sh is FAILED!!!!
> {code}
> * https://github.com/adoroszlai/ozone-build-results/tree/master/2023/07/06/24060/acceptance-HA-secure
> * https://github.com/adoroszlai/ozone-build-results/tree/master/2023/07/07/24068/acceptance-HA-secure
> CC [~pifta], [~sgal]


