[ 
https://issues.apache.org/jira/browse/HDDS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aryan Gupta reassigned HDDS-9608:
---------------------------------

    Assignee: Nandakumar  (was: Aryan Gupta)

> [MasterNode decommissioning] InvalidStateTransitionException after 
> recommissioning SCM
> --------------------------------------------------------------------------------------
>
>                 Key: HDDS-9608
>                 URL: https://issues.apache.org/jira/browse/HDDS-9608
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Pratyush Bhatt
>            Assignee: Nandakumar
>            Priority: Major
>
> *Scenario:* Decommission and Recommission the same SCM node.
> *Observation:*
> {code:java}
> ozone admin scm roles
> 2023-11-01 04:05:18,948|INFO|MainThread|machine.py:205 - 
> run()||GUID=0825cc57-3a75-4632-b9e4-0ede9c2a30a6|ozn-decom202-2.ozn-decom202.xyz:1111:LEADER:aadb0a54-a86b-4be2-8fe1-9c61c4b8de3b:172.27.88.4
> 2023-11-01 04:05:18,949|INFO|MainThread|machine.py:205 - 
> run()||GUID=0825cc57-3a75-4632-b9e4-0ede9c2a30a6|ozn-decom202-6.ozn-decom202.xyz:1111:FOLLOWER:93bcd687-ddff-448f-b778-636c2f8652a2:172.27.17.130
> 2023-11-01 04:05:18,949|INFO|MainThread|machine.py:205 - 
> run()||GUID=0825cc57-3a75-4632-b9e4-0ede9c2a30a6|ozn-decom202-5.ozn-decom202.xyz:1111:FOLLOWER:a1bfdda0-c1b6-453d-91d0-9fdd3eee8041:172.27.204.67
>  {code}
> The node to decommission was:
> {code:java}
> ozn-decom202-6.ozn-decom202.xyz (A primordial Node) {code}
> ozn-decom202-5.ozn-decom202.xyz was made the new primordial node
> {code:java}
> 'ozone.scm.primordial.node.id': 'ozn-decom202-5.ozn-decom202.xyz'{code}
> All metadirs were deleted:
> {code:java}
> 2023-11-01 04:15:03,829|INFO|MainThread|sudo -u root rm -rf 
> /var/lib/hadoop-ozone/scm/data
> 2023-11-01 04:15:04,072|INFO|MainThread|sudo -u root rm -rf 
> /var/lib/hadoop-ozone/scm/ratis
> 2023-11-01 04:15:04,285|INFO|MainThread|sudo -u root rm -rf 
> /var/lib/hadoop-ozone/scm/ozone-metadata{code}
> Node was removed:
> {code:java}
> 2023-11-01 04:15:04,835|Successfully deleted role 
> OZON1542132b-STORAGE_CONTAINER_MANAGER-68fe6978b07cabd016a5aeed2 from service 
> OZONE-1 {code}
> The same node was added back and recommissioned:
> {code:java}
> 2023-11-01 04:16:43,229|Created role_name = 
> OZON1542132b-STORAGE_CONTAINER_MANAGER-68fe6978b07cabd016a5aeed2 for service 
> = OZONE-1 on host = ozn-decom202-6.ozn-decom202.xyz {code}
> SCM Bootstrap was successful as per SCM logs:
> {code:java}
> 2023-11-01 04:18:52,598 INFO 
> [main]-org.apache.hadoop.hdds.scm.ha.HASecurityUtils: Successfully stored SCM 
> signed certificate.
> 2023-11-01 04:18:52,606 INFO 
> [main]-org.apache.hadoop.hdds.scm.server.StorageContainerManager: SCM 
> BootStrap  is successful for ClusterID 
> CID-cb40013e-871a-4db6-85d6-d8a88831e5c9, SCMID 
> fec84ffb-12fe-4339-8707-aebb6641cd1c
> 2023-11-01 04:18:52,606 INFO 
> [main]-org.apache.hadoop.hdds.scm.server.StorageContainerManager: Primary SCM 
> Node ID aadb0a54-a86b-4be2-8fe1-9c61c4b8de3b {code}
> But soon after, the SCM shut down with InvalidStateTransitionException: Invalid 
> event: CLOSE at OPEN state. (Thanks [~sumitagrawal] for debugging help.)
> {code:java}
> 2023-11-01 04:18:59,966 WARN 
> [fec84ffb-12fe-4339-8707-aebb6641cd1c@group-D8A88831E5C9-StateMachineUpdater]-org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator:
>  Failed to allocate a batch for containerId, expected lastId is 0, actual 
> lastId is 25000.
> 2023-11-01 04:18:59,971 ERROR 
> [fec84ffb-12fe-4339-8707-aebb6641cd1c@group-D8A88831E5C9-StateMachineUpdater]-org.apache.ratis.statemachine.StateMachine:
>  Terminating with exit status 1: Invalid event: CLOSE at OPEN state.
> org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException: 
> Invalid event: CLOSE at OPEN state.
>         at 
> org.apache.hadoop.ozone.common.statemachine.StateMachine.getNextState(StateMachine.java:60)
>         at 
> org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.updateContainerState(ContainerStateManagerImpl.java:356)
>         at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:188)
>         at 
> org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:148)
>         at 
> org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1777)
>         at 
> org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:242)
>         at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:184)
>         at java.lang.Thread.run(Thread.java:748)
> 2023-11-01 04:18:59,975 INFO 
> [shutdown-hook-0]-org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter:
>  SHUTDOWN_MSG: {code}
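
For readers unfamiliar with the error in the trace above, the following is a minimal sketch of a table-driven lifecycle state machine that rejects an event with no registered transition from the current state. It is not the Ozone implementation: the class `ContainerLifecycleSketch`, its enums, and the two transitions it registers (OPEN to CLOSING on FINALIZE, CLOSING to CLOSED on CLOSE) are simplified assumptions for illustration. The only behavior taken from the report is that CLOSE arriving while the container is still OPEN is rejected with the message shown in the log.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a container lifecycle state machine.
public class ContainerLifecycleSketch {
    enum State { OPEN, CLOSING, CLOSED }
    enum Event { FINALIZE, CLOSE }

    // Unchecked here only to keep the sketch short.
    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(State state, Event event) {
            super("Invalid event: " + event + " at " + state + " state.");
        }
    }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TRANSITIONS =
        new EnumMap<>(State.class);

    static {
        // Assumed transitions for illustration only.
        addTransition(State.OPEN, Event.FINALIZE, State.CLOSING);
        addTransition(State.CLOSING, Event.CLOSE, State.CLOSED);
    }

    private static void addTransition(State from, Event event, State to) {
        TRANSITIONS.computeIfAbsent(from, k -> new EnumMap<>(Event.class))
                   .put(event, to);
    }

    // Returns the next state, or throws if no transition is registered.
    static State nextState(State from, Event event) {
        Map<Event, State> row = TRANSITIONS.get(from);
        State to = (row == null) ? null : row.get(event);
        if (to == null) {
            throw new InvalidTransitionException(from, event);
        }
        return to;
    }

    public static void main(String[] args) {
        // A replica that saw FINALIZE first accepts the later CLOSE.
        State s = nextState(State.OPEN, Event.FINALIZE);
        System.out.println(nextState(s, Event.CLOSE));

        // A replica whose local state is still OPEN rejects the same CLOSE.
        try {
            nextState(State.OPEN, Event.CLOSE);
        } catch (InvalidTransitionException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, a node that replays a CLOSE entry while its local view of the container is still OPEN fails in the same way as the trace. Whether that is what happened on the recommissioned SCM is not established by the report.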



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

