[ https://issues.apache.org/jira/browse/HDDS-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aryan Gupta reassigned HDDS-9608:
---------------------------------

    Assignee: Nandakumar  (was: Aryan Gupta)

> [MasterNode decommissioning] InvalidStateTransitionException after recommissioning SCM
> --------------------------------------------------------------------------------------
>
>                 Key: HDDS-9608
>                 URL: https://issues.apache.org/jira/browse/HDDS-9608
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Pratyush Bhatt
>            Assignee: Nandakumar
>            Priority: Major
>
> *Scenario:* Decommission and Recommission the same SCM node.
> *Observation:*
> {code:java}
> ozone admin scm roles
> 2023-11-01 04:05:18,948|INFO|MainThread|machine.py:205 - run()||GUID=0825cc57-3a75-4632-b9e4-0ede9c2a30a6|ozn-decom202-2.ozn-decom202.xyz:1111:LEADER:aadb0a54-a86b-4be2-8fe1-9c61c4b8de3b:172.27.88.4
> 2023-11-01 04:05:18,949|INFO|MainThread|machine.py:205 - run()||GUID=0825cc57-3a75-4632-b9e4-0ede9c2a30a6|ozn-decom202-6.ozn-decom202.xyz:1111:FOLLOWER:93bcd687-ddff-448f-b778-636c2f8652a2:172.27.17.130
> 2023-11-01 04:05:18,949|INFO|MainThread|machine.py:205 - run()||GUID=0825cc57-3a75-4632-b9e4-0ede9c2a30a6|ozn-decom202-5.ozn-decom202.xyz:1111:FOLLOWER:a1bfdda0-c1b6-453d-91d0-9fdd3eee8041:172.27.204.67
> {code}
> Node to decommission was:
> {code:java}
> ozn-decom202-6.ozn-decom202.xyz (A primordial Node) {code}
> ozn-decom202-5.ozn-decom202.xyz was made the new primordial node
> {code:java}
> 'ozone.scm.primordial.node.id': 'ozn-decom202-5.ozn-decom202.xyz'{code}
> All metadirs were deleted:
> {code:java}
> 2023-11-01 04:15:03,829|INFO|MainThread|sudo -u root rm -rf /var/lib/hadoop-ozone/scm/data
> 2023-11-01 04:15:04,072|INFO|MainThread|sudo -u root rm -rf /var/lib/hadoop-ozone/scm/ratis
> 2023-11-01 04:15:04,285|INFO|MainThread|sudo -u root rm -rf /var/lib/hadoop-ozone/scm/ozone-metadata{code}
> Node was removed:
> {code:java}
> 2023-11-01 04:15:04,835|Successfully deleted role OZON1542132b-STORAGE_CONTAINER_MANAGER-68fe6978b07cabd016a5aeed2 from service OZONE-1 {code}
> Same node was added back and was recommissioned:
> {code:java}
> 2023-11-01 04:16:43,229|Created role_name = OZON1542132b-STORAGE_CONTAINER_MANAGER-68fe6978b07cabd016a5aeed2 for service = OZONE-1 on host = ozn-decom202-6.ozn-decom202.xyz {code}
> SCM Bootstrap was successful as per SCM logs:
> {code:java}
> 2023-11-01 04:18:52,598 INFO [main]-org.apache.hadoop.hdds.scm.ha.HASecurityUtils: Successfully stored SCM signed certificate.
> 2023-11-01 04:18:52,606 INFO [main]-org.apache.hadoop.hdds.scm.server.StorageContainerManager: SCM BootStrap is successful for ClusterID CID-cb40013e-871a-4db6-85d6-d8a88831e5c9, SCMID fec84ffb-12fe-4339-8707-aebb6641cd1c
> 2023-11-01 04:18:52,606 INFO [main]-org.apache.hadoop.hdds.scm.server.StorageContainerManager: Primary SCM Node ID aadb0a54-a86b-4be2-8fe1-9c61c4b8de3b {code}
> But soon after, SCM shuts down with InvalidStateTransitionException: Invalid event: CLOSE at OPEN state. (Thanks [~sumitagrawal] for debugging help)
> {code:java}
> 2023-11-01 04:18:59,966 WARN [fec84ffb-12fe-4339-8707-aebb6641cd1c@group-D8A88831E5C9-StateMachineUpdater]-org.apache.hadoop.hdds.scm.ha.SequenceIdGenerator: Failed to allocate a batch for containerId, expected lastId is 0, actual lastId is 25000.
> 2023-11-01 04:18:59,971 ERROR [fec84ffb-12fe-4339-8707-aebb6641cd1c@group-D8A88831E5C9-StateMachineUpdater]-org.apache.ratis.statemachine.StateMachine: Terminating with exit status 1: Invalid event: CLOSE at OPEN state.
> org.apache.hadoop.ozone.common.statemachine.InvalidStateTransitionException: Invalid event: CLOSE at OPEN state.
>     at org.apache.hadoop.ozone.common.statemachine.StateMachine.getNextState(StateMachine.java:60)
>     at org.apache.hadoop.hdds.scm.container.ContainerStateManagerImpl.updateContainerState(ContainerStateManagerImpl.java:356)
>     at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.process(SCMStateMachine.java:188)
>     at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:148)
>     at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1777)
>     at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:242)
>     at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:184)
>     at java.lang.Thread.run(Thread.java:748)
> 2023-11-01 04:18:59,975 INFO [shutdown-hook-0]-org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG: {code}
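
For readers unfamiliar with the error text, the following is a minimal, self-contained sketch of how a table-driven state machine rejects an event that is not defined for the current state. It is not the Ozone implementation: the class name `ContainerLifecycleSketch`, the simplified transition table, and the use of `IllegalStateException` are assumptions made for illustration only. The sketch assumes that a container in OPEN state must first receive a FINALIZE event before CLOSE is accepted, so applying a replicated CLOSE entry to a replica that still holds the container as OPEN fails. Whether that is what happened in this issue is not established by the report above.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch only; names and the transition table are assumptions,
// not the actual Ozone ContainerStateManagerImpl code.
public class ContainerLifecycleSketch {
    enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }
    enum Event { FINALIZE, QUASI_CLOSE, CLOSE, FORCE_CLOSE }

    // For each state, the events it accepts and the resulting state.
    private static final Map<State, Map<Event, State>> TRANSITIONS =
        new EnumMap<State, Map<Event, State>>(State.class);

    static {
        for (State s : State.values()) {
            TRANSITIONS.put(s, new EnumMap<Event, State>(Event.class));
        }
        // Assumed, simplified lifecycle: OPEN -> CLOSING -> (QUASI_CLOSED ->) CLOSED.
        TRANSITIONS.get(State.OPEN).put(Event.FINALIZE, State.CLOSING);
        TRANSITIONS.get(State.CLOSING).put(Event.CLOSE, State.CLOSED);
        TRANSITIONS.get(State.CLOSING).put(Event.QUASI_CLOSE, State.QUASI_CLOSED);
        TRANSITIONS.get(State.QUASI_CLOSED).put(Event.FORCE_CLOSE, State.CLOSED);
    }

    // Returns the next state, or throws if the event is not defined for 'from'.
    static State nextState(State from, Event event) {
        State to = TRANSITIONS.get(from).get(event);
        if (to == null) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + from + " state.");
        }
        return to;
    }

    public static void main(String[] args) {
        // Normal path: FINALIZE is applied before CLOSE.
        State s = nextState(State.OPEN, Event.FINALIZE);
        s = nextState(s, Event.CLOSE);
        System.out.println("Normal path ends in: " + s);

        // Failing path: CLOSE is applied while the local state is still OPEN.
        try {
            nextState(State.OPEN, Event.CLOSE);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the failing call produces the message "Invalid event: CLOSE at OPEN state.", matching the wording in the stack trace. In the logs above the exception is raised while a Ratis log entry is being applied through SCMStateMachine.applyTransaction, and the process terminates instead of catching it.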