[ https://issues.apache.org/jira/browse/HDDS-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758213#comment-17758213 ]
Aryan Gupta commented on HDDS-5547:
-----------------------------------

Considering your example above:

Step-1: "ozone1" [om1,om2,om3]: this will create the group1 directory on 3 hosts.
Step-2: Rename "ozone1" [om1,om2,om3] to "ozone2" [om1,om2,om3]: we'll still use the same group directory.
Step-3: Add new service "ozone1" [om4,om5,om6]: this will create the group1 directory on the other 3 hosts.

We'll still be able to read/write from both these services, as client configs will have proper mappings from service to nodes. So the raft group id would be the same, but the groups would be present on different hosts, as long as no more than one OM is running on a single host.

Please correct me if I'm wrong [~szetszwo]

> Generation of raftgroupId should not depend on OM service id
> ------------------------------------------------------------
>
>          Key: HDDS-5547
>          URL: https://issues.apache.org/jira/browse/HDDS-5547
>      Project: Apache Ozone
>   Issue Type: Improvement
>     Reporter: Bharat Viswanadham
>     Assignee: Aryan Gupta
>     Priority: Major
>
> In OM HA, raftGroupID is generated from service ID.
> So, if there is a change in OM Service ID OM startup fails with below error
> {code:java}
> 2021-08-05 12:20:03,043 ERROR org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
> java.io.IOException: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-8A65FD498CB6, RUNNING -> STARTING
>     at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>     at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>     at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:71)
>     at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:354)
>     at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:371)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.start(OzoneManagerRatisServer.java:390)
>     at org.apache.hadoop.ozone.om.OzoneManager.start(OzoneManager.java:1109)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:126)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:79)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:67)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:38)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
>     at picocli.CommandLine.access$1100(CommandLine.java:145)
>     at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
>     at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
>     at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
>     at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
>     at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
>     at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:51)
> Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-8A65FD498CB6, RUNNING -> STARTING
>     at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>     at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
>     at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
>     at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:127)
>     at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
>     at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:193)
>     at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> One possible solution:
> If a Ratis group dir already exists, use it as is, since it belongs to an existing cluster that we cannot change. For new clusters, we could perhaps use the clusterID, which does not change for an Ozone cluster; this way we would tolerate a change in the service id config.
> This is just one idea; we can discuss other approaches to solve this issue.
> As of right now, OM does not allow the OM service id to be changed.
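
The idea in the quoted description (reuse an existing Ratis group directory, otherwise derive the group ID from the cluster ID) could be sketched roughly as follows. This is only an illustration in plain Java, not the Ozone or Ratis code: it assumes the group ID is a name-based UUID computed from a string and that each group directory under the Ratis storage directory is named after that UUID. The class `RaftGroupIdSketch`, its methods, and the `CID-example` value are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch; none of these names are Ozone or Ratis APIs.
public class RaftGroupIdSketch {

  /** Deterministic name-based UUID: the same input string always gives the same ID. */
  public static UUID deriveGroupId(String name) {
    return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
  }

  /** Looks for a subdirectory whose name parses as a UUID (assumed group dir layout). */
  public static Optional<UUID> findExistingGroupId(Path storageDir) throws IOException {
    if (!Files.isDirectory(storageDir)) {
      return Optional.empty();
    }
    try (DirectoryStream<Path> children = Files.newDirectoryStream(storageDir)) {
      for (Path child : children) {
        if (!Files.isDirectory(child)) {
          continue;
        }
        try {
          return Optional.of(UUID.fromString(child.getFileName().toString()));
        } catch (IllegalArgumentException notAGroupDir) {
          // Directory name is not a UUID; keep looking.
        }
      }
    }
    return Optional.empty();
  }

  /** Existing cluster: keep the on-disk group ID. New cluster: derive it from the cluster ID. */
  public static UUID chooseGroupId(Path storageDir, String clusterId) throws IOException {
    Optional<UUID> existing = findExistingGroupId(storageDir);
    return existing.isPresent() ? existing.get() : deriveGroupId(clusterId);
  }

  public static void main(String[] args) throws IOException {
    Path storageDir = Files.createTempDirectory("ratis-sketch");

    // New cluster: nothing on disk yet, so the ID comes from the cluster ID.
    UUID fresh = chooseGroupId(storageDir, "CID-example");
    System.out.println("new cluster uses clusterID-derived id: "
        + fresh.equals(deriveGroupId("CID-example")));

    // Existing cluster: simulate a group dir created earlier from service id "ozone1".
    UUID legacy = deriveGroupId("ozone1");
    Files.createDirectory(storageDir.resolve(legacy.toString()));
    System.out.println("existing group dir is reused: "
        + chooseGroupId(storageDir, "CID-example").equals(legacy));

    // A service-id-derived ID changes when the service is renamed.
    System.out.println("rename changes service-derived id: "
        + !deriveGroupId("ozone1").equals(deriveGroupId("ozone2")));
  }
}
```

With this shape, renaming the service from "ozone1" to "ozone2" would not affect the chosen group ID, because the service id is not an input once a group directory exists. A real change would also need to handle a storage directory that contains more than one group directory, which this sketch does not address.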