[ https://issues.apache.org/jira/browse/HDDS-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758724#comment-17758724 ]
Tsz-wo Sze commented on HDDS-5547:
----------------------------------

Sure, it is good to just throw an exception.

> Generation of raftgroupId should not depend on OM service id
> ------------------------------------------------------------
>
>                 Key: HDDS-5547
>                 URL: https://issues.apache.org/jira/browse/HDDS-5547
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Bharat Viswanadham
>            Assignee: Aryan Gupta
>            Priority: Major
>
> In OM HA, raftGroupID is generated from the service ID.
> So, if the OM Service ID changes, OM startup fails with the error below:
> {code:java}
> 2021-08-05 12:20:03,043 ERROR org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
> java.io.IOException: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-8A65FD498CB6, RUNNING -> STARTING
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:71)
>   at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:354)
>   at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:371)
>   at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.start(OzoneManagerRatisServer.java:390)
>   at org.apache.hadoop.ozone.om.OzoneManager.start(OzoneManager.java:1109)
>   at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:126)
>   at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:79)
>   at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:67)
>   at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:38)
>   at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
>   at picocli.CommandLine.access$1100(CommandLine.java:145)
>   at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
>   at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
>   at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
>   at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
>   at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
>   at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
>   at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:51)
> Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-8A65FD498CB6, RUNNING -> STARTING
>   at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>   at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
>   at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
>   at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
>   at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:127)
>   at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
>   at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:193)
>   at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> One possible solution is:
> If a Ratis group dir already exists, use it, since it belongs to an existing cluster that we cannot change.
> For new clusters, perhaps we can use the clusterID, which does not change for an Ozone cluster; this way we would be tolerant of a service ID config change.
> This is just one idea; we can discuss other approaches to solve and fix this issue.
> As of right now, OM does not allow the OM service ID to be changed.
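
The idea in the issue description (reuse an existing Ratis group directory if there is one, otherwise derive the group ID from the cluster ID) might look roughly like the sketch below. It uses only the JDK and is not the actual Ozone or Ratis code: the class name RaftGroupIdSketch, its method names, and the assumption that each group is stored in a subdirectory named after its group UUID are all hypothetical and used for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.UUID;

// Hypothetical helper, not part of Ozone or Ratis.
final class RaftGroupIdSketch {
  private RaftGroupIdSketch() { }

  /** Name-based UUID: the same seed string always yields the same UUID. */
  public static UUID deriveGroupUuid(String seed) {
    return UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8));
  }

  /**
   * Looks for an existing group directory under the storage dir.
   * Assumption: a group is stored in a subdirectory named after its UUID.
   */
  public static Optional<UUID> findExistingGroup(Path ratisStorageDir)
      throws IOException {
    if (!Files.isDirectory(ratisStorageDir)) {
      return Optional.empty();
    }
    try (DirectoryStream<Path> children =
             Files.newDirectoryStream(ratisStorageDir)) {
      for (Path child : children) {
        if (!Files.isDirectory(child)) {
          continue;
        }
        try {
          return Optional.of(UUID.fromString(child.getFileName().toString()));
        } catch (IllegalArgumentException ignored) {
          // Directory name is not a UUID; keep looking.
        }
      }
    }
    return Optional.empty();
  }

  /** Existing cluster: keep its group. New cluster: derive from clusterID. */
  public static UUID resolveGroupUuid(Path ratisStorageDir, String clusterId)
      throws IOException {
    Optional<UUID> existing = findExistingGroup(ratisStorageDir);
    return existing.isPresent() ? existing.get() : deriveGroupUuid(clusterId);
  }
}
```

With this shape, a changed service ID never enters the computation: an existing cluster keeps whatever group its storage directory already holds, and a new cluster gets a group ID that is stable for as long as its cluster ID is. What should happen when an existing directory does not match the expected group (for example, throwing an exception) is left out of the sketch.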