[ https://issues.apache.org/jira/browse/HDDS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze updated HDDS-9887:
-----------------------------
Description:
When the OMServiceId is changed, OM should detect it and fail early.
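
A minimal sketch of one way such a check could look, not the actual Ozone change: on first start the OM records the configured service ID in a marker file under its metadata directory, and on every later start it compares the configured value with the recorded one and aborts before any Ratis state is initialized. The class name {{ServiceIdGuard}}, the marker file name {{om-service-id}}, and the method {{verifyOrRecord}} are all hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical startup guard for illustration; not part of the Ozone code base. */
public class ServiceIdGuard {
  // Hypothetical marker file name, chosen only for this sketch.
  static final String MARKER = "om-service-id";

  /**
   * Records the service ID on the first start. On later starts, throws
   * IllegalStateException if the configured ID differs from the recorded one.
   */
  public static void verifyOrRecord(Path metadataDir, String configuredId)
      throws IOException {
    Path marker = metadataDir.resolve(MARKER);
    if (Files.notExists(marker)) {
      // First start: remember which service ID this metadata belongs to.
      Files.createDirectories(metadataDir);
      Files.write(marker, configuredId.getBytes(StandardCharsets.UTF_8));
      return;
    }
    String recorded =
        new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim();
    if (!recorded.equals(configuredId)) {
      // Fail early with a message that names both values.
      throw new IllegalStateException(
          "OM service ID changed: metadata was created with '" + recorded
              + "' but the configuration now says '" + configuredId + "'");
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("om-meta");
    verifyOrRecord(dir, "ozone1"); // first start: records the ID
    verifyOrRecord(dir, "ozone1"); // unchanged ID: passes silently
    try {
      verifyOrRecord(dir, "ozone2"); // changed ID: rejected
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The point of the sketch is the ordering: the comparison runs before the state machine is started, so a changed ID would surface as an explicit configuration error instead of the ILLEGAL TRANSITION failure quoted in the original description.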
----
(original description)
I saw this error on a new Ozone cluster: OM crashed and could not be
restarted (version: Cloudera CDP 7.1.9). Notably, I've seen this error twice
in a week, on separate clusters.
{code:java}
2023-12-08 14:27:15,265 ERROR [main]-org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
java.util.concurrent.CompletionException: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om27:group-9F198C4C3682, RUNNING -> STARTING
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1298)
    at java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1284)
    at java.util.concurrent.CompletableFuture$CoCompletion.tryFire(CompletableFuture.java:1034)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
    at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:189)
    at org.apache.ratis.util.ConcurrentUtils.lambda$null$4(ConcurrentUtils.java:180)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om27:group-9F198C4C3682, RUNNING -> STARTING
    at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
    at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
    at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
    at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:140)
    at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:173)
    at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:338)
    at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:188)
    ... 4 more
{code}
Summary: Detect OMServiceId change and fail early (was: Detect OMServiceId change)
> Detect OMServiceId change and fail early
> ----------------------------------------
>
> Key: HDDS-9887
> URL: https://issues.apache.org/jira/browse/HDDS-9887
> Project: Apache Ozone
> Issue Type: Improvement
> Components: OM
> Reporter: Wei-Chiu Chuang
> Priority: Major
> Attachments: om_illegal_transition.tgz, ozone-om.log