Ivan Andika created HDDS-15578:
----------------------------------
Summary: InvalidStateTransitionException can crash SCM
Key: HDDS-15578
URL: https://issues.apache.org/jira/browse/HDDS-15578
Project: Apache Ozone
Issue Type: Improvement
Reporter: Ivan Andika
Some methods annotated with @Replicate can throw
InvalidStateTransitionException, for example
ContainerStateManager#updateContainerState and
ContainerStateManager#updateContainerStateWithSequenceId.
When such a method is applied through SCM Ratis, an exception that escapes on
the StateMachineUpdater path can terminate SCM. The interface comment says that
replicated methods should be idempotent, but this implementation is not fully
idempotent for stale or duplicate events.
Example risk:
- The leader submits FINALIZE for a container that is OPEN.
- Because of apply ordering or a duplicate report, the container is already
CLOSING by the time the request is applied.
- Applying FINALIZE at CLOSING is an invalid transition.
- The exception escapes from the replicated apply path.
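The sequence above can be reproduced with a small stand-alone sketch. This is
not the Ozone code: StaleFinalizeDemo, its State and Event enums, and
InvalidTransition are hypothetical stand-ins for the container lifecycle state
machine and InvalidStateTransitionException, reduced to the two transitions
that matter here.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the container lifecycle state machine.
// None of these names are Ozone classes.
class StaleFinalizeDemo {
  enum State { OPEN, CLOSING, CLOSED }
  enum Event { FINALIZE, CLOSE }

  // Stand-in for InvalidStateTransitionException (kept as a checked exception).
  static class InvalidTransition extends Exception {
    InvalidTransition(State state, Event event) {
      super("Invalid event " + event + " at state " + state);
    }
  }

  // Allowed transitions: OPEN --FINALIZE--> CLOSING --CLOSE--> CLOSED.
  private static final Map<State, Map<Event, State>> TRANSITIONS =
      new EnumMap<>(State.class);
  static {
    TRANSITIONS.put(State.OPEN, Map.of(Event.FINALIZE, State.CLOSING));
    TRANSITIONS.put(State.CLOSING, Map.of(Event.CLOSE, State.CLOSED));
  }

  // Strict lookup: any (state, event) pair not in the table is rejected.
  static State next(State from, Event event) throws InvalidTransition {
    State to = TRANSITIONS.getOrDefault(from, Map.of()).get(event);
    if (to == null) {
      throw new InvalidTransition(from, event);
    }
    return to;
  }

  public static void main(String[] args) throws Exception {
    // First FINALIZE is valid and moves the container out of OPEN.
    State state = next(State.OPEN, Event.FINALIZE);
    System.out.println("after first FINALIZE: " + state);

    // A stale or duplicate FINALIZE now hits CLOSING and is rejected.
    try {
      next(state, Event.FINALIZE);
    } catch (InvalidTransition e) {
      System.out.println("duplicate FINALIZE rejected: " + e.getMessage());
    }
  }
}
```

In this sketch the caller catches the exception. The concern in this issue is
the case where nothing on the replicated apply path does.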
A possible fix:
- Catch InvalidStateTransitionException inside the replicated implementation.
- Log it and return without mutating state.
- Treat it as a stale/duplicate lifecycle event, not a fatal
replicated-state-machine error.
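A minimal sketch of this approach, under stated assumptions: it is not a patch
against ContainerStateManager. IdempotentApplySketch, applyReplicated, the
State and Event enums, and InvalidTransition are hypothetical stand-ins, and
java.util.logging is used only to keep the example self-contained.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch of a replicated apply method that tolerates stale or
// duplicate lifecycle events. None of these names are Ozone classes.
class IdempotentApplySketch {
  enum State { OPEN, CLOSING, CLOSED }
  enum Event { FINALIZE, CLOSE }

  // Stand-in for InvalidStateTransitionException.
  static class InvalidTransition extends Exception {
    InvalidTransition(State state, Event event) {
      super("Invalid event " + event + " at state " + state);
    }
  }

  private static final Logger LOG =
      Logger.getLogger(IdempotentApplySketch.class.getName());

  // Allowed transitions: OPEN --FINALIZE--> CLOSING --CLOSE--> CLOSED.
  private static final Map<State, Map<Event, State>> TRANSITIONS =
      new EnumMap<>(State.class);
  static {
    TRANSITIONS.put(State.OPEN, Map.of(Event.FINALIZE, State.CLOSING));
    TRANSITIONS.put(State.CLOSING, Map.of(Event.CLOSE, State.CLOSED));
  }

  private State state = State.OPEN;

  State getState() {
    return state;
  }

  // Strict lookup, as before: unknown (state, event) pairs throw.
  private static State next(State from, Event event) throws InvalidTransition {
    State to = TRANSITIONS.getOrDefault(from, Map.of()).get(event);
    if (to == null) {
      throw new InvalidTransition(from, event);
    }
    return to;
  }

  // Stand-in for a method applied on every replica. An invalid transition is
  // logged and ignored, so it never propagates to the caller.
  // Returns true if the state changed, false if the event was ignored.
  boolean applyReplicated(Event event) {
    try {
      state = next(state, event);
      return true;
    } catch (InvalidTransition e) {
      LOG.warning("Ignoring stale/duplicate event: " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    IdempotentApplySketch container = new IdempotentApplySketch();
    container.applyReplicated(Event.FINALIZE); // OPEN -> CLOSING
    container.applyReplicated(Event.FINALIZE); // ignored, state unchanged
    container.applyReplicated(Event.CLOSE);    // CLOSING -> CLOSED
    System.out.println("final state: " + container.getState());
  }
}
```

One trade-off to weigh: this sketch ignores every invalid transition, which
could also mask a real divergence between replicas. A narrower variant might
ignore only the cases known to come from stale or duplicate events.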