[ 
https://issues.apache.org/jira/browse/HDDS-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-15578:
-------------------------------
    Description: 
Some methods annotated with @Replicate, such as 
ContainerStateManager#updateContainerStateWithSequenceId, can throw 
InvalidStateTransitionException.

When the method is applied by SCM Ratis, an exception from the 
StateMachineUpdater path can terminate SCM although it is not really a critical 
error (e.g. if there are duplicate events, we can simply ignore one). The 
interface comment even says replicated methods should be idempotent, but this 
implementation is not fully idempotent for stale/duplicate events.

Example risk:
- Leader submits FINALIZE for OPEN.
- Due to ordering before apply or a duplicate report, the current state is 
already CLOSING.
- Applying FINALIZE at CLOSING is invalid.
- Exception escapes from replicated apply path.

The chance is very low, since in most cases there is a prior check of the 
container state in updateContainerStateWithSequenceId, but the risk exists.

We can try to fix it as follows:
- Inside the replicated implementation, catch InvalidStateTransitionException.
- Log and return without mutation.
- Treat it as a stale/duplicate lifecycle event, not a fatal 
replicated-state-machine error.
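
The proposed fix could look roughly like the sketch below. This is not the 
actual Ozone code: the class TolerantApplySketch, its State and Event enums, 
the transition table, and the local InvalidStateTransitionException are 
hypothetical stand-ins used only to show the catch, log, and return pattern 
on a replicated apply path.

```java
/**
 * Hypothetical, simplified model of a replicated container state update.
 * None of these types are the real Ozone/SCM classes.
 */
class TolerantApplySketch {

  enum State { OPEN, CLOSING, CLOSED }

  enum Event { FINALIZE, CLOSE }

  /** Local stand-in for the exception named in the issue. */
  static class InvalidStateTransitionException extends Exception {
    InvalidStateTransitionException(State state, Event event) {
      super("Invalid event " + event + " at state " + state);
    }
  }

  private State state = State.OPEN;

  /** Strict transition: throws on any event not valid for the state. */
  private static State transition(State current, Event event)
      throws InvalidStateTransitionException {
    if (current == State.OPEN && event == Event.FINALIZE) {
      return State.CLOSING;
    }
    if (current == State.CLOSING && event == Event.CLOSE) {
      return State.CLOSED;
    }
    throw new InvalidStateTransitionException(current, event);
  }

  /**
   * Replicated apply path (assumed shape). An invalid transition is
   * treated as a stale/duplicate event: it is logged and ignored, and
   * the state is left unchanged, so the exception never escapes.
   *
   * @return true if the state changed, false if the event was ignored
   */
  boolean applyReplicated(Event event) {
    try {
      state = transition(state, event);
      return true;
    } catch (InvalidStateTransitionException e) {
      System.err.println("Ignoring stale/duplicate event: "
          + e.getMessage());
      return false;
    }
  }

  State getState() {
    return state;
  }

  public static void main(String[] args) {
    TolerantApplySketch container = new TolerantApplySketch();
    container.applyReplicated(Event.FINALIZE); // OPEN -> CLOSING
    container.applyReplicated(Event.FINALIZE); // duplicate, ignored
    System.out.println(container.getState());  // prints CLOSING
  }
}
```

With this shape, applying the same FINALIZE twice leaves the container in 
CLOSING, which is the idempotent behavior the interface comment asks for. 
One trade-off to consider is that swallowing every invalid transition could 
also hide a genuine bug, so the log message should carry enough detail to 
investigate.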

  was:
There are methods annotated with @Replicate that can throw 
InvalidStateTransitionException like ContainerStateManager#updateContainerState 
or ContainerStateManager#updateContainerStateWithSequenceId.

When the method is applied by SCM Ratis, an exception from the 
StateMachineUpdater path can terminate SCM although it is not really a critical 
error (e.g. if there are duplicate events, we can simply ignore one). The 
interface comment even says replicated methods should be idempotent, but this 
implementation is not fully idempotent for stale/duplicate events.


Example risk:
- Leader submits FINALIZE for OPEN.
- Before/apply ordering or duplicate report causes the current state to already 
be CLOSING.
- Applying FINALIZE at CLOSING is invalid.
- Exception escapes from replicated apply path.

We can try to fix it by 
- Inside the replicated implementation, catch InvalidStateTransitionException.
- Log and return without mutation.
- Treat it as a stale/duplicate lifecycle event, not a fatal 
replicated-state-machine error.


> InvalidStateTransitionException can crash SCM
> ---------------------------------------------
>
>                 Key: HDDS-15578
>                 URL: https://issues.apache.org/jira/browse/HDDS-15578
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
