[ https://issues.apache.org/jira/browse/RATIS-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swaminathan Balachandran updated RATIS-2245:
--------------------------------------------
Description: On Ratis snapshot and group removal, the state machine waits only
for the apply transactions that were applied in a single iteration. If no more
transactions are added to the state machine and all of the apply transaction
futures are still in progress, the state machine ends up not waiting for the
updater thread; it calls the notifyGroupRemove function and deletes the raft
group directory. As a result, after a restart some nodes may be unable to apply
transactions that were still in flight. (was: On group removal the statemachine just waits for apply
transactions that have been applied on a single iteration. If there are no more
transactions added onto the state machine and all of the apply transaction
futures are still in progress, the state machine ends up not waiting for the
updater thread and ends up calling the notifyGroupRemove function and deletes
the raft group directory. So this could lead to some node not being able to
apply some of the transactions still in flight in case of a restart.)
> Ratis should wait for all apply transaction futures before taking snapshot
> and group remove
> -------------------------------------------------------------------------------------------
>
> Key: RATIS-2245
> URL: https://issues.apache.org/jira/browse/RATIS-2245
> Project: Ratis
> Issue Type: Bug
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Critical
>
> On Ratis snapshot and group removal, the state machine waits only for the
> apply transactions that were applied in a single iteration. If no more
> transactions are added to the state machine and all of the apply transaction
> futures are still in progress, the state machine ends up not waiting for the
> updater thread; it calls the notifyGroupRemove function and deletes the raft
> group directory. As a result, after a restart some nodes may be unable to
> apply transactions that were still in flight.
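
The behavior requested in the issue title can be illustrated with a minimal, standalone Java sketch. It is not the Ratis implementation or the actual fix: PendingApplyTracker, track, pendingCount and awaitAll are hypothetical names introduced only for illustration. The idea is to remember every apply transaction future that has not completed yet, across iterations, and to block on all of them before a snapshot or group removal proceeds.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (NOT part of the Ratis API): tracks every in-flight
 * apply transaction future, not only those created in the latest iteration.
 */
final class PendingApplyTracker {
  // Futures that have been handed to the state machine and are not yet done.
  private final Set<CompletableFuture<?>> pending = ConcurrentHashMap.newKeySet();

  /** Register a future; it removes itself from the set once it completes. */
  <T> CompletableFuture<T> track(CompletableFuture<T> future) {
    pending.add(future);
    // If the future is already complete, this callback runs immediately.
    future.whenComplete((result, error) -> pending.remove(future));
    return future;
  }

  /** Number of futures that are still in progress. */
  int pendingCount() {
    return pending.size();
  }

  /**
   * Block until every tracked future has completed, successfully or not.
   * A caller would invoke this before taking a snapshot or before deleting
   * the group directory (assumption: failures are handled elsewhere).
   */
  void awaitAll() {
    CompletableFuture<?>[] snapshot = pending.toArray(new CompletableFuture[0]);
    // allOf completes only after all inputs complete; swallow failures here
    // so that one failed transaction does not skip waiting for the others.
    CompletableFuture.allOf(snapshot).exceptionally(error -> null).join();
  }
}
```

In this sketch, the thread that applies transactions would call track(...) on each returned future, and the snapshot or group-removal path would call awaitAll() first, so that no transaction is still in flight when the directory is deleted.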