[ 
https://issues.apache.org/jira/browse/RATIS-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swaminathan Balachandran updated RATIS-2245:
--------------------------------------------
    Description: On Ratis snapshot and group removal, the state machine waits 
only for the apply transactions issued in a single iteration. If no more 
transactions are added to the state machine while all of the apply transaction 
futures are still in progress, the state machine does not wait for the updater 
thread; it goes on to call the notifyGroupRemove function and deletes the raft 
group directory. As a result, after a restart a node may be unable to apply 
some of the transactions that were still in flight.  (was: On group removal the statemachine just waits for apply 
transactions that have been applied on a single iteration. If there are no more 
transactions added onto the state machine and all of the apply transaction 
future are still in progress. The state machine ends up not waiting for the 
updater thread and ends up calling the notifyGroupRemove function and deletes 
the raft group directory. So this could lead to some node not being able to 
apply some of the transactions still in flight in case of a restart.)

> Ratis should wait for all apply transaction futures before taking snapshot 
> and group remove
> -------------------------------------------------------------------------------------------
>
>                 Key: RATIS-2245
>                 URL: https://issues.apache.org/jira/browse/RATIS-2245
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Swaminathan Balachandran
>            Assignee: Swaminathan Balachandran
>            Priority: Critical
>
> On Ratis snapshot and group removal, the state machine waits only for the 
> apply transactions issued in a single iteration. If no more transactions are 
> added to the state machine while all of the apply transaction futures are 
> still in progress, the state machine does not wait for the updater thread; it 
> goes on to call the notifyGroupRemove function and deletes the raft group 
> directory. As a result, after a restart a node may be unable to apply some of 
> the transactions that were still in flight.
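
The intended behaviour described above (wait for every outstanding apply 
future before a snapshot or group removal) can be illustrated with a minimal 
sketch. This is not the Ratis implementation or the actual fix: the class 
InFlightApplyTracker and its methods track, allApplied and pending are 
hypothetical names chosen for illustration, and only java.util.concurrent is 
used.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper for illustration only; not part of the Ratis API. */
final class InFlightApplyTracker {
  // Futures of apply calls that have been issued but have not completed yet.
  private final Set<CompletableFuture<?>> inFlight = ConcurrentHashMap.newKeySet();

  /** Remember an apply future until it completes, normally or exceptionally. */
  <T> CompletableFuture<T> track(CompletableFuture<T> applyFuture) {
    inFlight.add(applyFuture);
    applyFuture.whenComplete((result, error) -> inFlight.remove(applyFuture));
    return applyFuture;
  }

  /**
   * Returns a future that completes once every apply tracked so far has
   * finished. Failures are swallowed here on purpose (an assumption of this
   * sketch): the caller only needs to know that nothing is still running.
   */
  CompletableFuture<Void> allApplied() {
    CompletableFuture<?>[] pendingNow = inFlight.toArray(new CompletableFuture<?>[0]);
    return CompletableFuture.allOf(pendingNow).handle((ignored, error) -> null);
  }

  /** Number of apply futures that have not completed yet. */
  int pending() {
    return inFlight.size();
  }
}
```

Under these assumptions, a caller would block on tracker.allApplied().join() 
before taking the snapshot or deleting the group directory, rather than 
waiting only on the futures gathered in the latest iteration.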



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
