Xinyu Tan created RATIS-2146:
--------------------------------

             Summary: Fixed possible issues caused by concurrent deletion and 
election when member changes
                 Key: RATIS-2146
                 URL: https://issues.apache.org/jira/browse/RATIS-2146
             Project: Ratis
          Issue Type: Improvement
            Reporter: Xinyu Tan
            Assignee: Xinyu Tan
         Attachments: image-2024-08-28-14-53-23-259.png, 
image-2024-08-28-14-53-27-637.png

While removing a member (node D) from a consensus group, we encountered a concurrency issue:
* After the member change is complete, node D is no longer a member of 
this consensus group. It will attempt to initiate an election, receive a 
NOT_IN_CONF response, and then close itself.
* As part of the removal of member D, D is also closed first, and then its 
file directory is deleted.
These two CLOSE operations may run concurrently, so the directory can be 
deleted while the StateMachineUpdater thread has not yet closed, which 
ultimately leads to unexpected errors.

 !image-2024-08-28-14-53-23-259.png! 
 !image-2024-08-28-14-53-27-637.png! 

I believe there are two possible solutions for this issue:

* Add concurrency control to the close function, such as adding the 
synchronized keyword to the function.
* Add some checks before deleting the directory to ensure that the callback 
functions in the close process have already been executed before the directory 
is deleted.
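
To make the two options concrete, here is a minimal sketch of how they could be combined. It is not the Ratis implementation: the class `ClosableDivision` and all of its methods are hypothetical stand-ins, and the directory deletion is reduced to a flag. The idea is that close runs its shutdown work at most once, and the removal path waits for that work to finish before touching the directory.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the server-side object being closed; not the Ratis API. */
class ClosableDivision {
  private final AtomicBoolean closeStarted = new AtomicBoolean(false);
  private final CompletableFuture<Void> closed = new CompletableFuture<>();
  private final AtomicInteger shutdownRuns = new AtomicInteger();
  private volatile boolean directoryDeleted = false;

  /**
   * Safe to call from both the NOT_IN_CONF path and the removal path.
   * Only the first caller runs the shutdown work; every caller gets the
   * same future, which completes once that work has finished.
   */
  CompletableFuture<Void> close() {
    if (closeStarted.compareAndSet(false, true)) {
      try {
        stopUpdaterAndRunCallbacks();
        closed.complete(null);
      } catch (Throwable t) {
        closed.completeExceptionally(t);
      }
    }
    return closed;
  }

  /** Removal path: wait until close has fully finished, then delete. */
  void closeAndDeleteDirectory() {
    close().join();
    directoryDeleted = true; // placeholder for the real directory deletion
  }

  /** Placeholder for stopping the updater thread and running close callbacks. */
  private void stopUpdaterAndRunCallbacks() {
    if (directoryDeleted) {
      throw new IllegalStateException("directory deleted before shutdown finished");
    }
    shutdownRuns.incrementAndGet();
  }

  int shutdownRuns() {
    return shutdownRuns.get();
  }

  boolean isDirectoryDeleted() {
    return directoryDeleted;
  }
}
```

If the election path wins the race, the removal path blocks in `join()` until the shutdown work is done; if the removal path wins, the election path's `close()` becomes a no-op. Marking `close()` as `synchronized` would give the same ordering with less code, at the cost of holding a lock for the whole shutdown.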


What's your opinion? [~szetszwo]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
