Xinyu Tan created RATIS-2146:
--------------------------------
Summary: Fixed possible issues caused by concurrent deletion and
election when member changes
Key: RATIS-2146
URL: https://issues.apache.org/jira/browse/RATIS-2146
Project: Ratis
Issue Type: Improvement
Reporter: Xinyu Tan
Assignee: Xinyu Tan
Attachments: image-2024-08-28-14-53-23-259.png,
image-2024-08-28-14-53-27-637.png
During a member change that removes node D from the consensus group, we
encountered some concurrency issues:
* After the member change is complete, node D will no longer be a member of
this consensus group. It will attempt to initiate an election but receive a
NOT_IN_CONF response, after which it will close itself.
* During the removal of member D, D will also close itself first, and then
proceed to delete its file directory.
These two CLOSE operations may occur concurrently, which could result in the
directory being deleted while the StateMachineUpdater thread has not yet
closed, ultimately leading to unexpected errors.
!image-2024-08-28-14-53-23-259.png!
!image-2024-08-28-14-53-27-637.png!
I believe there are two possible solutions for this issue:
* Add concurrency control to the close function, such as adding the
synchronized keyword to the function.
* Add some checks before deleting the directory to ensure that the callback
functions in the close process have already been executed before the directory
is deleted.
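To make the two options concrete, here is a minimal sketch that combines them. It is not Ratis code: DivisionSketch, closeAndDelete, and the "updater" thread are hypothetical names standing in for the server division and the StateMachineUpdater thread, and the directory is assumed to be empty for brevity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a server division; not the Ratis API. */
class DivisionSketch {
  private final Path dir;
  private final Thread updater;            // plays the role of StateMachineUpdater
  private volatile boolean running = true;
  private boolean closed = false;          // guarded by "this"

  DivisionSketch(Path dir) {
    this.dir = dir;
    this.updater = new Thread(() -> {
      // Simulated work loop that would touch files under dir.
      while (running) {
        try {
          Thread.sleep(10);
        } catch (InterruptedException e) {
          return;
        }
      }
    }, "updater");
    this.updater.setDaemon(true);
    this.updater.start();
  }

  /**
   * Option 1: synchronized close. A second caller blocks until the first
   * one has finished, then sees closed == true and returns immediately.
   */
  synchronized void close() throws InterruptedException {
    if (closed) {
      return;
    }
    running = false;
    updater.interrupt();
    updater.join();                        // wait until the updater has stopped
    closed = true;
  }

  /** Option 2: verify that close has completed before deleting the directory. */
  void closeAndDelete() throws IOException, InterruptedException {
    close();
    if (updater.isAlive()) {
      throw new IllegalStateException("updater still running; refusing to delete " + dir);
    }
    Files.deleteIfExists(dir);             // assumes an empty directory in this sketch
  }

  boolean isUpdaterAlive() {
    return updater.isAlive();
  }
}
```

With this shape, the close triggered by the NOT_IN_CONF response and the close triggered by the removal can race freely: whichever arrives second waits for the first, and the directory is only deleted once the updater thread has exited.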
What's your opinion? [~szetszwo]