[
https://issues.apache.org/jira/browse/RATIS-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tsz-wo Sze updated RATIS-2146:
------------------------------
Fix Version/s: (was: 3.2.0)
> Fixed possible issues caused by concurrent deletion and election when member
> changes
> ------------------------------------------------------------------------------------
>
> Key: RATIS-2146
> URL: https://issues.apache.org/jira/browse/RATIS-2146
> Project: Ratis
> Issue Type: Improvement
> Components: server
> Reporter: Xinyu Tan
> Assignee: Xinyu Tan
> Priority: Major
> Fix For: 3.1.1
>
> Attachments: image-2024-08-28-14-53-23-259.png,
> image-2024-08-28-14-53-27-637.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> During this process, we encountered some concurrency issues:
> * After the member change is complete, node D will no longer be a member of
> this consensus group. It will attempt to initiate an election but receive a
> NOT_IN_CONF response, after which it will close itself.
> * During the removal of member D, it will also close itself first, and then
> proceed to delete the file directory.
> These two CLOSE operations may run concurrently, so the directory can be
> deleted while the StateMachineUpdater thread is still running, which
> ultimately leads to unexpected errors.
> !image-2024-08-28-14-53-23-259.png!
> !image-2024-08-28-14-53-27-637.png!
> I believe there are two possible solutions for this issue:
> * Add concurrency control to the close function, such as adding the
> synchronized keyword to the function.
> * Add some checks before deleting the directory to ensure that the callback
> functions in the close process have already been executed before the
> directory is deleted.
> What's your opinion? [~szetszwo]
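
For illustration only, the two options listed in the description could be sketched roughly as follows. This is not Ratis code: ClosableDivision, closeAndDelete, closeRuns and updaterAlive are hypothetical names, and a plain Thread stands in for the StateMachineUpdater. The sketch assumes close() may be called from both the election path and the removal path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

/** Hypothetical stand-in for a server division; not the Ratis API. */
class ClosableDivision {
  private final Path dir;
  private final Thread updater;            // stands in for the StateMachineUpdater thread
  private volatile boolean running = true;
  private boolean closeStarted = false;    // guarded by "this"
  private final CompletableFuture<Void> closed = new CompletableFuture<>();
  final AtomicInteger closeRuns = new AtomicInteger(); // counts real shutdown runs

  ClosableDivision(Path dir) {
    this.dir = dir;
    this.updater = new Thread(() -> {
      while (running) {
        try {
          Thread.sleep(10);                // pretend to apply log entries
        } catch (InterruptedException e) {
          return;
        }
      }
    });
    updater.start();
  }

  /** Option 1: synchronized, so concurrent callers run the shutdown logic only once. */
  synchronized void close() {
    if (closeStarted) {
      return;                              // a previous caller already shut down
    }
    closeStarted = true;
    closeRuns.incrementAndGet();
    running = false;
    updater.interrupt();
    try {
      updater.join();                      // wait until the updater thread has exited
      closed.complete(null);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      closed.completeExceptionally(e);
    }
  }

  /** Option 2: delete the directory only after close has fully completed. */
  void closeAndDelete() throws IOException {
    close();
    closed.join();                         // throws if close did not finish cleanly
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse order deletes children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }

  boolean updaterAlive() {
    return updater.isAlive();
  }
}
```

In this sketch the two options overlap: because close() is synchronized, a second caller blocks until the first has finished, so the future check in closeAndDelete() is a second line of defence rather than a strict requirement. Whether either approach fits the actual server code is a separate question.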
--
This message was sent by Atlassian Jira
(v8.20.10#820010)