OneSizeFitsQuorum commented on code in PR #1140:
URL: https://github.com/apache/ratis/pull/1140#discussion_r1738175510
##########
ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java:
##########
@@ -463,6 +465,13 @@ void groupRemove(boolean deleteDirectory, boolean
renameDirectory) {
/* Shutdown is triggered here inorder to avoid any locked files. */
state.getStateMachineUpdater().setRemoving();
close();
+ try {
+ closeFinishedLatch.await();
Review Comment:
<img width="1065" alt="image"
src="https://github.com/user-attachments/assets/1749e063-3155-4635-99c6-687d702d921d">
In fact, there is a difference because lifeCycle.checkStateAndClose is a CAS
(Compare-And-Swap) operation, meaning only the thread that successfully
performs the CAS can execute this function.
In our reproduced case, the election thread successfully performed the CAS
to set the state to closing and was executing the shutdown code inside. Then,
when the thread responsible for deleting the raftgroup called this function and
found that the state was closing, it essentially did nothing and returned to
delete the file directory, which subsequently caused an error in
StateMachineUpdater.
I checked the contents of this lambda expression, and except for
FollowerState and LeaderElection which are asynchronous, the rest should be
synchronous, including the StateMachineUpdater that we discovered this time.
Therefore, after adding this countDownLatch, at least the error we found will
not occur again.
As for whether it's necessary to also synchronize the waiting for
FollowerState and LeaderElection, I think it could be either way. Even if they
clean up asynchronously, they will ultimately just transition to Leader, and
the underlying shutdownLeaderState and state.close can prevent any impact from
occurring.
Additionally, I found that we can't directly add a synchronized signature to
the close function, because this lambda function will still only be executed by
the thread that successfully performs the CAS.
Overall, I feel that the current changes should not introduce any side
effects. What do you think? @szetszwo
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]