[ 
https://issues.apache.org/jira/browse/RATIS-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze reassigned RATIS-2162:
---------------------------------

    Component/s: Leader
       Assignee: yuuka

[~tohsakarin__], that is great, thanks a lot!

> When closing leaderState, if the logAppender thread sends a snapshot, a 
> deadlock may occur
> ------------------------------------------------------------------------------------------
>
>                 Key: RATIS-2162
>                 URL: https://issues.apache.org/jira/browse/RATIS-2162
>             Project: Ratis
>          Issue Type: Wish
>          Components: Leader
>    Affects Versions: 3.1.0
>            Reporter: yuuka
>            Assignee: yuuka
>            Priority: Major
>         Attachments: image-2024-09-24-10-41-20-140.png, 
> image-2024-09-24-10-43-34-812.png
>
>
> This is the cause of the problem reported in RATIS-2161:
> RATIS-2161 Grpc may spawn many threads - ASF JIRA (apache.org)
> 1. The old leader S receives a larger term number and converts to follower.
> 2. LogAppender thread L does not receive the shutdown signal in time because 
> a restart was triggered abnormally.
> 3. S holds the ‘server’ lock and waits for L to shut down.
> 4. L triggers snapshot sending and calls newSnapshotRequests.
> 5. In newSnapshotRequests, L tries to acquire the ‘server’ lock.
>   !image-2024-09-24-10-43-34-812.png!
> !image-2024-09-24-10-41-20-140.png!
> This eventually leads to a deadlock: gRPC cannot reclaim the thread in time, 
> and then the RATIS-2161 problem occurs.
> Timeline of closing LeaderState:
>
>   shutdown LeaderState                         stop LogAppender L
>            |                                           |
>   ---------+-------------------------------------------+---------
>
> Timeline of logAppender L:
>
>                    ------+-----------------------+------
>                          |                       |
>                  restart logAppender   newInstallSnapshotRequests
>  
>  
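The lock ordering described in steps 3 to 5 can be reproduced in isolation. The following is only a minimal sketch, not Ratis code: the class name, the `server` object, and the thread name are hypothetical stand-ins, and the wait is bounded so that the program terminates instead of hanging as in the real scenario.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the reported scenario; none of this is Ratis code. */
class ShutdownDeadlockSketch {
  /**
   * Returns true if the appender thread is still alive after waitMillis
   * while the closing thread holds the lock. waitMillis must be positive,
   * because Thread.join(0) waits forever.
   */
  static boolean appenderStuckWhileLockHeld(long waitMillis) {
    final Object server = new Object(); // stands in for the 'server' lock
    final CountDownLatch lockHeld = new CountDownLatch(1);

    final Thread appender = new Thread(() -> {
      try {
        lockHeld.await(); // make sure the closing thread owns the lock first
      } catch (InterruptedException e) {
        return;
      }
      synchronized (server) {
        // step 5: building the snapshot request needs the 'server' lock
      }
    }, "appender-L");
    appender.start();

    final boolean stuck;
    try {
      synchronized (server) {        // step 3: S holds the lock ...
        lockHeld.countDown();
        appender.join(waitMillis);   // ... and waits for L (bounded here)
        stuck = appender.isAlive();  // L cannot finish while S holds the lock
      }
      appender.join();               // lock released, so L can finish now
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return stuck;
  }

  public static void main(String[] args) {
    System.out.println("appender stuck while lock held: "
        + appenderStuckWhileLockHeld(200));
  }
}
```

With an unbounded wait in place of `join(waitMillis)`, neither thread could make progress, which matches the deadlock described above.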
> I think the state of the Raft server could be checked every time the 
> LogAppender is awakened, and the LogAppender closed if the server is no 
> longer the leader.
>  
>  
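The check proposed above might look roughly like the following. This is a sketch under assumptions: `Role`, `onWakeUp`, and `runLoop` are hypothetical names, not the Ratis API, and the role is assumed to be readable without taking the ‘server’ lock (for example from a volatile or atomic field), since otherwise the check itself would block.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a role check on every wake-up; not the Ratis API. */
class RoleCheckingAppenderSketch {
  enum Role { LEADER, FOLLOWER }

  /** Handles one wake-up; returns false when the appender loop should exit. */
  static boolean onWakeUp(Supplier<Role> currentRole, Runnable appendOrSendSnapshot) {
    if (currentRole.get() != Role.LEADER) {
      return false; // no longer the leader: exit before any snapshot work
    }
    appendOrSendSnapshot.run();
    return true;
  }

  /** Runs at most maxWakeUps iterations; returns how many did real work. */
  static int runLoop(Supplier<Role> currentRole, Runnable work, int maxWakeUps) {
    int done = 0;
    for (int i = 0; i < maxWakeUps && onWakeUp(currentRole, work); i++) {
      done++;
    }
    return done;
  }
}
```

The idea is that a step-down is noticed at the next wake-up even if the shutdown signal was missed, so the appender never reaches the snapshot path as a non-leader.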
> In addition, LeaderStateImpl has another thread-safety issue regarding 
> senderList.
> removeSenders, addSenders, and stopAll may be called by multiple threads.
> For example, thread t1 creates a futures array of size 3 in stopAll, and 
> then thread t2 calls removeSenders. This may cause an out-of-bounds access 
> because futures.length is 3 but senders.size() < 3.
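One possible way to avoid the race is to copy the list under the same lock that guards additions and removals, and to size the futures array from that copy. The sketch below only illustrates the idea: the class is hypothetical, `String` stands in for a log appender, and `stop` is a placeholder that completes immediately; the actual LeaderStateImpl code may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of a sender list with a consistent stopAll; not Ratis code. */
class SenderListSketch {
  // String stands in for a log appender in this sketch.
  private final List<String> senders = new ArrayList<>();

  synchronized void addSenders(List<String> toAdd) {
    senders.addAll(toAdd);
  }

  synchronized void removeSenders(List<String> toRemove) {
    senders.removeAll(toRemove);
  }

  /** Stops every sender present at the time of the call; yields how many were stopped. */
  CompletableFuture<Integer> stopAll() {
    final List<String> snapshot;
    synchronized (this) {
      snapshot = new ArrayList<>(senders); // consistent copy taken under the lock
    }
    // The array size and the indices both come from the same snapshot,
    // so a concurrent removeSenders cannot cause an out-of-bounds access.
    final CompletableFuture<?>[] futures = new CompletableFuture<?>[snapshot.size()];
    for (int i = 0; i < futures.length; i++) {
      futures[i] = stop(snapshot.get(i));
    }
    return CompletableFuture.allOf(futures).thenApply(v -> futures.length);
  }

  /** Placeholder for stopping one sender; completes immediately here. */
  private static CompletableFuture<Void> stop(String sender) {
    return CompletableFuture.completedFuture(null);
  }
}
```

A copy-on-write list would be another option; the snapshot approach is shown here only because it keeps the array size and the element reads tied to one consistent view.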



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
