[ 
https://issues.apache.org/jira/browse/RATIS-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160931#comment-17160931
 ] 

runzhiwang commented on RATIS-1004:
-----------------------------------

[~szetszwo] Thanks for the review.

bq. Do you know which two locks are causing the deadlock?

The grpc-default-executor-3449 thread holds the lock of 
[RaftLog|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/RaftLog.java#L83]
 and waits for the lock of 
[DataBlockingQueue|https://github.com/apache/incubator-ratis/blob/master/ratis-common/src/main/java/org/apache/ratis/util/DataBlockingQueue.java#L96].

The SegmentedRaftLogWorker thread holds the lock of 
[DataBlockingQueue|https://github.com/apache/incubator-ratis/blob/master/ratis-common/src/main/java/org/apache/ratis/util/DataBlockingQueue.java#L45]
 and waits for the lock of 
[RaftLog|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLog.java#L510].
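
The lock-order inversion described above can be sketched in isolation. This is only an illustration, not the Ratis code: the class name, the thread names, and the two plain ReentrantLock instances standing in for the RaftLog and DataBlockingQueue locks are all assumptions. Each thread takes one lock, waits until the other thread holds its own, and then requests the second lock. ThreadMXBean.findDeadlockedThreads() then reports the cycle, much as jstack does, assuming the JVM supports monitoring of ownable synchronizers (HotSpot does).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderSketch {

    /** Provokes a two-lock deadlock and returns how many threads the JVM reports as deadlocked. */
    static int detectInversion() {
        // Hypothetical stand-ins for the RaftLog lock and the DataBlockingQueue lock.
        ReentrantLock raftLogLock = new ReentrantLock();
        ReentrantLock queueLock = new ReentrantLock();
        CountDownLatch bothHeld = new CountDownLatch(2);

        // Opposite acquisition order in the two threads is what creates the cycle.
        Thread appender = worker("appender-sketch", raftLogLock, queueLock, bothHeld);
        Thread logWorker = worker("log-worker-sketch", queueLock, raftLogLock, bothHeld);
        appender.start();
        logWorker.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int found = 0;
        try {
            // Poll for up to about five seconds until the cycle is visible.
            for (int i = 0; i < 100 && found == 0; i++) {
                long[] ids = mx.findDeadlockedThreads();
                found = (ids == null) ? 0 : ids.length;
                if (found == 0) {
                    Thread.sleep(50);
                }
            }
            // Break the deadlock so the sketch terminates cleanly.
            appender.interrupt();
            logWorker.interrupt();
            appender.join();
            logWorker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return found;
    }

    private static Thread worker(String name, ReentrantLock first,
                                 ReentrantLock second, CountDownLatch bothHeld) {
        Thread t = new Thread(() -> {
            try {
                first.lockInterruptibly();
                try {
                    bothHeld.countDown();
                    bothHeld.await();           // wait until each thread holds its first lock
                    second.lockInterruptibly(); // blocks: the other thread owns this lock
                    second.unlock();
                } finally {
                    first.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads detected: " + detectInversion());
    }
}
```

The sketch uses lockInterruptibly() only so that it can be unwound after detection; with plain lock() calls the two threads would stay blocked forever, which matches what the jstack output shows.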

bq. BTW, I have noticed that the line numbers shown in the stack traces are 
different from trunk. 

I am using version 0.6.0-6ab75ae-SNAPSHOT. The code of this version is at commit 
[6ab75ae|https://github.com/apache/incubator-ratis/commit/6ab75ae9da4b380056279b02f208dc9b1329325b].

bq. Could you see if the deadlock can be reproducible from trunk?

I think it is hard to reproduce; I have only hit it once. But I think the 
problem still exists, because version 0.6.0-6ab75ae-SNAPSHOT is very recent.

Besides, I have also analyzed the deadlock.
Along the stack of SegmentedRaftLogWorker, the only place where it can hold the 
lock of DataBlockingQueue is at 
[queue.poll|https://github.com/apache/incubator-ratis/blob/6add5871b6de8064d5340a5b0fcdf5dac12a6dd4/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L293],
 so if the jstack output is correct, SegmentedRaftLogWorker may have acquired 
the lock there without releasing it. The jstack output is the only clue, 
because the logs have already been rolled. 
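
For reference, this is the usual shape of a lock-guarded timed poll. It is a minimal sketch, not the DataBlockingQueue implementation: the class TimedPollSketch and its methods are hypothetical names. With the try/finally pattern the lock is released on every exit path, and Condition.awaitNanos gives the lock up while the caller is waiting, so a thread that still owns the lock after poll would have had to leave this pattern in some unusual way.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class TimedPollSketch<E> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final Queue<E> items = new ArrayDeque<>();

    void offer(E element) {
        lock.lock();
        try {
            items.add(element);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Returns the head, or null if nothing arrives in time or the caller is interrupted. */
    E poll(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            lock.lockInterruptibly();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            while (items.isEmpty()) {
                if (nanos <= 0) {
                    return null;
                }
                // awaitNanos releases the lock while waiting and reacquires it before returning.
                nanos = notEmpty.awaitNanos(nanos);
            }
            return items.remove();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            lock.unlock(); // runs on every exit path, including timeout and interrupt
        }
    }

    /** Exposed only so the sketch can show that the lock is free after poll returns. */
    boolean isLocked() {
        return lock.isLocked();
    }
}
```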

> Fix deadlock between grpc-default-executor and SegmentedRaftLogWorker
> ---------------------------------------------------------------------
>
>                 Key: RATIS-1004
>                 URL: https://issues.apache.org/jira/browse/RATIS-1004
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: jstack-deadlock-2.txt, screenshot-1.png
>
>
>  !screenshot-1.png! 


