[
https://issues.apache.org/jira/browse/RATIS-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160931#comment-17160931
]
runzhiwang commented on RATIS-1004:
-----------------------------------
[~szetszwo] Thanks for the review.
bq. Do you know which two locks are causing the deadlock?
The grpc-default-executor-3449 thread holds the lock of
[RaftLog|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/RaftLog.java#L83]
and is waiting for the lock of
[DataBlockingQueue|https://github.com/apache/incubator-ratis/blob/master/ratis-common/src/main/java/org/apache/ratis/util/DataBlockingQueue.java#L96].
The SegmentedRaftLogWorker thread holds the lock of
[DataBlockingQueue|https://github.com/apache/incubator-ratis/blob/master/ratis-common/src/main/java/org/apache/ratis/util/DataBlockingQueue.java#L45]
and is waiting for the lock of
[RaftLog|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLog.java#L510].
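To illustrate the opposite lock order described above, here is a minimal sketch. It is not Ratis code: the class LockOrderSketch, the method holdThenTry, and the two plain ReentrantLock objects are hypothetical stand-ins for the RaftLog lock and the DataBlockingQueue lock. Each thread takes its first lock and then tries the other one with a timeout, so the sketch reports the conflict instead of hanging the way the real threads did.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
  /**
   * Takes 'first', waits until the peer thread also holds its own first lock,
   * then tries 'second' with a timeout. Returns true if 'second' was acquired.
   */
  static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
      CountDownLatch bothHolding, CountDownLatch bothTried) throws InterruptedException {
    first.lock();
    try {
      bothHolding.countDown();
      bothHolding.await();  // both threads now hold their first lock
      boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
      if (acquired) {
        second.unlock();
      }
      bothTried.countDown();
      bothTried.await();    // keep 'first' until the peer has also finished trying
      return acquired;
    } finally {
      first.unlock();
    }
  }

  /** Runs both threads; returns {executorGotQueueLock, workerGotRaftLogLock}. */
  static boolean[] run() throws Exception {
    // Hypothetical stand-ins for the two locks named in the jstack output.
    ReentrantLock raftLogLock = new ReentrantLock();
    ReentrantLock queueLock = new ReentrantLock();
    CountDownLatch bothHolding = new CountDownLatch(2);
    CountDownLatch bothTried = new CountDownLatch(2);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // Plays the grpc-default-executor role: RaftLog lock first, then queue lock.
      Future<Boolean> executor = pool.submit(
          () -> holdThenTry(raftLogLock, queueLock, bothHolding, bothTried));
      // Plays the SegmentedRaftLogWorker role: queue lock first, then RaftLog lock.
      Future<Boolean> worker = pool.submit(
          () -> holdThenTry(queueLock, raftLogLock, bothHolding, bothTried));
      return new boolean[] {executor.get(), worker.get()};
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    boolean[] result = run();
    System.out.println("executor acquired queue lock: " + result[0]);
    System.out.println("worker acquired RaftLog lock: " + result[1]);
  }
}
```

Neither thread can get its second lock while the other still holds it. With blocking lock() calls instead of tryLock, both threads would wait forever, which is the state shown in the jstack output.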
bq. BTW, I have noticed that the line numbers shown in the stack traces are
different from trunk.
I am using version 0.6.0-6ab75ae-SNAPSHOT. The code of this version is at commit
[6ab75ae|https://github.com/apache/incubator-ratis/commit/6ab75ae9da4b380056279b02f208dc9b1329325b].
bq. Could you see if the deadlock can be reproducible from trunk?
I think it is hard to reproduce; I have only hit it once. But I think the problem
still exists, because version 0.6.0-6ab75ae-SNAPSHOT is very recent.
I have also analyzed the deadlock.
Along the stack of SegmentedRaftLogWorker, the only place where it can acquire the
lock of DataBlockingQueue is
[queue.poll|https://github.com/apache/incubator-ratis/blob/6add5871b6de8064d5340a5b0fcdf5dac12a6dd4/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L293],
so, if the jstack output is accurate, SegmentedRaftLogWorker may have acquired the
lock there and not released it. The jstack output is the only clue, because the
logs have already been rolled.
> Fix deadlock between grpc-default-executor and SegmentedRaftLogWorker
> ---------------------------------------------------------------------
>
> Key: RATIS-1004
> URL: https://issues.apache.org/jira/browse/RATIS-1004
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Attachments: jstack-deadlock-2.txt, screenshot-1.png
>
>
> !screenshot-1.png!