[
https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dinesh Joshi updated CASSANDRA-15295:
-------------------------------------
Status: Changes Suggested (was: Review In Progress)
Hi [~gzh1992n], thanks for the patch. Looking at the issue closely, I found
that the deadlock can be avoided if we don't start a {{Thread}} in a static
initializer block and don't call a method on a partially initialized object.
This is a classic concurrency issue. Now, the way you have solved it is by
moving the error handling to a different class, but I think it still needs to
go a bit further. I have mocked up a very minimal change here:
https://github.com/apache/cassandra/compare/trunk...dineshjoshi:15295-trunk?expand=1
It is the minimal set of changes required to avoid the deadlock and it also
ensures that we operate on a fully initialized object. We can incorporate your
refactor as well but I think it is important to get the correctness issue
resolved first. It also requires a bit more guarding in {{CommitLog::start()}}
so it's not started twice.
I have also not completed evaluating whether this change will cause any other
issues as we are changing initialization behavior.
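As a rough illustration of the approach described above (not the actual patch), here is a minimal Java sketch. The class {{GuardedStartSketch}}, its nested {{Log}} class, and the methods {{awaitWorker}} and {{threadsLaunched}} are hypothetical names invented for this example and are not part of Cassandra. It assumes the static initializer only constructs the object, while thread startup moves to an explicit {{start()}} that is guarded so a second call is a no-op.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedStartSketch {
    /** Hypothetical stand-in for a CommitLog-like singleton. */
    public static final class Log {
        // The static initializer only constructs the object; no thread is started here.
        public static final Log instance = new Log();

        private final AtomicBoolean started = new AtomicBoolean(false);
        private final AtomicInteger threadsLaunched = new AtomicInteger();
        private final CountDownLatch workerDone = new CountDownLatch(1);

        private Log() {}

        /** Starts the worker thread at most once; later calls do nothing. */
        public Log start() {
            if (!started.compareAndSet(false, true)) {
                return this; // already started
            }
            threadsLaunched.incrementAndGet();
            Thread worker = new Thread(new Runnable() {
                @Override
                public void run() {
                    // Safe: Log finished class initialization before start() could be called.
                    Log.handleError();
                    workerDone.countDown();
                }
            }, "ALLOCATOR");
            worker.setDaemon(true);
            worker.start();
            return this;
        }

        /** Hypothetical static error hook, analogous to a static error handler. */
        public static void handleError() {}

        /** Waits up to the given number of milliseconds for the worker to finish. */
        public boolean awaitWorker(long millis) {
            try {
                return workerDone.await(millis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        public int threadsLaunched() {
            return threadsLaunched.get();
        }
    }

    public static void main(String[] args) {
        Log log = Log.instance.start().start(); // second start() is ignored
        System.out.println("worker finished: " + log.awaitWorker(2000));
        System.out.println("threads launched: " + log.threadsLaunched());
    }
}
```

Because the worker can only run after {{start()}} is called on the published instance, its call to the static method never races with class initialization. The {{compareAndSet}} guard is one possible way to keep {{start()}} idempotent; the real change may guard it differently.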
> Running into deadlock when do CommitLog initialization
> ------------------------------------------------------
>
> Key: CASSANDRA-15295
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15295
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Commit Log
> Reporter: Zephyr Guo
> Assignee: Zephyr Guo
> Priority: Normal
> Attachments: jstack.log, pstack.log, screenshot-1.png,
> screenshot-2.png, screenshot-3.png
>
>
> Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a
> long time.
> I used jstack to see what happened. The main thread was stuck in
> *AbstractCommitLogSegmentManager.awaitAvailableSegment*
> !screenshot-1.png!
> The strange thing is that the COMMIT-LOG-ALLOCATOR thread state was runnable,
> but it was not actually running.
> !screenshot-2.png!
> I then used pstack to troubleshoot and found COMMIT-LOG-ALLOCATOR blocked on
> Java class initialization.
> !screenshot-3.png!
> This is clearly a deadlock. CommitLog waits for a CommitLogSegment while
> initializing. At this moment, the CommitLog class is not yet initialized and
> the main thread holds the class lock. After that, COMMIT-LOG-ALLOCATOR hits an
> exception while creating a CommitLogSegment and calls
> *CommitLog.handleCommitError* (a static method). COMMIT-LOG-ALLOCATOR blocks
> on this line because the CommitLog class is still initializing.
>
>
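The class-initialization blocking described in the quoted report can be shown with a small, self-contained Java sketch. This is not Cassandra code: {{ClassInitDeadlockDemo}}, its nested {{Log}} class, and the field {{workerRanDuringInit}} are hypothetical names used only for illustration. To keep the example from hanging, the initializing thread waits with a timeout instead of waiting forever, which is an assumption that differs from the real scenario.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ClassInitDeadlockDemo {
    /** Hypothetical stand-in for a class that starts a thread in its static initializer. */
    public static final class Log {
        // True only if the worker managed to call handleError() before initialization ended.
        public static final boolean workerRanDuringInit;

        static {
            final CountDownLatch workerDone = new CountDownLatch(1);
            Thread worker = new Thread(new Runnable() {
                @Override
                public void run() {
                    // Blocks here: Log is still being initialized by another thread,
                    // so this static call must wait for initialization to complete.
                    Log.handleError();
                    workerDone.countDown();
                }
            }, "ALLOCATOR");
            worker.setDaemon(true);
            worker.start();

            boolean finished;
            try {
                // In the reported scenario this wait has no timeout, hence the deadlock.
                finished = workerDone.await(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                finished = false;
            }
            workerRanDuringInit = finished;
        }

        /** Hypothetical static error hook, analogous to a static error handler. */
        public static void handleError() {}
    }

    public static void main(String[] args) {
        System.out.println("worker ran during init: " + Log.workerRanDuringInit);
    }
}
```

The worker cannot make progress until the static initializer returns, so the timed wait expires and the flag ends up false. With an untimed wait, as in the report, neither thread could ever proceed.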