[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dinesh Joshi updated CASSANDRA-15295:
-------------------------------------
    Status: Changes Suggested  (was: Review In Progress)

Hi [~gzh1992n], thanks for the patch. When I looked at the issue closely, the deadlock can be avoided if we don't start a {{Thread}} in a static initializer block and call a method on a partially initialized object. This is a classic concurrency issue. You have solved it by moving the error handling to a different class, but I think it still needs to go a bit further.

I have mocked up a very minimal change here: https://github.com/apache/cassandra/compare/trunk...dineshjoshi:15295-trunk?expand=1

It is the minimal set of changes required to avoid the deadlock, and it also ensures that we operate on a fully initialized object. We can incorporate your refactor as well, but I think it is important to get the correctness issue resolved first. The change also requires a bit more guarding in {{CommitLog::start()}} so that it's not started twice. I have also not finished evaluating whether this change will cause any other issues, since we are changing initialization behavior.

> Running into deadlock when do CommitLog initialization
> ------------------------------------------------------
>
>                 Key: CASSANDRA-15295
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15295
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Normal
>         Attachments: jstack.log, pstack.log, screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a long time.
> I used jstack to see what happened. The main thread was stuck in *AbstractCommitLogSegmentManager.awaitAvailableSegment*
> !screenshot-1.png!
> The strange thing is that the COMMIT-LOG-ALLOCATOR thread state was runnable, but it was not actually running.
> !screenshot-2.png!
> Then I used pstack to troubleshoot.
> I found COMMIT-LOG-ALLOCATOR blocked on Java class initialization.
> !screenshot-3.png!
> This is obviously a deadlock. CommitLog waits for a CommitLogSegment while it is initializing. At this moment, the CommitLog class is not yet initialized and the main thread holds the class lock. After that, COMMIT-LOG-ALLOCATOR hits an exception while creating a CommitLogSegment and calls *CommitLog.handleCommitError* (a static method). COMMIT-LOG-ALLOCATOR blocks on this line because the CommitLog class is still initializing.
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
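
For readers following the discussion above, here is a minimal Java sketch of the pattern under review. It is not the Cassandra code and not the linked patch. The class {{SafeStartDemo}}, its fields, and the methods {{handleError}} and {{awaitSegment}} are hypothetical stand-ins for {{CommitLog}}, {{CommitLog.handleCommitError}}, and {{awaitAvailableSegment}}. The sketch assumes the fix takes the shape described in the comment: the static initializer only constructs the object, and the worker thread is started later through a guarded {{start()}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for CommitLog; none of these names come from Cassandra.
public class SafeStartDemo {
    // Counts calls to the static error handler (illustration only).
    static final AtomicInteger errorsHandled = new AtomicInteger();

    // Built during class initialization, but no thread is started here.
    static final SafeStartDemo INSTANCE = new SafeStartDemo();

    private final AtomicBoolean started = new AtomicBoolean(false);
    private final CountDownLatch segmentReady = new CountDownLatch(1);

    SafeStartDemo() {
        // Field setup only. Starting the worker here and waiting for it would
        // block while this thread holds the class-initialization lock, and the
        // worker's static call below would wait for that same lock.
    }

    // Static method invoked from the worker thread, in the role that
    // handleCommitError plays in the report.
    static boolean handleError(String message) {
        errorsHandled.incrementAndGet();
        return true;
    }

    // Returns true only for the call that actually starts the worker.
    boolean start() {
        if (!started.compareAndSet(false, true)) {
            return false; // already started, so do nothing
        }
        Thread worker = new Thread(() -> {
            handleError("simulated allocation failure");
            segmentReady.countDown();
        }, "ALLOCATOR-DEMO");
        worker.setDaemon(true);
        worker.start();
        return true;
    }

    // Waits for the worker's signal; returns false on timeout or interrupt.
    boolean awaitSegment(long timeoutMillis) {
        try {
            return segmentReady.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // The class is fully initialized before main runs.
        SafeStartDemo log = SafeStartDemo.INSTANCE;
        System.out.println("first start: " + log.start());
        System.out.println("second start: " + log.start());
        System.out.println("segment ready: " + log.awaitSegment(5000));
    }
}
```

Because {{start()}} is called only after class initialization has finished, the worker's call to the static method does not have to wait on the class-initialization lock. The {{compareAndSet}} guard makes a second {{start()}} call a no-op, which is one possible way to address the "not started twice" concern; the real change may guard this differently.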