Dinesh Joshi updated CASSANDRA-15295:
    Status: Changes Suggested  (was: Review In Progress)

Hi [~gzh1992n], thanks for the patch. When I looked at the issue closely, the 
deadlock can be avoided if we don't start a {{Thread}} in a static initializer 
block and don't call a method on a partially initialized object. This is a 
classic concurrency issue. Now, the way you have solved it is by moving the 
error handling to a different class, but I think it still needs to go a bit 
further. I have mocked up a very minimal change here: 
 It is the minimal set of changes required to avoid the deadlock, and it also 
ensures that we operate on a fully initialized object. We can incorporate your 
refactor as well, but I think it is important to get the correctness issue 
resolved first. It also requires a bit more guarding in {{CommitLog::start()}} 
so it's not started twice.
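
For illustration only, the "not started twice" guard could look roughly like the sketch below. This is not the mocked-up change referred to above; {{GuardedLog}}, its fields, and the thread name are hypothetical stand-ins, and the real {{CommitLog}} has far more state. The point is just that the singleton is fully constructed without starting any thread, and that an explicit {{start()}} is idempotent.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for CommitLog; names and structure are assumptions.
final class GuardedLog {
    // The constructor starts no thread, so class initialization cannot
    // block on a background thread that needs this class.
    static final GuardedLog instance = new GuardedLog();

    private final AtomicBoolean started = new AtomicBoolean(false);
    private final AtomicInteger threadsStarted = new AtomicInteger();

    private GuardedLog() {}

    // Starts the background thread at most once; later calls are no-ops.
    GuardedLog start() {
        if (started.compareAndSet(false, true)) {
            Thread allocator = new Thread(() -> {
                // segment allocation work would go here
            }, "ALLOCATOR-SKETCH");
            allocator.setDaemon(true);
            allocator.start();
            threadsStarted.incrementAndGet();
        }
        return this;
    }

    // Exposed only so the sketch can be checked.
    int threadsStarted() {
        return threadsStarted.get();
    }
}
```

Callers would invoke {{GuardedLog.instance.start()}} explicitly during startup; calling it again from another code path does not create a second thread.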

I have also not finished evaluating whether this change will cause any other 
issues, as we are changing initialization behavior.

> Running into deadlock when do CommitLog initialization
> ------------------------------------------------------
>                 Key: CASSANDRA-15295
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15295
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Normal
>         Attachments: jstack.log, pstack.log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
> Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a 
> long time.
>  I used jstack to see what happened. The main thread was stuck in 
> *AbstractCommitLogSegmentManager.awaitAvailableSegment*
>  !screenshot-1.png! 
> The strange thing is that the COMMIT-LOG-ALLOCATOR thread state was runnable 
> but it was not actually running.  
>  !screenshot-2.png! 
> Then I used pstack to troubleshoot. I found COMMIT-LOG-ALLOCATOR blocked on 
> Java class initialization.
>   !screenshot-3.png! 
> This is obviously a deadlock. CommitLog waits for a CommitLogSegment when 
> initializing. At this moment, the CommitLog class is not initialized and the 
> main thread holds the class lock. After that, COMMIT-LOG-ALLOCATOR hits an 
> exception while creating a CommitLogSegment and calls 
> *CommitLog.handleCommitError* (a static method). COMMIT-LOG-ALLOCATOR will 
> block on this line because the CommitLog class is still initializing.
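
The mechanism described above can be reproduced in isolation with a small sketch. Everything here is hypothetical ({{InitDeadlockDemo}}, {{Log}}, {{handleError}}, the thread name) and is not Cassandra code. A static initializer starts a thread and then waits for it, while the thread calls a static method on the same class. A timed wait is used so the sketch terminates instead of hanging the way the real node did.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction of the class-initialization deadlock pattern.
final class InitDeadlockDemo {
    static final class Log {
        static final boolean workerFinishedDuringInit;

        static {
            final CountDownLatch done = new CountDownLatch(1);
            Thread worker = new Thread(new Runnable() {
                @Override
                public void run() {
                    // Calling a static method on Log from another thread
                    // blocks until Log's static initializer has completed.
                    Log.handleError();
                    done.countDown();
                }
            }, "ALLOCATOR-SKETCH");
            worker.setDaemon(true);
            worker.start();

            boolean finished;
            try {
                // The initializing thread waits for the worker. With an
                // untimed wait this would never return.
                finished = done.await(500, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                finished = false;
            }
            workerFinishedDuringInit = finished;
        }

        static void handleError() {
            // stand-in for a static error handler
        }
    }

    // Triggers initialization of Log and reports whether the worker was
    // able to finish before the static initializer gave up waiting.
    static boolean workerFinishedDuringInit() {
        return Log.workerFinishedDuringInit;
    }
}
```

In this sketch the worker cannot finish while the initializer is waiting, so the timed wait expires; replacing it with an untimed {{await()}} would leave both threads stuck, which matches the jstack/pstack observations in the description.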
