[ https://issues.apache.org/jira/browse/CASSANDRA-14253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Kirillov updated CASSANDRA-14253:
----------------------------------------
    Description: 
Hi.

I've had persistent problems with a cluster upgraded to 3.11.2. I was 
recreating one of my MVs and suddenly the nodes' request latencies went crazy.

During the investigation I found that half of my nodes had stuck MutationStage 
threads. All 64 threads were Active, and the Pending count was continuously 
increasing while the Completed count stayed stuck at one value.
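
The symptom above can be expressed as a simple check over two successive 
readings of the stage counters. This is only a rough sketch: the StageSample 
type, the looks_stuck function, and the sample numbers are hypothetical and 
not part of Cassandra. It assumes the Active, Pending, and Completed values 
were read from something like `nodetool tpstats`, and the pool size of 64 is 
taken from the thread count mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSample:
    """One reading of a thread-pool stage's counters (hypothetical type)."""
    active: int
    pending: int
    completed: int


def looks_stuck(earlier: StageSample, later: StageSample, pool_size: int = 64) -> bool:
    """Return True when two readings match the symptom described above.

    The stage is flagged when every thread is busy in both readings, the
    pending backlog has grown, and the completed count has not moved.
    """
    all_busy = earlier.active >= pool_size and later.active >= pool_size
    backlog_growing = later.pending > earlier.pending
    no_progress = later.completed == earlier.completed
    return all_busy and backlog_growing and no_progress


# Illustrative numbers only, not taken from the affected cluster.
before = StageSample(active=64, pending=100, completed=5000)
after = StageSample(active=64, pending=250, completed=5000)
print(looks_stuck(before, after))
```

A busy but healthy stage would still show Completed advancing between 
readings, so the check should not fire for it.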

After a restart the nodes worked for a few minutes and then got stuck again. 
Another restart, another few minutes of work, and stuck again.

Attached you can find a stack dump (from sjk stcap) and flame graphs for 
MutationStage threads and for all threads. It seems that all MutationStage 
threads were waiting for some event.

 

Downgrading to 3.10 solved the problem; after the downgrade all nodes are 
operational and not freezing.

 

  was:
Hi.

I've had persistent problems with a cluster upgraded to 3.11.2. I was 
recreating one of my MVs and suddenly the nodes' request latencies went crazy.

During the investigation I found that half of my nodes had stuck MutationStage 
threads. All 64 threads were Active, and the Pending count was continuously 
increasing while the Completed count stayed stuck at one value.

After a restart the nodes worked for a few minutes and then got stuck again. 
Another restart, another few minutes of work, and stuck again.

Attached you can find flame graphs for MutationStage threads and for all 
threads. It seems that all MutationStage threads were waiting for some event.

 

Downgrading to 3.10 solved the problem; after the downgrade all nodes are 
operational and not freezing.

 


> MutationStage threads deadlock
> ------------------------------
>
>                 Key: CASSANDRA-14253
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14253
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: Ubuntu 16.04
>            Reporter: Sergey Kirillov
>            Priority: Major
>         Attachments: dump.std, flame.svg, flame_tn.svg
>
>
> Hi.
> I've had persistent problems with a cluster upgraded to 3.11.2. I was 
> recreating one of my MVs and suddenly the nodes' request latencies went crazy.
> During the investigation I found that half of my nodes had stuck 
> MutationStage threads. All 64 threads were Active, and the Pending count was 
> continuously increasing while the Completed count stayed stuck at one value.
> After a restart the nodes worked for a few minutes and then got stuck again. 
> Another restart, another few minutes of work, and stuck again.
> Attached you can find a stack dump (from sjk stcap) and flame graphs for 
> MutationStage threads and for all threads. It seems that all MutationStage 
> threads were waiting for some event.
>  
> Downgrading to 3.10 solved the problem; after the downgrade all nodes are 
> operational and not freezing.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

