[
https://issues.apache.org/jira/browse/CASSANDRA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072373#comment-14072373
]
Ryan McGuire commented on CASSANDRA-7567:
-----------------------------------------
I don't know what it is about 12s specifically, but I'm seeing the same
intermittent 12s latency spike on a different cluster, this time on 3 c3.8xlarge
nodes. How long did you run the test? It definitely gets progressively worse the
longer it runs. This kind of activity repeats:
{code}
20352718 , 25, 25, 25, 4029.3, 1.4, 12089.2, 12091.3, 12092.0, 12092.0, 333.0, 0.01946
20639361 , 24827, 24827, 24827, 4.0, 0.9, 2.3, 6.7, 143.2, 10555.2, 344.5, 0.01964
20656481 , 17580, 17580, 17580, 4.8, 1.4, 14.4, 56.1, 198.3, 206.8, 345.5, 0.01978
20672190 , 14344, 14344, 14344, 6.6, 0.9, 15.6, 42.0, 320.1, 328.2, 346.6, 0.01997
20682762 , 16666, 16666, 16666, 7.8, 0.8, 16.0, 97.3, 444.3, 447.5, 347.3, 0.02018
20683062 , 25, 25, 25, 4017.3, 1.0, 12054.9, 12056.2, 12057.6, 12057.6, 359.3, 0.02037
21256796 , 43999, 43999, 43999, 2.3, 0.9, 1.4, 3.5, 80.8, 12100.9, 372.3, 0.02049
21356015 , 90655, 90655, 90655, 1.1, 0.9, 1.7, 3.2, 80.1, 86.7, 373.4, 0.02051
21426407 , 69401, 69401, 69401, 1.4, 0.9, 2.0, 2.7, 111.2, 114.4, 374.4, 0.02044
21500494 , 73883, 73883, 73883, 1.3, 1.0, 2.1, 9.6, 21.0, 23.1, 375.4, 0.02036
21517270 , 16108, 16108, 16108, 6.1, 0.9, 16.4, 79.7, 135.1, 136.2, 376.5, 0.02029
21523818 , 4002, 4002, 4002, 22.0, 1.4, 84.4, 493.9, 612.6, 672.2, 378.1, 0.02046
21524118 , 25, 25, 25, 4024.3, 0.8, 12081.4, 12086.2, 12097.4, 12097.4, 390.2, 0.02074
21975735 , 35025, 35025, 35025, 2.9, 0.8, 1.1, 1.7, 90.4, 12102.2, 403.1, 0.02104
22064231 , 87534, 87534, 87534, 1.1, 0.9, 1.3, 2.3, 100.4, 102.8, 404.1, 0.02109
22166146 , 105540, 105540, 105540, 0.9, 0.9, 1.2, 2.1, 5.1, 15.3, 405.1, 0.02102
22244592 , 72442, 72442, 72442, 1.4, 0.9, 1.9, 3.6, 98.0, 100.8, 406.1, 0.02096
22313323 , 67902, 67902, 67902, 1.5, 0.9, 2.0, 3.3, 213.0, 223.1, 407.2, 0.02088
22332959 , 19299, 19299, 19299, 5.1, 0.9, 15.5, 21.5, 119.6, 122.1, 408.2, 0.02081
22342749 , 9854, 9854, 9854, 6.3, 0.9, 14.9, 23.6, 547.6, 566.8, 409.2, 0.02094
22347270 , 5399, 5399, 5399, 22.4, 1.6, 22.5, 759.5, 816.5, 830.5, 410.0, 0.02115
22347570 , 25, 25, 25, 4020.3, 0.8, 12083.5, 12089.3, 12099.5, 12099.5, 422.1, 0.02139
22697985 , 26963, 26963, 26963, 3.8, 0.8, 1.1, 2.1, 65.5, 12100.1, 435.1, 0.02166
22788909 , 90108, 90108, 90108, 1.1, 0.9, 1.3, 3.2, 96.2, 99.2, 436.1, 0.02174
22885016 , 95809, 95809, 95809, 1.0, 0.9, 1.3, 2.2, 78.5, 82.4, 437.1, 0.02167
22975898 , 81816, 81816, 81816, 1.2, 0.9, 1.6, 3.5, 100.1, 102.3, 438.2, 0.02161
23049501 , 72615, 72615, 72615, 1.4, 1.0, 1.9, 2.5, 147.2, 151.1, 439.2, 0.02153
23074719 , 24475, 24475, 24475, 4.0, 0.8, 14.4, 19.6, 34.9, 46.3, 440.3, 0.02145
23094356 , 18263, 18263, 18263, 5.5, 1.6, 15.0, 54.3, 99.6, 106.9, 441.3, 0.02155
23103664 , 9878, 9878, 9878, 6.5, 1.4, 16.1, 122.5, 142.2, 349.2, 442.3, 0.02168
23104059 , 33, 33, 33, 3083.8, 0.9, 12068.0, 12071.5, 12071.6, 12071.6, 454.4, 0.02187
23378761 , 21693, 21693, 21693, 4.7, 0.8, 1.5, 3.2, 69.5, 12101.6, 467.1, 0.02212
23473613 , 88575, 88575, 88575, 1.1, 0.9, 1.2, 3.0, 95.9, 99.0, 468.1, 0.02222
23552129 , 77476, 77476, 77476, 1.3, 0.9, 1.4, 3.3, 159.1, 168.8, 469.1, 0.02215
23639310 , 86154, 86154, 86154, 1.1, 0.9, 1.3, 3.1, 95.5, 98.4, 470.1, 0.02207
23731401 , 92987, 92987, 92987, 1.1, 0.9, 1.3, 2.9, 59.4, 62.7, 471.1, 0.02200
23804418 , 63119, 63119, 63119, 1.6, 1.0, 1.9, 11.9, 147.2, 149.7, 472.3, 0.02193
23825655 , 20577, 20577, 20577, 4.9, 0.8, 16.1, 22.2, 37.6, 44.7, 473.3, 0.02186
23845878 , 19932, 19932, 19932, 5.0, 1.2, 14.8, 23.4, 79.5, 87.0, 474.3, 0.02197
23854629 , 16059, 16059, 16059, 5.7, 2.0, 13.0, 98.1, 114.8, 115.3, 474.9, 0.02207
23854929 , 25, 25, 25, 4020.7, 0.8, 12089.1, 12101.0, 12101.4, 12101.4, 487.0, 0.02219
{code}
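In case it helps eyeball runs like this, here's a throwaway filter (not part of the
ticket) that pulls out the intervals whose max latency crosses 10s. The column
positions are an assumption on my part (10th comma-separated field = interval max in
ms, 11th = elapsed seconds, 2nd = op rate), so adjust if your stress build prints a
different layout:
{code}
# flag stress intervals whose max latency exceeds 10s
# assumes: field 10 = interval max (ms), field 11 = elapsed time (s), field 2 = op/s
awk -F' *, *' '$10+0 > 10000 { printf "t=%ss  max=%sms  op/s=%s\n", $11, $10, $2 }' stress.log
{code}
On the dump above that flags the ~12s stalls at t≈333, 359, 390, 422, 454 and 487s,
i.e. roughly one every 30 seconds, each followed by a recovery interval whose max
still carries the stall.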
> when the commit_log disk for a single node is overwhelmed the entire cluster
> slows down
> ---------------------------------------------------------------------------------------
>
> Key: CASSANDRA-7567
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: debian 7.5, bare metal, 14 nodes, 64CPUs, 64GB RAM,
> commit_log disk sata, data disk SSD, vnodes, leveled compaction strategy
> Reporter: David O'Dell
> Assignee: Brandon Williams
> Attachments: 7567.logs.bz2, write_request_latency.png
>
>
> We've run into a situation where a single node out of 14 is experiencing high
> disk IO. This can happen when a node is being decommissioned, or after it
> joins the ring and runs into CASSANDRA-6621.
> When this occurs, the write latency for the entire cluster spikes from 0.3ms
> to 170ms.
> To simulate this, simply run dd against the commit_log disk (dd if=/dev/zero
> of=/tmp/foo bs=1024) and you will see that all nodes in the cluster instantly
> slow down (a repro sketch follows after this description).
> BTW, overwhelming the data disk does not have the same effect.
> Also, I've tried this with the overwhelmed node not being contacted directly
> by the client, and it still has the same effect.