[ 
https://issues.apache.org/jira/browse/CASSANDRA-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615235#comment-14615235
 ] 

Constance Eustace commented on CASSANDRA-9640:
----------------------------------------------

It reappeared in QA and PROD this weekend, but we had to get the clusters 
restarted, so we lost the logs again since QA had some critical testing.

You can see three Old Gen GCs in a single minute (14:47), but the Old Gen 
usage doesn't drop appreciably after any of them. 
One of our nodes was down as well, so perhaps hinted handoff or other similar 
mechanisms were full...

Anyway, we can probably kick off a nodetool repair in QA overnight at some 
point and watch the progress of the repair and the heap. 

I'm trying to build something that can allow me to execute monitoring and 
instruction commands using jsch from a coordinating machine.

INFO  [Service Thread] 2015-07-06 14:43:29,802 GCInspector.java:142 - G1 Old Generation GC in 16624ms.  G1 Old Gen: 8538649520 -> 8311333584;
INFO  [Service Thread] 2015-07-06 14:43:47,055 GCInspector.java:142 - G1 Old Generation GC in 16651ms.  G1 Old Gen: 8537826000 -> 8310452944;
INFO  [Service Thread] 2015-07-06 14:44:04,061 GCInspector.java:142 - G1 Old Generation GC in 16606ms.  G1 Old Gen: 8536945360 -> 8308287568;
INFO  [Service Thread] 2015-07-06 14:44:55,720 GCInspector.java:142 - G1 Old Generation GC in 16633ms.  G1 Old Gen: 8534779984 -> 8307813504;
INFO  [Service Thread] 2015-07-06 14:45:30,463 GCInspector.java:142 - G1 Old Generation GC in 16646ms.  G1 Old Gen: 8534305920 -> 8311749504;
INFO  [Service Thread] 2015-07-06 14:46:06,510 GCInspector.java:142 - G1 Young Generation GC in 212ms.  G1 Eden Space: 222298112 -> 0; G1 Old Gen: 8311749504 -> 8534047616;
INFO  [Service Thread] 2015-07-06 14:46:06,521 GCInspector.java:142 - G1 Old Generation GC in 17939ms.  G1 Old Gen: 8534047616 -> 8319945712;
INFO  [Service Thread] 2015-07-06 14:46:37,671 GCInspector.java:142 - G1 Old Generation GC in 16762ms.  G1 Old Gen: 8533855216 -> 8323723456;
INFO  [Service Thread] 2015-07-06 14:46:53,667 GCInspector.java:142 - G1 Young Generation GC in 206ms.  G1 Eden Space: 213909504 -> 0; G1 Old Gen: 8323723456 -> 8529244352;
INFO  [Service Thread] 2015-07-06 14:47:10,459 GCInspector.java:142 - G1 Old Generation GC in 16679ms.  G1 Old Gen: 8537632960 -> 8330301992;
INFO  [Service Thread] 2015-07-06 14:47:29,035 GCInspector.java:142 - G1 Young Generation GC in 224ms.  G1 Eden Space: 205520896 -> 0; G1 Old Gen: 8330301992 -> 8535822888;
INFO  [Service Thread] 2015-07-06 14:47:29,042 GCInspector.java:142 - G1 Old Generation GC in 17288ms.  G1 Old Gen: 8535822888 -> 8338334848;
INFO  [Service Thread] 2015-07-06 14:47:47,727 GCInspector.java:142 - G1 Old Generation GC in 18089ms.  G1 Old Gen: 8535467136 -> 8338544752;
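
To put numbers on "doesn't drop appreciably", the GCInspector lines above can 
be reduced to pause time and bytes freed per Old Gen collection. The following 
is only a rough sketch in Python, not something from the ticket: the function 
name summarize_old_gen and the regular expression are my own assumptions, 
written against the exact line format shown in this excerpt, and SAMPLE is 
three entries copied from the log above.

```python
import re

# Three entries copied from the log excerpt above, mail wrapping included.
SAMPLE = """\
INFO  [Service Thread] 2015-07-06 14:47:10,459 GCInspector.java:142 - G1 Old
Generation GC in 16679ms.  G1 Old Gen: 8537632960 -> 8330301992;
INFO  [Service Thread] 2015-07-06 14:47:29,035 GCInspector.java:142 - G1 Young
Generation GC in 224ms.  G1 Eden Space: 205520896 -> 0; G1 Old Gen: 8330301992
-> 8535822888;
INFO  [Service Thread] 2015-07-06 14:47:29,042 GCInspector.java:142 - G1 Old
Generation GC in 17288ms.  G1 Old Gen: 8535822888 -> 8338334848;
"""

# Assumed pattern: matches only "G1 Old Generation" entries in the format
# shown above; "G1 Young Generation" entries are deliberately skipped.
OLD_GC = re.compile(
    r"(\d{2}:\d{2}:\d{2}),\d{3} GCInspector\.java:\d+ - "
    r"G1 Old Generation GC in (\d+)ms\. G1 Old Gen: (\d+) -> (\d+);"
)


def summarize_old_gen(log_text):
    """Return one dict per Old Gen collection found in log_text."""
    # Collapse all whitespace so wrapped lines and double spaces still match.
    flat = " ".join(log_text.split())
    rows = []
    for ts, pause_ms, before, after in OLD_GC.findall(flat):
        before, after = int(before), int(after)
        rows.append({
            "time": ts,
            "pause_ms": int(pause_ms),
            "freed_bytes": before - after,
            "freed_pct": 100.0 * (before - after) / before,
        })
    return rows


if __name__ == "__main__":
    for row in summarize_old_gen(SAMPLE):
        print("{time}  pause={pause_ms}ms  freed={freed_bytes} bytes "
              "({freed_pct:.2f}%)".format(**row))
```

On the three sample entries this reports two Old Gen collections, each pausing 
for roughly 17 seconds and freeing about 200 MB, which is under 3% of the old 
generation.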

> Nodetool repair of very wide, large rows causes GC pressure and 
> destabilization
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9640
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9640
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: AWS, ~8GB heap
>            Reporter: Constance Eustace
>            Priority: Minor
>             Fix For: 2.1.x
>
>         Attachments: syslog.zip
>
>
> We've noticed our nodes becoming unstable with large, unrecoverable Old Gen 
> GCs until OOM.
> This appears to be around the time of repair, and the specific cause seems to 
> be one of our report computation tables that involves possibly very wide rows 
> with 10GB of data in it. This is an RF 3 table in a four-node cluster.
> We truncate this occasionally, and we also had disabled this computation 
> report for a bit and noticed better node stability.
> I wish I had more specifics. We are switching to an RF 1 table and will do 
> more proactive truncation of the table. 
> When things calm down, we will attempt to replicate the issue and watch GC 
> and other logs.
> Any suggestions for things to look for or enable tracing on would be welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
