[ https://issues.apache.org/jira/browse/CASSANDRA-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Constance Eustace updated CASSANDRA-9640:
-----------------------------------------
    Description: 
UPDATE: The GC/heap behavior looks most similar to CASSANDRA-9681.

... I suspect our repair was exacerbating/accelerating 9681's memory leak, and 
that was leading to our issue. We will move to 2.1.8 ASAP, attempt nodetool 
repairs, and watch the GC. 
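
For reference, a rough sketch of the per-table repair and GC monitoring we plan 
to run after the upgrade (the "reports" keyspace and "report_results" table 
below are placeholder names, not our real schema):

{code}
# Repair only the suspect table, restricted to this node's primary ranges.
nodetool repair -pr reports report_results

# Watch GC while the repair runs (assumes a recent 2.1.x build).
nodetool gcstats

# Standard JDK tooling as a fallback: heap/GC utilization every 5s.
jstat -gcutil <cassandra-pid> 5000
{code}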

--------------------------------------------------------------------

We've noticed our nodes becoming unstable: large, unrecoverable old-gen GCs 
repeat until the node OOMs.

This appears to happen around the time of repair, and the specific cause seems 
to be one of our report-computation tables, which involves possibly very wide 
rows with 10GB of data. This is an RF 3 table in a four-node cluster.
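
To make the shape concrete, a hypothetical schema for this kind of table might 
look like the following (names and columns are illustrative only, not our 
actual schema):

{code}
-- A single partition key plus a clustering column is what produces
-- very wide rows (partitions) like the ones described above.
CREATE KEYSPACE reports
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE reports.report_results (
    report_id uuid,       -- partition key: one report per partition
    entry_id  timeuuid,   -- clustering column: many entries per report
    payload   text,
    PRIMARY KEY (report_id, entry_id)
);
{code}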

We truncate this table occasionally, and we had also disabled this computation 
report for a while and noticed better node stability.

I wish I had more specifics. We are switching to an RF 1 table and will do 
more proactive truncation of the table. 
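
At the CQL level the mitigation would be something like this (placeholder 
names again; note that RF 1 trades durability for stability here):

{code}
-- Drop the replication factor so repair no longer compares/streams
-- these wide rows between replicas.
ALTER KEYSPACE reports
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Proactively discard accumulated report data.
TRUNCATE reports.report_results;
{code}

(After lowering RF, a nodetool cleanup on each node would remove the 
now-unowned replicas.)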

When things calm down, we will attempt to replicate the issue and watch GC and 
other logs.

Any suggestions for things to look for or enable tracing on would be welcome.
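
In the meantime, this is what we plan to look at ourselves, in case it prompts 
better suggestions (a sketch; table names are placeholders and paths vary by 
install):

{code}
# Per-table stats; "Compacted partition maximum bytes" should expose
# the ~10GB partitions directly.
nodetool cfstats reports.report_results

# Partition size and cell count percentiles for the table.
nodetool cfhistograms reports report_results

# Verbose GC logging via cassandra-env.sh (2.1 ships similar lines
# commented out; adjust the log path for your install):
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
{code}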



> Nodetool repair of very wide, large rows causes GC pressure and 
> destabilization
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9640
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9640
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: AWS, ~8GB heap
>            Reporter: Constance Eustace
>            Priority: Minor
>         Attachments: syslog.zip
>