[ 
https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216783#comment-14216783
 ] 

Kishan Karunaratne commented on CASSANDRA-8285:
-----------------------------------------------

[~iamaleksey] I ran the duration tests for twice the normal length (6 days / 
144 hours), and here are the results:

The duration tests run against 2.0.9 and 2.0.10 completed successfully without 
errors.

The endurance test run against 2.0.10 OOME'd. The endurance test is identical 
to the duration test, except that we randomly kill/restart C* nodes on a 
rolling basis. Although the C* nodes had produced a few heap dumps, it wasn't 
until after the last one that the nodes were unable to restart, which is when 
I noticed that there was an issue. I've attached the first heap dump and the 
last heap dump. Because of the way I was logging the gc, I only have the log 
for the period immediately prior to the last heap dump.

The gc log, the heap dumps, the system logs, and a few graphs generated from 
the gc log via jClarity can be found here.
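
For anyone who wants to look at heap occupancy after each GC without jClarity, here is a minimal sketch. It assumes the gc log contains HotSpot-style occupancy triples of the form "usedBefore->usedAfter(capacity)" in K, and that the last triple on a line describes the whole heap; both are assumptions about the log format, not something checked against the attached file. The function names are made up for illustration.

```python
import re

# Matches occupancy triples such as "20000K->15000K(100000K)".
# Assumption: the log was produced with HotSpot GC detail logging enabled.
OCCUPANCY = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")


def heap_after_gc(lines):
    """Yield (used_after_kb, capacity_kb) for every line that records a collection."""
    for line in lines:
        triples = OCCUPANCY.findall(line)
        if not triples:
            continue  # not a GC line
        # Assumption: earlier triples are per-generation, the last is the whole heap.
        _before, after, capacity = triples[-1]
        yield int(after), int(capacity)


def first_sample_above(samples, fraction=0.9):
    """Return the index of the first sample whose post-GC occupancy exceeds
    `fraction` of capacity, or None if the heap never gets that full."""
    for index, (after, capacity) in enumerate(samples):
        if capacity > 0 and after / capacity > fraction:
            return index
    return None
```

The lines of a gzipped log can be fed in with `gzip.open(path, "rt")` from the standard library. A heap that stays above the threshold even right after a collection is the pattern that would match a Full GC storm.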

> OOME in Cassandra 2.0.11
> ------------------------
>
>                 Key: CASSANDRA-8285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
> Cassandra 2.0.11 + ruby-driver 1.0-beta
>            Reporter: Pierre Laporte
>            Assignee: Aleksey Yeschenko
>         Attachments: OOME_node_system.log, gc.log.gz, 
> heap-usage-after-gc-zoom.png, heap-usage-after-gc.png
>
>
> We ran the drivers' 3-day endurance tests against Cassandra 2.0.11 and C* 
> crashed with an OOME.  This happened both with ruby-driver 1.0-beta and 
> java-driver 2.0.8-snapshot.
> Attached are :
> | OOME_node_system.log | The system.log of one Cassandra node that crashed |
> | gc.log.gz | The GC log on the same node |
> | heap-usage-after-gc.png | The heap occupancy evolution after every GC cycle |
> | heap-usage-after-gc-zoom.png | A focus on when things start to go wrong |
> Workload :
> Our test executes 5 CQL statements (select, insert, select, delete, select) 
> for a given unique id, for 3 days, using multiple threads.  There is no 
> change in the workload during the test.
> Symptoms :
> In the attached log, it seems something starts in Cassandra between 
> 2014-11-06 10:29:22 and 2014-11-06 10:45:32.  This causes an allocation that 
> fills the heap.  We eventually get stuck in a Full GC storm and get an OOME 
> in the logs.
> I have run the java-driver tests against Cassandra 1.2.19 and 2.1.1.  The 
> error does not occur.  It seems specific to 2.0.11.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
