[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

Yuki Morishita (JIRA) Wed, 07 Oct 2015 16:26:05 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947793#comment-14947793
 ]


Yuki Morishita edited comment on CASSANDRA-10449 at 10/7/15 11:25 PM:
----------------------------------------------------------------------

There are a couple of things going on.

{code}
ERROR [StreamReceiveTask:29] 2015-10-05 14:46:17,090 CassandraDaemon.java:223 - Exception in thread Thread[StreamReceiveTask:29,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
{code}

When rebuilding the secondary index after receiving files, the bootstrapping
node hits a TombstoneOverwhelmingException.
This can cause streaming to hang, as the receive task never completes.
Streaming should tolerate a secondary index build failure: instead of failing
the entire stream session, it should just warn the user and carry on, so that
the user can manually trigger a secondary index rebuild later.
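
As a rough illustration of the behaviour proposed above (this is not the
actual patch, and every class and method name here is hypothetical rather than
taken from Cassandra), a receive task could keep the streamed files, catch an
index build failure, log a warning, and flag that a manual rebuild is needed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: none of these names are Cassandra's real classes.
final class TolerantStreamReceive {
    private static final Logger logger = Logger.getLogger("TolerantStreamReceive");

    /** Stand-in for whatever rebuilds secondary indexes over received files. */
    interface IndexBuilder {
        void rebuild(List<String> receivedFiles) throws Exception;
    }

    /** Outcome of a receive task: files are kept even if the index build failed. */
    static final class Result {
        final List<String> addedFiles;
        final boolean indexRebuildNeeded;

        Result(List<String> addedFiles, boolean indexRebuildNeeded) {
            this.addedFiles = addedFiles;
            this.indexRebuildNeeded = indexRebuildNeeded;
        }
    }

    static Result completeReceive(List<String> receivedFiles, IndexBuilder builder) {
        // The streamed files are accepted regardless of what happens to the index.
        List<String> added = new ArrayList<>(receivedFiles);
        boolean rebuildNeeded = false;
        try {
            builder.rebuild(added);
        } catch (Exception e) {
            // Warn and carry on instead of failing the whole session.
            // Errors such as OutOfMemoryError are deliberately not caught here.
            logger.log(Level.WARNING,
                       "Secondary index build failed for " + added.size()
                       + " streamed file(s); rebuild the index manually later", e);
            rebuildNeeded = true;
        }
        return new Result(added, rebuildNeeded);
    }

    public static void main(String[] args) {
        // Simulate an index build that fails, as in the stack trace above.
        Result r = completeReceive(Arrays.asList("a-Data.db", "b-Data.db"), files -> {
            throw new RuntimeException("simulated index build failure");
        });
        System.out.println("files added: " + r.addedFiles.size()
                           + ", manual rebuild needed: " + r.indexRebuildNeeded);
    }
}
```

The point of the sketch is only that the receive task always reaches its end,
so the stream session can complete; how the real code records or reports the
failed index build is left open.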

I'm not sure the above is related to the OOM. According to StatusLogger, the
number of FlushWriter tasks is growing, and that is the cause of the OOM.
-If you can capture a stack trace using jstack, that would be a great help.-
I missed the attachment, sorry.

-I'll create a separate JIRA for the former.- Created CASSANDRA-10474.


was (Author: yukim):
There are a couple of things going on.

{code}
ERROR [StreamReceiveTask:29] 2015-10-05 14:46:17,090 CassandraDaemon.java:223 - Exception in thread Thread[StreamReceiveTask:29,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
{code}

When rebuilding the secondary index after receiving files, the bootstrapping
node hits a TombstoneOverwhelmingException.
This can cause streaming to hang, as the receive task never completes.
Streaming should tolerate a secondary index build failure: instead of failing
the entire stream session, it should just warn the user and carry on, so that
the user can manually trigger a secondary index rebuild later.

I'm not sure the above is related to the OOM. According to StatusLogger, the
number of FlushWriter tasks is growing, and that is the cause of the OOM.
If you can capture a stack trace using jstack, that would be a great help.

-I'll create a separate JIRA for the former.- Created CASSANDRA-10474.

> OOM on bootstrap due to long GC pause
> -------------------------------------
>
>                 Key: CASSANDRA-10449
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 14.04, AWS
>            Reporter: Robbie Strickland
>              Labels: gc
>             Fix For: 2.1.x
>
>         Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 CassandraDaemon.java:223 - Exception in thread Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.
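
To put the GCInspector line quoted above in perspective, the numbers can be
worked through directly. The snippet below is only arithmetic on the figures
from that log line (the class and method names are made up for illustration):
the pause comes to roughly 26 minutes, and the collection freed about 135 MiB,
under 0.3% of the old generation.

```java
import java.util.Locale;

// Illustrative arithmetic only; the constants are copied from the log line.
final class GcPauseArithmetic {
    static double pauseMinutes(long pauseMs) {
        return pauseMs / 60_000.0;
    }

    static long bytesFreed(long oldGenBefore, long oldGenAfter) {
        return oldGenBefore - oldGenAfter;
    }

    static double percentFreed(long oldGenBefore, long oldGenAfter) {
        return 100.0 * (oldGenBefore - oldGenAfter) / oldGenBefore;
    }

    public static void main(String[] args) {
        long pauseMs = 1_586_126L;           // "GC in 1586126ms"
        long before = 49_213_756_976L;       // G1 Old Gen before the collection
        long after = 49_072_277_176L;        // G1 Old Gen after the collection

        long freed = bytesFreed(before, after);
        System.out.println(String.format(Locale.ROOT,
            "pause: %.1f min, freed: %d bytes (%.1f MiB, %.2f%% of old gen)",
            pauseMinutes(pauseMs), freed, freed / (1024.0 * 1024.0),
            percentFreed(before, after)));
    }
}
```

A collection that long which reclaims so little would be consistent with a
heap that is almost entirely live data, although the log line alone does not
establish what is holding the memory.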



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
