[ https://issues.apache.org/jira/browse/CASSANDRA-15400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Petrov updated CASSANDRA-15400:
------------------------------------
Comment: was deleted
(was: I've noticed that the patch uses {{validateIfFixedSize}}. I intended to
fix it in some other patch, but wanted to let you know that
{{validateIfFixedSize}} is not implemented for {{ByteType}} and {{ShortType}}
even though they're fixed size.)
> Cassandra 3.0.18 went OOM several hours after joining a cluster
> ---------------------------------------------------------------
>
> Key: CASSANDRA-15400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15400
> Project: Cassandra
> Issue Type: Bug
> Components: Local/SSTable
> Reporter: Thomas Steinmaurer
> Assignee: Blake Eggleston
> Priority: Normal
> Fix For: 3.0.20, 3.11.6, 4.0
>
> Attachments: cassandra_hprof_bigtablereader_statsmetadata.png,
> cassandra_hprof_dominator_classes.png, cassandra_hprof_statsmetadata.png,
> cassandra_jvm_metrics.png, cassandra_operationcount.png,
> cassandra_sstables_pending_compactions.png, image.png
>
>
> We have been moving from Cassandra 2.1.18 to Cassandra 3.0.18 and have twice
> hit an OOM with 3.0.18 on newly added nodes that joined an existing cluster,
> several hours after they had successfully bootstrapped.
> Running in AWS:
> * m5.2xlarge, EBS SSD (gp2)
> * Xms/Xmx12G, Xmn3G, CMS GC, OpenJDK8u222
> * 4 compaction threads, throttling set to 32 MB/s
> What we see is a steady increase in the OLD gen over many hours.
> !cassandra_jvm_metrics.png!
> * The node started to join / auto-bootstrap the cluster on Oct 30 ~ 12:00
> * It essentially finished joining the cluster (UJ => UN) ~19 hours later, on Oct
> 31 ~07:00, and also started serving client read requests
> !cassandra_operationcount.png!
> Memory-wise (on-heap), it didn't look that bad at that time, but old gen usage
> kept increasing.
> We see a correlation with an increasing number of SSTables and pending compactions.
> !cassandra_sstables_pending_compactions.png!
> This continued until we hit the OOM during the night of Nov 1. After a Cassandra
> restart (the metric gap in the chart above), the number of SSTables and pending
> compactions is still high, but we have not had memory trouble since then.
> This correlation is confirmed by the auto-generated heap dump, which shows e.g. ~5K
> BigTableReader instances with ~8.7 GByte of retained heap in total.
> !cassandra_hprof_dominator_classes.png!
> Looking more closely at a single object instance, each instance appears to be
> ~2 MByte in size.
> !cassandra_hprof_bigtablereader_statsmetadata.png!
> This comes from 2 pre-allocated byte buffers (highlighted in the screenshot above) of
> 1 MByte each.
> We have been running 2.1.18 for more than 3 years and I can't remember dealing
> with such an OOM when extending a cluster.
> While the MAT screenshots above are from our production cluster, we can partly
> reproduce this behavior in our load test environment (although it does not go fully
> OOM there), so I may be able to share an hprof from this non-prod
> environment if needed.
> Thanks a lot.
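As a rough sanity check on the heap figures in the description, the Python sketch below multiplies the reported ~5K BigTableReader instances by their two 1 MByte buffers and compares the result with the old generation implied by Xmx12G / Xmn3G. It assumes that -Xmn fixes the young generation so the old generation is the rest of the heap, that "G" and "MByte" are binary units, and that nothing in a BigTableReader besides the two buffers is counted. The helper names are hypothetical and are not part of Cassandra.

```python
# Back-of-the-envelope check of the heap numbers quoted in CASSANDRA-15400.
# Inputs are the approximate figures from the issue; results are estimates only.
MIB = 1024 ** 2
GIB = 1024 ** 3


def old_gen_bytes(xmx: int, xmn: int) -> int:
    """Hypothetical helper: old gen size when -Xmn fixes the young gen (Java 8)."""
    return xmx - xmn


def reader_buffer_bytes(readers: int, buffers_per_reader: int, buffer_size: int) -> int:
    """Hypothetical helper: heap held by the pre-allocated buffers alone."""
    return readers * buffers_per_reader * buffer_size


old_gen = old_gen_bytes(12 * GIB, 3 * GIB)          # Xmx12G, Xmn3G
estimated = reader_buffer_bytes(5_000, 2, 1 * MIB)  # ~5K readers, 2 x 1 MByte each
reported = 8.7 * GIB                                # retained heap reported by MAT

print(f"old gen:             {old_gen / GIB:.1f} GiB")
print(f"estimated (buffers): {estimated / GIB:.1f} GiB")
print(f"reported / old gen:  {reported / old_gen:.0%}")
```

The estimate lands slightly above the ~8.7 GByte reported by MAT because "~5K" and "~2 MByte" are rounded, but both figures are close to the whole ~9 GiB old generation, which fits the steady old gen growth shown in the JVM metrics.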
--
This message was sent by Atlassian Jira
(v8.3.4#803005)