[
https://issues.apache.org/jira/browse/CASSANDRA-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Piotr Westfalewicz updated CASSANDRA-10787:
-------------------------------------------
Attachment: case5_systemlog.txt
case5_debuglog.txt
Hey guys,
Here is the continuation of the story:
0. Taking your advice, I've decided to create a more powerful cluster
1. I've created a new cluster on 2x m1.xlarge instances (4 vCPU, 64-bit, 15GB
RAM, RAID0 4x420GB HDD) and changed the RF to 2
2. Took a snapshot of the data (keyspace.table = logs.group) on one of the old
nodes
3. scp'd the snapshot from the old node to the new node, into the
cassandra/data/keyspace/tablename folder
4. Loaded the data without restarting the server, via nodetool refresh
5. Triggered nodetool repair (the commands are sketched below)
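Roughly, steps 2-5 looked like the commands below. This is only a sketch: the
host name, snapshot tag and data directory paths are placeholders, and the real
table directory name carries a UUID suffix that depends on the installation.
{code}
# Old node: snapshot the logs keyspace (only the group table's files were copied over)
nodetool snapshot -t migrate logs

# Copy the snapshot SSTables into the new node's data directory for logs.group
scp /var/lib/cassandra/data/logs/group-*/snapshots/migrate/* \
    newnode:/var/lib/cassandra/data/logs/group-<uuid>/

# New node: pick up the copied SSTables without a restart, then repair
nodetool refresh logs group
nodetool repair
{code}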
After a few hours the server went down. I've attached the logs as the case 5
files. Could this be because of the size of the SSTables? In my case one of
them was around 50GB.
I've also migrated the rest of the data (not the "big" logs.group table, but
other tables from 500MB to 5GB) the same way, and the server worked fine.
> OutOfMemoryError after few hours from node restart
> --------------------------------------------------
>
> Key: CASSANDRA-10787
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10787
> Project: Cassandra
> Issue Type: Bug
> Environment: Amazon DataStax Auto-Clustering AMI 2.6.3-1404-pv
> on 2x m1.large instances (2 vCPU, 64-bit, 7.5GB RAM, Raid0 2x420GB Disk)
> [cqlsh 5.0.1 | Cassandra 2.2.3 | CQL spec 3.3.1 | Native protocol v4]
> RF=3
> Reporter: Piotr Westfalewicz
> Fix For: 2.2.x
>
> Attachments: case2_debuglog_head.txt, case2_debuglog_tail.txt,
> case2_systemlog.txt, case3_debuglog_tail.txt, case3_systemlog_tail.txt,
> case4_debuglog_tail.txt, case4_systemlog.txt, case5_debuglog.txt,
> case5_systemlog.txt
>
>
> The Cassandra cluster was operating flawlessly for around 3 months. Lately
> I've run into a critical problem with it: after a few hours of running,
> clients are disconnected permanently (that may be a DataStax C# Driver
> problem, though), and a few more hours later (with a smaller load) both
> nodes throw an exception (details in the attached files):
> bq. java.lang.OutOfMemoryError: Java heap space
> Cases description:
> Case 2 (heavy load):
> - 2015-11-26 16:09:40,834 Restarted all nodes in cassandra cluster
> - 2015-11-26 17:03:46,774 First client disconnected permanently
> - 2015-11-26 22:17:02,327 Node shutdown
> Case 3 (unknown load, different node):
> - 2015-11-26 02:19:49,585 Node shutdown (visible only in the
> system log; I don't know why it isn't in the debug log)
> Case 4 (low load):
> - 2015-11-27 13:00:24,994 Node restart
> - 2015-11-27 22:26:56,131 Node shutdown
> Is this a software issue, or am I using Amazon instances that are too weak?
> If so, how can the required amount of memory be calculated?
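> For reference, a minimal sketch of how cassandra-env.sh picks the default heap
> on 2.2 when MAX_HEAP_SIZE is not set explicitly (my reading of its
> calculate_heap_sizes logic; the RAM figures are the nominal EC2 numbers):
> {code}
> # MAX_HEAP_SIZE default: max(min(RAM/2, 1024MB), min(RAM/4, 8192MB))
> ram_mb=15360                                    # m1.xlarge, ~15GB RAM
> half=$(( ram_mb / 2 > 1024 ? 1024 : ram_mb / 2 ))
> quarter=$(( ram_mb / 4 > 8192 ? 8192 : ram_mb / 4 ))
> echo $(( half > quarter ? half : quarter ))MB   # -> 3840MB (~3.75GB heap)
> # For an m1.large with ~7.5GB RAM the same formula gives ~1920MB (~1.9GB heap)
> {code}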
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)