Alan Boudreault created CASSANDRA-9573:
------------------------------------------

             Summary: OOM when loading a compressed sstables (system.hints)
                 Key: CASSANDRA-9573
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9573
             Project: Cassandra
          Issue Type: Bug
            Reporter: Alan Boudreault
            Priority: Critical
             Fix For: 2.2.0 rc2
         Attachments: hs_err_pid11243.log, 
java-hints-issue-2015-06-09.snapshot, system.log, yourkit.ss.tar.gz

[~andrew.tolbert] discovered an issue while running endurance tests on 2.2. A 
node was unable to start and was killed by the OOM killer.

Briefly, Cassandra uses an excessive amount of memory when loading compressed 
sstables (off-heap?). We initially saw the issue with system.hints before 
knowing it was related to compression. system.hints uses LZ4 compression by 
default. If we have an sstable of, say, 8-10G, Cassandra will be killed by the 
OOM killer after 1-2 minutes. I can reproduce the bug every time locally. 

* the issue also happens if we have 10G of data split into 13M sstables.
* I can reproduce the issue if I put a lot of data in the system.hints table.
* I cannot reproduce the issue with a standard table using the same compression 
(LZ4). Something seems to be different when it's hints?

You won't see anything in the node's system.log, but you'll see this in 
/var/log/syslog.log:
{code}
Out of memory: Kill process 30777 (java) score 600 or sacrifice child
{code}
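
To check whether a node was killed this way, the syslog can be scanned for lines like the one above. The following is a minimal sketch, assuming the kernel message keeps the exact wording quoted here (it can differ between kernel versions); the function name `find_oom_kills` is hypothetical and not part of Cassandra.

```python
import re

# Matches the kernel OOM-killer line quoted in this report.
# Assumption: the wording is "Out of memory: Kill process <pid> (<name>) score <n>".
OOM_LINE = re.compile(r"Out of memory: Kill process (\d+) \(([^)]+)\) score (\d+)")


def find_oom_kills(lines):
    """Return a list of (pid, process_name, score) for each OOM-killer line."""
    kills = []
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            pid, name, score = match.groups()
            kills.append((int(pid), name, int(score)))
    return kills


if __name__ == "__main__":
    sample = [
        "Out of memory: Kill process 30777 (java) score 600 or sacrifice child",
    ]
    print(find_oom_kills(sample))
```

To scan a real log, pass an open file instead of the sample list, for example `find_oom_kills(open("/var/log/syslog.log"))`.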

The issue was introduced by this commit, but is not related to the 
performance issue in CASSANDRA-9240: 
https://github.com/apache/cassandra/commit/aedce5fc6ba46ca734e91190cfaaeb23ba47a846

Here is the core dump, along with some YourKit snapshots in the attachments. I 
am not sure you will be able to get useful information from them.
core dump: http://dl.alanb.ca/core.tar.gz

Not sure if this is related, but all dumps and snapshots point to 
EstimatedHistogramReservoir ... and we can see many 
javax.management.InstanceAlreadyExistsException: 
org.apache.cassandra.metrics:... exceptions in system.log before it hangs and 
then crashes.

//cc [~tjake] [~benedict] [~andrew.tolbert]



