Re: OOMs during high (read?) load in Cassandra 1.2.11
2013/12/6 Vicky Kak <vicky@gmail.com>:
> I am not sure if you had a chance to take a look at these:
> http://www.datastax.com/docs/1.1/troubleshooting/index#oom
> http://www.datastax.com/docs/1.1/install/recommended_settings
>
> Can you attach the Cassandra logs and the cassandra.yaml? That should
> give us more details about the issue.
>
> Thanks,
> -VK

We're running largely default settings, with the exception of the shard (1) and replica (0-n) counts and the EC2-related snitch. No row caching at all. The logs never showed those kinds of entries pre-OOM; the failures basically occurred out of the blue.

However, the problem seems to have subsided after forcing a compaction (using LeveledCompaction) that took several hours. Not sure yet whether that's a permanent solution, but things are looking good so far.
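The "forcing" was nothing fancy, by the way: as far as we can tell, re-setting the compaction strategy makes Cassandra rewrite the existing sstables into levels, and that rewrite is what took the hours. Roughly like this (keyspace and column family names are placeholders here, not our real ones):

    ALTER TABLE my_keyspace.my_cf
      WITH compaction = {'class': 'LeveledCompactionStrategy'};

    nodetool -h localhost compactionstats    -- to watch progress

Klaus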
Re: OOMs during high (read?) load in Cassandra 1.2.11
2013/12/9 Nate McCall <n...@thelastpickle.com>:
> Do you have any secondary indexes defined in the schema? That could
> lead to a 'mega row' pretty easily depending on the cardinality of
> the value.

That's an interesting point - but no, we don't have any secondary indexes anywhere. From the heap dump, it's fairly evident that it's not a single huge row but actually many rows. I'll keep watching to see whether this occurs again or whether the compaction fixed it for good.
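(For what it's worth, the maximum row size per CF can also be cross-checked without digging through a heap dump; keyspace and CF names below are placeholders:)

    nodetool -h localhost cfstats            # look for "Compacted row maximum size"
    nodetool -h localhost cfhistograms my_keyspace my_cf

Thanks,
Klaus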
OOMs during high (read?) load in Cassandra 1.2.11
We're getting fairly reproducible OOMs on a 2-node cluster running Cassandra 1.2.11, typically under heavy read load. A sample of some stack traces is at https://gist.github.com/KlausBrunner/7820902 - they're all failing somewhere down from table.getRow(), though I don't know if that's part of query processing, compaction, or something else.

- The main CFs contain some 100k rows, none of them particularly wide.
- Heap dumps invariably show a single huge byte array (~1.6 GiB, associated with the OOM'ing thread) hogging 80% of the Java heap. The array seems to contain all or many rows of one CF.
- We're moderately certain there's no killer query with a huge result set involved here, but we can't see exactly what triggers this.
- We've tried switching to LeveledCompaction, to no avail.
- Xms/Xmx is set to some 4 GB (see the cassandra-env.sh excerpt below).
- The logs show the usual signs of panic (flushing memtables) before actually OOMing. This seems to happen often or even always after a compaction, but that's not quite conclusive.

I'm somewhat worried that Cassandra would read so much data into a single contiguous byte[] at any point. Could this be related to compaction? Any ideas what we could do about this?
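For completeness, the relevant bits of our cassandra-env.sh, roughly from memory (the heap-dump flags are standard JVM options we added to capture the dumps mentioned above; the NEWSIZE value is approximate and the dump path is a placeholder):

    MAX_HEAP_SIZE="4G"       # the ~4 GB Xms/Xmx mentioned above
    HEAP_NEWSIZE="400M"      # approximate
    JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
    JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra"

Thanks

Klaus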