Re: OOMs during high (read?) load in Cassandra 1.2.11

2013-12-09 Thread Klaus Brunner
We're running largely default settings, apart from shard (1) and
replica (0-n) counts and an EC2-related snitch. No row caching at
all. The logs never showed the same kind of entries pre-OOM; it
basically occurred out of the blue.

However, the problem seems to have subsided after forcing a
compaction (using LeveledCompaction) that took several hours. Not
sure yet whether that's a permanent fix, but things are looking good
so far.
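
For reference, the forced compaction was nothing fancier than a major
compaction via nodetool (keyspace and column family names here are
placeholders, not our real schema):

    # force a major compaction of one column family
    nodetool -h localhost compact my_keyspace my_cf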

Klaus


2013/12/6 Vicky Kak vicky@gmail.com:
 I am not sure if you had got a chance to take a look at this
 http://www.datastax.com/docs/1.1/troubleshooting/index#oom
 http://www.datastax.com/docs/1.1/install/recommended_settings

 Can you attach the Cassandra logs and your cassandra.yaml? That
 should give us more details about the issue.

 Thanks,
 -VK



Re: OOMs during high (read?) load in Cassandra 1.2.11

2013-12-09 Thread Klaus Brunner
2013/12/9 Nate McCall n...@thelastpickle.com:
 Do you have any secondary indexes defined in the schema? That could lead to
 a 'mega row' pretty easily depending on the cardinality of the value.

That's an interesting point, but no: we don't have any secondary
indexes anywhere. From the heap dump, it's fairly evident that it's
not a single huge row but actually many rows.
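
For anyone else reading along: the 'mega row' Nate describes comes
from low-cardinality index values, since all entries for one value
land in a single internal index row. A hypothetical schema just to
illustrate the point (not ours):

    CREATE TABLE users (
      id uuid PRIMARY KEY,
      country text
    );
    -- every user with country = 'US' becomes a column in one
    -- internal index row keyed by 'US'
    CREATE INDEX users_by_country ON users (country);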

I'll keep watching to see whether this occurs again or whether the
compaction fixed it for good.

Thanks,

Klaus


OOMs during high (read?) load in Cassandra 1.2.11

2013-12-06 Thread Klaus Brunner
We're getting fairly reproducible OOMs on a 2-node cluster running
Cassandra 1.2.11, typically under heavy read load. Some sample stack
traces are at https://gist.github.com/KlausBrunner/7820902 - they all
fail somewhere below table.getRow(), though I don't know whether
that's part of query processing, compaction, or something else.

- The main CFs contain some 100k rows, none of them particularly wide.
- Heap dumps invariably show a single huge byte array (~1.6 GiB,
associated with the OOM'ing thread) hogging >80% of the Java heap.
The array seems to contain all/many rows of one CF.
- We're moderately certain there's no killer query with a huge
result set involved here, but we can't see exactly what triggers this.
- We've tried to switch to LeveledCompaction, to no avail.
- Xms/Xmx are set to about 4 GB (see the sketch after this list).
- The logs show the usual signs of panic (flushing memtables) just
before actually OOMing. This often or even always seems to happen
after a compaction, but the evidence isn't conclusive.
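
In case it matters, the heap settings and dump capture look roughly
like this (a cassandra-env.sh sketch; the dump path is a placeholder):

    # cassandra-env.sh: pin the heap at ~4 GB
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"
    # write a heap dump on every OOM so we can inspect it afterwards
    JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
    JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/tmp"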

I'm somewhat worried that Cassandra would read so much data into a
single contiguous byte[] at any point. Could this be related to
compaction? Any ideas what we could do about this?
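
If it is compaction-related, I'd expect the heap growth to coincide
with active tasks in something like this (the pid is a placeholder):

    # see whether a compaction is running when heap use climbs
    nodetool -h localhost compactionstats
    # quick histogram of what's on the heap of the live process
    jmap -histo <pid> | head -n 20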

Thanks

Klaus