> I will investigate the situation more closely using GC stats via
> jconsole, but isn't the bloom filter for the new sstable entirely in
> memory? On disk there are only two files, Index and Data:
> -rw-r--r--  1 root  wheel   1388969984 Dec 27 09:25 sipdb-tmp-hc-4634-Index.db
> -rw-r--r--  1 root  wheel  10965221376 Dec 27 09:25 sipdb-tmp-hc-4634-Data.db
>
> A bloom filter can be that big. In my experience, if I trigger a major
> compaction on a 180 GB CF (compacted row mean size: 130), it will OOM
> the node after 10 seconds, so I am sure that compaction eats memory
> pretty heavily.

Yes, you're right: memory usage will definitely spike by whatever
amount corresponds to the index sampling and bloom filter for the
sstable being compacted. This can be mitigated by never running full
compactions (i.e., not running 'nodetool compact'), but it won't go
away altogether.
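
For a rough sense of how big that spike can get, here is a
back-of-envelope sketch. The constants are assumptions on my part: the
legacy bloom filter spends on the order of 15 bits per key, index
sampling keeps about 1 in index_interval (default 128) entries in
memory, and the ~64 bytes per sampled entry is a guess at key size
plus object overhead.

    N=100000000  # assumed number of rows in the sstable being written
    echo "$N" | awk '{
        bf = $1 * 15 / 8 / 1048576;          # ~15 bits/key for the bloom filter
        sample = ($1 / 128) * 64 / 1048576;  # 1-in-128 sampling, ~64 bytes/entry
        printf "bloom filter: ~%.0f MB, index sample: ~%.0f MB\n", bf, sample
    }'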

Also, if your version doesn't yet have
https://issues.apache.org/jira/browse/CASSANDRA-2466 applied, another
side-effect is that the sudden large allocations for bloom filters can
cause promotion failures even if there is free memory.

> Yes, it prints messages like "heap is almost full", and after some
> time it usually OOMs during a large compaction.

OK, in that case it seems even clearer that you simply need a larger
heap. How large are the bloom filters in total? I.e., what are the
sizes of the *-Filter.db files? In general, don't expect to be able to
run close to heap capacity; there *will* be spikes.
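
Something like this should give the total; the data directory path and
the one-directory-per-keyspace layout are assumptions on my part, so
adjust to whatever data_file_directories says in your cassandra.yaml:

    # sum the sizes of all bloom filter files under the data directory
    du -ch /var/lib/cassandra/data/*/*-Filter.db | tail -n 1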

In this particular case, leveled compaction in 1.0 should mitigate the
effect quite significantly, since it only compacts up to roughly 10% of
the data set at a time, so memory usage should be considerably more
even (as will disk space usage). That would allow you to run a bit
closer to heap capacity than with regular compaction.
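
If you want to try it, a sketch of switching a column family over from
cassandra-cli follows, using the CF name from your listing. The syntax
is from memory and the sstable_size_in_mb value is just an assumption
to illustrate the option, so double-check both against your version's
documentation:

    use <your keyspace>;
    update column family sipdb
      with compaction_strategy = 'LeveledCompactionStrategy'
      and compaction_strategy_options = {sstable_size_in_mb: 10};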

Also, consider tweaking compaction throughput settings to control the
rate of allocation generated during a compaction, even if you don't
need it for disk I/O purposes.
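
The steady-state knob is compaction_throughput_mb_per_sec in
cassandra.yaml (16 by default, if memory serves), and it can also be
changed at runtime without a restart, e.g.:

    # throttle compaction to ~8 MB/s; a value of 0 disables throttling
    nodetool -h localhost setcompactionthroughput 8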

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
