Regarding memory usage after a repair ... are the merkle trees kept around?

On Oct 23, 2012 3:00 PM, "Bryan Talbot" <btal...@aeriagames.com> wrote:
> On Mon, Oct 22, 2012 at 6:05 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>>> The GC was on-going even when the nodes were not compacting or
>>> running a heavy application load -- even when the main app was
>>> paused, the constant GC continued.
>>
>> If you restart a node, is the onset of GC activity correlated to some
>> event?
>
> Yes and no. When the nodes were generally under the .75 occupancy
> threshold, a weekly "repair -pr" job would cause them to go over the
> threshold and then stay there even after the repair had completed and
> there were no ongoing compactions. It acts as though at least some
> substantial amount of memory used during repair was never dereferenced
> once the repair was complete.
>
> Once one CF in particular grew larger, the constant GC would start up
> pretty soon (less than 90 minutes) after a node restart, even without
> a repair.
>
>>> As a test we dropped the largest CF and the memory usage immediately
>>> dropped to acceptable levels and the constant GC stopped. So it's
>>> definitely related to data load. memtable size is 1 GB, row cache is
>>> disabled, and key cache is small (default).
>>
>> How many keys did the CF have per node?
>> I dismissed the memory used to hold bloom filters and index sampling.
>> That memory is not considered part of the memtable size, and will end
>> up in the tenured heap. It is generally only a problem with very
>> large key counts per node.
>
> I've changed the app to retain less data for that CF, but I think it
> was about 400M rows per node. Row keys are a TimeUUID. All of the rows
> are write-once, never updated, and rarely read. There are no secondary
> indexes for this particular CF.
>
>>> They were 2+ GB (as reported by nodetool cfstats anyway). It looks
>>> like bloom_filter_fp_chance defaults to 0.0.
>>
>> The default should be 0.000744.
>>
>> If the chance is zero or null, this code should run when a new
>> SSTable is written:
>>
>>   // paranoia -- we've had bugs in the thrift <-> avro <-> CfDef
>>   // dance before, let's not let that break things
>>   logger.error("Bloom filter FP chance of zero isn't supposed to happen");
>>
>> Were the CFs migrated from an old version?
>
> Yes, the CFs were created in 1.0.9, then migrated to 1.0.11 and
> finally to 1.1.5, with an "upgradesstables" run at each upgrade along
> the way.
>
> I could not find a way to view the current bloom_filter_fp_chance
> setting when it is at its default value. JMX reports the actual fp
> rate, and a rate that has been set explicitly for a CF shows up in
> "describe table", but I couldn't find a way to tell what the default
> was. I didn't inspect the source.
>
>>> Is there any way to predict how much memory the bloom filters will
>>> consume if the size of the row keys, the number of rows, and the fp
>>> chance are known?
>>
>> See o.a.c.utils.BloomFilter.getFilter() in the code.
>> This http://hur.st/bloomfilter appears to give similar results.
>
> Ahh, very helpful. This indicates that 714MB would be used for the
> bloom filter for that one CF.
>
> JMX / cfstats reports "Bloom Filter Space Used", but the MBean method
> name (getBloomFilterDiskSpaceUsed) indicates this is the on-disk
> space. If on-disk and in-memory space used are similar, then summing
> up all the "Bloom Filter Space Used" values says they're currently
> consuming 1-2 GB of the heap, which is substantial.
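>
> As a sanity check, the textbook bloom filter sizing formula
> m = -n * ln(p) / (ln 2)^2 lands on roughly the same number. This is
> only a back-of-the-envelope sketch -- it assumes the standard formula
> rather than Cassandra's exact implementation, and the n and p values
> below are just this CF's numbers plugged in by hand:
>
>   // Rough estimate of bloom filter heap usage for one CF.
>   public class BloomSizeEstimate {
>       public static void main(String[] args) {
>           long n = 400000000L;   // ~400M row keys per node
>           double p = 0.000744;   // default bloom_filter_fp_chance
>           // optimal filter size in bits: m = -n * ln(p) / (ln 2)^2
>           double bits = -n * Math.log(p) / (Math.log(2) * Math.log(2));
>           System.out.printf("~%.0f MB%n", bits / 8 / 1024 / 1024);  // ~715 MB
>       }
>   }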
> If a CF is rarely read, is it safe to set bloom_filter_fp_chance to
> 1.0? It just means more trips to the SSTable indexes for a read,
> correct? Trade RAM for time (disk I/O).
>
> -Bryan
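For anyone who wants to experiment with that trade-off, something like the
following in the 1.1-era cassandra-cli should do it (the CF name here is
made up, and as I understand it a changed fp chance only applies to
SSTables written after the change, so existing ones would need to be
rewritten, e.g. with "nodetool upgradesstables"):

  update column family MyColumnFamily with bloom_filter_fp_chance = 1.0;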