> Regarding memory usage after a repair ... Are the merkle trees kept around?
> 

They should not be.

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/10/2012, at 4:51 PM, B. Todd Burruss <bto...@gmail.com> wrote:

> Regarding memory usage after a repair ... Are the merkle trees kept around?
> 
> On Oct 23, 2012 3:00 PM, "Bryan Talbot" <btal...@aeriagames.com> wrote:
> On Mon, Oct 22, 2012 at 6:05 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> The GC was on-going even when the nodes were not compacting or running a 
>> heavy application load -- even when the main app was paused, the constant GC 
>> continued.
> If you restart a node is the onset of GC activity correlated to some event?
> 
> Yes and no.  When the nodes were generally under the 0.75 occupancy threshold, 
> a weekly "repair -pr" job would cause them to go over the threshold and then 
> stay there even after the repair had completed and there were no ongoing 
> compactions.  It acts as though some substantial amount of memory used during 
> the repair was never dereferenced once the repair was complete.
> 
> Once one CF in particular grew larger, the constant GC would start up pretty 
> soon (less than 90 minutes) after a node restart, even without a repair.
> 
> 
>  
>  
>> As a test we dropped the largest CF and the memory usage immediately dropped 
>> to acceptable levels and the constant GC stopped.  So it's definitely 
>> related to data load.  The memtable size is 1 GB, the row cache is disabled, 
>> and the key cache is small (default).
> How many keys did the CF have per node? 
> I dismissed the memory used to hold bloom filters and index sampling. That 
> memory is not considered part of the memtable size, and will end up in the 
> tenured heap. It is generally only a problem with very large key counts per 
> node. 
> 
> 
> I've changed the app to retain less data for that CF, but I think it was 
> about 400M rows per node.  Row keys are TimeUUIDs.  All of the rows are 
> write-once, never updated, and rarely read.  There are no secondary indexes 
> for this particular CF.
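> 
> As a rough illustration of the index sampling point above, here's a 
> back-of-the-envelope sketch (not Cassandra's exact accounting; index_interval 
> is taken at its cassandra.yaml default of 128, and the per-sample overhead is 
> an assumed figure, not measured from the codebase):
> 
>     public class IndexSampleEstimate {
>         public static void main(String[] args) {
>             long keysPerNode = 400000000L;  // ~400M rows per node, from above
>             int indexInterval = 128;        // cassandra.yaml default: sample 1 key in 128
>             int bytesPerSample = 16 + 64;   // 16-byte TimeUUID key + assumed object/position overhead
>             long sampleBytes = (keysPerNode / indexInterval) * bytesPerSample;
>             System.out.printf("~%.0f MB of heap for index samples%n",
>                               sampleBytes / (1024.0 * 1024.0));
>         }
>     }
> 
> Under those assumptions that's a couple of hundred MB of tenured heap on top 
> of the bloom filters, and it scales linearly with the key count per node.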
> 
> 
>  
>>  They were 2+ GB (as reported by nodetool cfstats anyway).  It looks like 
>> bloom_filter_fp_chance defaults to 0.0.
> The default should be 0.000744.
> 
> If the chance is zero or null, this code should run when a new SSTable is 
> written:
> 
>     // paranoia -- we've had bugs in the thrift <-> avro <-> CfDef dance before, let's not let that break things
>     logger.error("Bloom filter FP chance of zero isn't supposed to happen");
> 
> Were the CFs migrated from an old version?
> 
> 
> Yes, the CFs were created in 1.0.9, then migrated to 1.0.11 and finally to 
> 1.1.5, with an "upgradesstables" run at each upgrade along the way.
> 
> I could not find a way to view the current bloom_filter_fp_chance settings 
> when they are at the default value.  JMX reports the actual fp rate, and if a 
> specific rate is set for a CF it shows up in "describe table", but I 
> couldn't find a way to tell what the default was.  I didn't inspect the 
> source.
> 
>  
>> Is there any way to predict how much memory the bloom filters will consume 
>> if the size of the row keys, the number of rows, and the fp chance are known?
> 
> See o.a.c.utils.BloomFilter.getFilter() in the code.
> This http://hur.st/bloomfilter appears to give similar results.
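> 
> As a rough cross-check, the standard bloom filter sizing formula 
> (m = -n * ln(p) / (ln 2)^2 bits) can be computed directly -- this appears to 
> be roughly what that calculator uses.  The key count and fp chance below are 
> just the figures from this thread:
> 
>     public class BloomFilterEstimate {
>         public static void main(String[] args) {
>             long keys = 400000000L;      // ~400M rows per node
>             double fpChance = 0.000744;  // the 1.1 default mentioned above
>             double bits = -keys * Math.log(fpChance) / (Math.log(2) * Math.log(2));
>             System.out.printf("~%.0f MB of heap for the bloom filter%n",
>                               bits / 8 / 1024 / 1024);
>         }
>     }
> 
> That works out to roughly 715 MB for 400M keys at the default fp chance, in 
> line with the number below.  Cassandra's actual allocation may differ a bit 
> because it rounds the buckets-per-element count.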
> 
> 
> 
> 
> Ahh, very helpful.  This indicates that 714MB would be used for the bloom 
> filter for that one CF.
> 
> JMX / cfstats reports "Bloom Filter Space Used", but the MBean method name 
> (getBloomFilterDiskSpaceUsed) indicates this is the on-disk space. If the 
> on-disk and in-memory space used are similar, then summing up all the "Bloom 
> Filter Space Used" values says they're currently consuming 1-2 GB of the 
> heap, which is substantial.
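> 
> One way to do that summing without eyeballing cfstats for every CF is over 
> JMX.  A minimal sketch, assuming the default JMX port and the 1.1-era MBean 
> naming (adjust the host, port, and object-name pattern for your cluster):
> 
>     import javax.management.MBeanServerConnection;
>     import javax.management.ObjectName;
>     import javax.management.remote.JMXConnector;
>     import javax.management.remote.JMXConnectorFactory;
>     import javax.management.remote.JMXServiceURL;
> 
>     public class SumBloomFilterSpace {
>         public static void main(String[] args) throws Exception {
>             JMXServiceURL url = new JMXServiceURL(
>                 "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
>             JMXConnector jmxc = JMXConnectorFactory.connect(url);
>             try {
>                 MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
>                 long total = 0;
>                 // One ColumnFamilies MBean is registered per CF; sum the
>                 // BloomFilterDiskSpaceUsed attribute across all of them.
>                 for (ObjectName name : mbs.queryNames(
>                         new ObjectName("org.apache.cassandra.db:type=ColumnFamilies,*"), null)) {
>                     total += (Long) mbs.getAttribute(name, "BloomFilterDiskSpaceUsed");
>                 }
>                 System.out.printf("Bloom filter space used across all CFs: %.1f MB%n",
>                                   total / (1024.0 * 1024.0));
>             } finally {
>                 jmxc.close();
>             }
>         }
>     }
> 
> As the getter name says, that total is the on-disk size, treated here as a 
> proxy for heap usage per the assumption above.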
> 
> If a CF is rarely read, is it safe to set bloom_filter_fp_chance to 1.0?  It 
> just means more trips to the SSTable indexes for a read, correct?  Trading 
> RAM for time (disk I/O).
> 
> -Bryan
> 
