Re: Re: nodetool cleanup - results in more disk use?

2011-04-05 Thread jonathan.colby
I think the key thing to remember is that compaction is performed on *similar*-sized sstables, so it makes sense that over time this will have a cascading effect. I think by default it starts out by compacting 4 flushed sstables, and then the cycle begins. On Apr 4, 2011 3:42pm, shimi
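The cascading effect can be seen in a toy simulation. This is a minimal sketch under idealized assumptions, not Cassandra's actual code: every flush writes an SSTable of 1 unit, "similar size" is simplified to "identical size", and 4 is used as the merge threshold to match the "4 flushed sstables" mentioned above. The function name `simulate_flushes` is hypothetical.

```python
from collections import Counter


def simulate_flushes(num_flushes: int, min_threshold: int = 4) -> list[int]:
    """Idealized size-tiered compaction: every flush writes a 1-unit SSTable and
    any min_threshold SSTables of identical size are merged into one."""
    sstables: list[int] = []
    for _ in range(num_flushes):
        sstables.append(1)  # a memtable flush produces one small SSTable
        merged = True
        while merged:  # keep merging while some tier is full (the cascade)
            merged = False
            for size, count in Counter(sstables).items():
                if count >= min_threshold:
                    for _ in range(min_threshold):
                        sstables.remove(size)
                    sstables.append(size * min_threshold)
                    merged = True
                    break
    return sorted(sstables, reverse=True)


print(simulate_flushes(20))  # [16, 4]
```

In this model the surviving sizes follow the base-4 digits of the flush count: after 64 flushes everything has cascaded into a single 64-unit SSTable, and each tier is four times larger than the one below it.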

Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread aaron morton
cleanup reads each SSTable on disk and writes a new file that contains the same data with the exception of rows that are no longer in a token range the node is a replica for. It's not compacting the files into fewer files or purging tombstones. But it is re-writing all the data for the CF.
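What Aaron describes can be sketched as a per-file filter. This is a hypothetical model, not Cassandra's implementation: rows carry an integer token, the node's replica ranges are given as ring ranges `(start, end]` (wrapping when `start >= end`), and the names `Row`, `in_range`, and `cleanup` are made up for illustration. Note that it writes one output per input (no merging) and keeps tombstones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str
    token: int
    is_tombstone: bool = False


def in_range(token: int, start: int, end: int) -> bool:
    """True if token lies in the ring range (start, end], handling wrap-around."""
    if start < end:
        return start < token <= end
    # Wrapping range; start == end is treated as the full ring.
    return token > start or token <= end


def cleanup_sstable(rows: list[Row], owned_ranges: list[tuple[int, int]]) -> list[Row]:
    """Rewrite one SSTable, keeping every row whose token this node still replicates.
    Tombstones are kept as well: cleanup does not purge them."""
    return [r for r in rows if any(in_range(r.token, s, e) for s, e in owned_ranges)]


def cleanup(sstables: list[list[Row]], owned_ranges: list[tuple[int, int]]) -> list[list[Row]]:
    # One new file per input file: the data is rewritten, not merged into fewer files.
    return [cleanup_sstable(rows, owned_ranges) for rows in sstables]


result = cleanup([[Row("a", 10), Row("b", 70)], [Row("c", 40, is_tombstone=True)]], [(0, 50)])
print(result)  # row "b" (token 70) is dropped; the tombstone for "c" survives
```

Because every surviving row is written to a new file before the old one can be removed, disk use can temporarily grow during cleanup, which fits the extra disk use reported in this thread.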

Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread Jonathan Colby
hi Aaron - The DataStax documentation brought to light the fact that, over time, major compactions will be performed on bigger and bigger SSTables. They actually recommend against performing too many major compactions, which is why I am wary of triggering too many of them ...

Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread aaron morton
mmm, interesting. My theory was:
t0 - major compaction runs, there is now one sstable
t1 - x new sstables have been created
t2 - minor compaction runs and determines there are two buckets, one with the x new sstables and one with the single big file. The bucket of many files is compacted
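Aaron's t0/t1/t2 theory can be checked with a simplified bucketing sketch. The similarity band (0.5x to 1.5x of a bucket's average size), the 50 MB "small file" floor, and the minimum of 4 SSTables per compaction are assumptions loosely modeled on size-tiered compaction of that era, not a copy of Cassandra's code; the function names and file sizes are hypothetical.

```python
def bucket_by_size(sizes_mb: list[float], small_floor_mb: float = 50.0) -> list[list[float]]:
    """Group SSTable sizes into buckets of similar size (simplified sketch)."""
    buckets: list[list[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = 0.5 * avg < size < 1.5 * avg
            both_small = size < small_floor_mb and avg < small_floor_mb
            if similar or both_small:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket found: start a new one
    return buckets


def buckets_to_compact(sizes_mb: list[float], min_threshold: int = 4) -> list[list[float]]:
    """Only buckets holding at least min_threshold SSTables are compacted."""
    return [b for b in bucket_by_size(sizes_mb) if len(b) >= min_threshold]


# t0: one big file from a major compaction; t1: four fresh ~10 MB flushes
sizes = [1000.0, 10.0, 12.0, 9.0, 11.0]
print(bucket_by_size(sizes))      # [[9.0, 10.0, 11.0, 12.0], [1000.0]]
print(buckets_to_compact(sizes))  # [[9.0, 10.0, 11.0, 12.0]]
```

At t2 only the bucket of small files meets the threshold; the single big file sits alone in its bucket and is left untouched.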

Re: nodetool cleanup - results in more disk use?

2011-04-04 Thread shimi
The bigger the file, the longer it will take for it to be part of a compaction again. Compacting a bucket of large files takes longer than compacting a bucket of small files. Shimi On Mon, Apr 4, 2011 at 3:58 PM, aaron morton aa...@thelastpickle.com wrote: mmm, interesting. My theory was t0 -
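Shimi's point can be put as rough arithmetic under idealized assumptions (these are illustrative, not measured): every flush writes an SSTable of size s, exactly T similar-sized SSTables are merged at a time (T = 4 assumed here), and a freshly produced file has size B = T^k s. That file is only compacted again once T - 1 peers of size B have been built up from new flushes, which takes about

```latex
F(B) = (T - 1)\,\frac{B}{s}
\qquad\text{e.g. } T = 4,\ s = 1:\quad F(4) = 12,\qquad F(64) = 192
```

flushes. The wait grows linearly with the file's size, so a large file left behind by a major compaction can sit untouched for a long time.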

Re: nodetool cleanup - results in more disk use?

2011-04-01 Thread Jonathan Colby
I discovered that a garbage collection cleans up the unused old SSTables. But I still wonder whether cleanup really does a full compaction; if so, that would be undesirable. On Apr 1, 2011, at 4:08 PM, Jonathan Colby wrote: I ran node cleanup on a node in my cluster and discovered the disk