I think the key thing to remember is that compaction is performed on
*similar*-sized SSTables, so it makes sense that over time this will have a
cascading effect. I think by default it starts out by compacting 4
flushed SSTables, and then the cycle begins.
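To make the bucketing idea concrete, here is a minimal sketch of how size-tiered grouping could work. It is loosely modeled on Cassandra's SizeTieredCompactionStrategy, but the function name and the bucket_low/bucket_high/min_threshold parameters are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch of size-tiered bucketing; thresholds are assumptions.
def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group sstable sizes into buckets of 'similar' size; return only
    the buckets large enough to be worth compacting."""
    buckets = []  # list of (average_size, [member sizes]) pairs
    for size in sorted(sizes):
        for i, (avg, members) in enumerate(buckets):
            if bucket_low * avg <= size <= bucket_high * avg:
                members.append(size)
                buckets[i] = (sum(members) / len(members), members)
                break
        else:
            buckets.append((size, [size]))
    # Only buckets with at least min_threshold sstables get compacted.
    return [members for _, members in buckets if len(members) >= min_threshold]

# Four freshly flushed sstables of similar size form one compactable bucket;
# the single large file left over from a major compaction stays untouched.
print(bucket_sstables([10, 11, 9, 10, 5000]))  # → [[9, 10, 10, 11]]
```

This shows why a very large file tends to sit out of minor compactions: nothing else on disk is "similar" enough to land in its bucket.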
On Apr 4, 2011 3:42pm, shimi
Cleanup reads each SSTable on disk and writes a new file that contains the same
data, except for rows that are no longer in a token range the node is
a replica for. It does not compact the files into fewer files or purge
tombstones, but it does rewrite all the data for the CF.
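The behavior described above can be illustrated with a toy sketch. The assumptions here are invented for illustration: rows are (token, data) pairs and the node's replica ranges are (start, end] intervals; real cleanup of course operates on serialized SSTable files.

```python
# Toy illustration of cleanup; row and range representations are assumptions.
def cleanup_sstable(rows, owned_ranges):
    """Rewrite an sstable, dropping only the rows whose token the node
    is no longer a replica for."""
    def owned(token):
        return any(start < token <= end for start, end in owned_ranges)
    # Every surviving row is rewritten verbatim; tombstones are NOT purged
    # and no files are merged -- it is a 1:1 rewrite per sstable.
    return [(token, data) for token, data in rows if owned(token)]

rows = [(5, "a"), (42, "b"), (77, "c")]
# After a range change the node only owns tokens in (0, 50]:
print(cleanup_sstable(rows, owned_ranges=[(0, 50)]))  # → [(5, 'a'), (42, 'b')]
```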
hi Aaron -
The DataStax documentation brought to light the fact that over time, major
compactions will be performed on bigger and bigger SSTables. They actually
recommend against performing too many major compactions, which is why I am
wary of triggering them too often ...
mmm, interesting. My theory was:
t0 - a major compaction runs; there is now one sstable
t1 - x new sstables have been created
t2 - a minor compaction runs and determines there are two buckets, one with the x
new sstables and one with the single big file. The bucket of many files is
compacted.
The bigger the file, the longer it will take for it to be part of a
compaction again.
Compacting a bucket of large files takes longer than compacting a bucket of
small files.
Shimi
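The t0/t1/t2 timeline above can be sketched as a small simulation. The assumptions are illustrative: each flush writes a size-1 sstable, and a minor compaction merges any four similar-sized tables into one whose size is their sum.

```python
# Simulation of the timeline above; sizes and thresholds are assumptions.
def similar(a, b, low=0.5, high=1.5):
    return low * b <= a <= high * b

def minor_compact(sizes, min_threshold=4):
    """Merge the first group of min_threshold similar-sized sstables."""
    sizes = sorted(sizes)
    for i in range(len(sizes) - min_threshold + 1):
        group = sizes[i:i + min_threshold]
        if all(similar(s, group[0]) for s in group):
            rest = sizes[:i] + sizes[i + min_threshold:]
            return rest + [sum(group)]
    return sizes

tables = [1000]                      # t0: one big sstable from a major compaction
for _ in range(8):                   # t1: new sstables accumulate from flushes
    tables.append(1)
    tables = minor_compact(tables)   # t2: only the small bucket gets merged
print(sorted(tables))                # → [4, 4, 1000]
```

The big file never takes part in a compaction: the small sstables keep merging among themselves, and only much later (after several cascading merges) would anything grow similar enough to join the big file's bucket.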
On Mon, Apr 4, 2011 at 3:58 PM, aaron morton aa...@thelastpickle.com wrote:
I discovered that garbage collection cleans up the unused old SSTables. But
I still wonder whether cleanup really does a full compaction; that would be
undesirable if so.
On Apr 1, 2011, at 4:08 PM, Jonathan Colby wrote:
I ran node cleanup on a node in my cluster and discovered the disk