simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
The way compaction works, x same-sized files are merged into a new SSTable. This repeats itself and the SSTables get bigger and bigger. So what is the upper limit? If you are not deleting stuff fast enough, wouldn't the SSTable sizes grow indefinitely? I ask because we have some rather
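The merging behavior the question describes can be sketched as a toy simulation of size-tiered compaction. The threshold of 4 matches Cassandra's default `min_compaction_threshold`; everything else (the 2x "similar size" bucket rule, the sizes) is simplified and illustrative, not Cassandra internals:

```python
# Toy sketch of size-tiered compaction: whenever `threshold` SSTables of
# similar size accumulate, they are merged into one larger SSTable.
# Real Cassandra groups tables into size buckets and purges tombstones
# during the merge; this model has no deletes, so sizes only grow.

def compact(sstables, threshold=4):
    """Repeatedly merge groups of `threshold` similar-sized SSTables."""
    changed = True
    while changed:
        changed = False
        sstables.sort()
        for i in range(len(sstables) - threshold + 1):
            group = sstables[i:i + threshold]
            # "similar size": largest no more than 2x the smallest
            if group[-1] <= 2 * group[0]:
                merged = sum(group)  # no deletes -> merged size is the sum
                sstables = sstables[:i] + sstables[i + threshold:] + [merged]
                changed = True
                break
    return sstables

# 16 flushed SSTables of 10 MB each collapse, tier by tier,
# into one 160 MB table:
print(compact([10] * 16))  # -> [160]
```

This is exactly the questioner's point: with no deletions, each tier's output is the sum of its inputs, so the top tier grows without bound.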

Re: simple question about merged SSTable sizes

2011-06-22 Thread Eric tamme
On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby jonathan.co...@gmail.com wrote: The way compaction works, x same-sized files are merged into a new SSTable. This repeats itself and the SSTable get bigger and bigger. So what is the upper limit?? If you are not deleting stuff fast

Re: simple question about merged SSTable sizes

2011-06-22 Thread Edward Capriolo
Yes, if you are not deleting fast enough they will grow. This is not specifically a Cassandra problem; /var/log/messages has the same issue. There is a JIRA ticket about having a maximum size for SSTables, so they always stay manageable. You fall into a small trap when you force major compaction

Re: simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
Thanks for the explanation. I'm still a bit skeptical. So if you really needed to control the maximum size of compacted SSTables, you need to delete data at such a rate that the new files created by compaction are less than or equal to the sum of the segments being merged. Is anyone else
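The condition being described can be stated as a balance equation: a merge's output is roughly the sum of its inputs minus whatever tombstoned or expired data gets purged, so keeping the output no larger than the biggest input requires purging everything the other inputs contribute. The figures below are made up for illustration:

```python
# Balance condition behind the skepticism above: merged output size is
# roughly sum(inputs) minus purged data.  To keep the merged SSTable no
# larger than its biggest input, deletions must offset everything the
# other inputs contribute -- a 75% purge rate in this made-up example.

inputs_mb = [2048, 2048, 2048, 2048]   # four 2 GB SSTables being merged
purged_mb = 6144                       # tombstoned/expired data dropped
merged_mb = sum(inputs_mb) - purged_mb
print(merged_mb)                       # -> 2048: size stays flat
assert merged_mb <= max(inputs_mb)
```

The arithmetic shows why this is a hard target to hit with ordinary deletes, which motivates the skepticism.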

Re: simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
So the take-away is: try to avoid major compactions at all costs! Thanks Ed and Eric. On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote: Yes, if you are not deleting fast enough they will grow. This is not specifically a cassandra problem /var/log/messages has the same issue. There is

Re: simple question about merged SSTable sizes

2011-06-22 Thread Ryan King
On Wed, Jun 22, 2011 at 10:00 AM, Jonathan Colby jonathan.co...@gmail.com wrote: Thanks for the explanation. I'm still a bit skeptical. So if you really needed to control the maximum size of compacted SSTables, you need to delete data at such a rate that the new files created by

Re: simple question about merged SSTable sizes

2011-06-22 Thread Eric tamme
Second, compacting such large files is an IO killer. What can be tuned other than compaction_threshold to help optimize this and prevent the files from getting too big? Thanks! Just a personal implementation note - I make heavy use of column TTL, so I have very specifically tuned
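The column-TTL approach mentioned here can be sketched with a simplified model: each column carries a write timestamp and an optional TTL in seconds, and anything past its TTL is treated as deleted at read/compaction time, so no explicit delete workload is needed. The column names and values below are hypothetical, and this is a toy model rather than Cassandra's actual storage engine:

```python
import time

# Sketch of how column TTLs keep data bounded: any column whose TTL has
# elapsed is treated as deleted, so expired data is simply dropped the
# next time it is compacted.  Simplified model, not Cassandra internals.

def live_columns(columns, now=None):
    """Keep only columns whose TTL (seconds) has not yet elapsed.

    `columns` maps name -> (value, write_time, ttl); ttl=None never expires.
    """
    now = time.time() if now is None else now
    return {name: (value, written, ttl)
            for name, (value, written, ttl) in columns.items()
            if ttl is None or written + ttl > now}

cols = {
    "session": ("abc123", 1000.0, 60),    # expires at t=1060
    "profile": ("jon", 1000.0, None),     # no TTL, never expires
}
print(sorted(live_columns(cols, now=2000.0)))  # -> ['profile']
```

Because every TTL'd column is guaranteed to become purgeable, compaction output sizes stop growing once the workload reaches steady state, which is why TTLs work as the "catch-all" described later in the thread.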

Re: simple question about merged SSTable sizes

2011-06-22 Thread Edward Capriolo
I would not say avoid major compactions at all cost. In the old days (0.6.5, IIRC) the only way to clear tombstones was a major compaction. The nice thing about major compaction is if you have a situation with 4 SSTables at 2GB each (that is 8GB total). Under normal write conditions it could be
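The "small trap" mentioned earlier can be put in concrete numbers: a major compaction leaves one big SSTable, and size-tiered compaction won't touch it again until `min_compaction_threshold - 1` more tables of comparable size exist, which takes a lot of newly written data. The flush size below is an assumed figure, not from the thread:

```python
# Worked numbers behind the major-compaction trade-off: after a major
# compaction you get one big SSTable, and size-tiered compaction won't
# revisit it until `threshold - 1` more tables of comparable size exist.
# The 64 MB flush size is an assumption for illustration.

flush_size_mb = 64          # size of each freshly flushed SSTable (assumed)
big_table_mb = 8 * 1024     # the 8 GB table left by a major compaction
threshold = 4               # default min_compaction_threshold

# Flushes needed before three more ~8 GB peers exist to merge with it:
flushes_per_peer = big_table_mb // flush_size_mb       # 128 flushes -> one 8 GB table
total_flushes = (threshold - 1) * flushes_per_peer     # 384 flushes of new data
print(total_flushes * flush_size_mb / 1024, "GB")      # -> 24.0 GB written first
```

Until that much new data arrives, tombstones sitting in the big table cannot be purged by normal size-tiered compaction, which is the trap.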

Re: simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
Thanks Ryan. Done that : ) 1 TB is the striped size. We might look into bigger disks for our blades. On Jun 22, 2011, at 7:09 PM, Ryan King wrote: On Wed, Jun 22, 2011 at 10:00 AM, Jonathan Colby jonathan.co...@gmail.com wrote: Thanks for the explanation. I'm still a bit skeptical.

Re: simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
Awesome tip on TTL. We can really use this as a catch-all to make sure all columns are purged based on time. Fits our use-case well. I forgot this feature existed. On Jun 22, 2011, at 7:11 PM, Eric tamme wrote: Second, compacting such large files is an IO killer. What can be tuned