The problem here is the size and scope of the data: each record is essentially keyed by a primary key of ID plus date, with several large pieces of information associated with it.  The main issues with the various key/value stores are a) the inability to do range queries, and b) the size limitations on values.  Right now we're hovering around 1TB a day of new data in Cassandra's load, and that figure already includes compression.
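
For concreteness, the schema is roughly of this shape (table and column names below are simplified placeholders, not the real ones):

    -- Illustrative sketch only; real names, types, and TTL differ.
    CREATE TABLE events_by_id (
        id         bigint,
        event_date timestamp,
        payload    blob,                 -- one of several large associated values
        PRIMARY KEY (id, event_date)     -- clustering on date gives us range queries per id
    ) WITH default_time_to_live = 2592000    -- e.g. 30 days; old rows drop off via TTL
      AND compaction = {'class': 'SizeTieredCompactionStrategy'};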

What kind of overhead should I expect from compaction, in terms of disk size?  In 
this use case, the primary purpose of compaction is more or less to clean up 
tombstones from expired TTLs.
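
For reference, the tombstone-related settings are still at their defaults, which as I understand them look like this (table name as in the sketch above):

    -- Defaults as I understand them, not a recommendation.
    ALTER TABLE events_by_id
      WITH gc_grace_seconds = 864000              -- 10 days before tombstones can be purged
      AND compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'tombstone_threshold': '0.2',             -- consider an sstable once ~20% of it is droppable tombstones
        'tombstone_compaction_interval': '86400'  -- and only if the sstable is at least a day old
      };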

Thanks,

Andrew

On Mon, Jul 7, 2014 at 3:45 PM, Robert Coli <rc...@eventbrite.com> wrote:
On Mon, Jul 7, 2014 at 9:52 AM, Redmumba <redmu...@gmail.com> wrote:
Would adjusting the maximum number of SSTables before a compaction is performed help 
this situation?  I am currently using the default values provided by 
SizeTieredCompactionStrategy in C* 2.0.6.  Or is there a better option for a 
continuous-write workload (with TTLs for dropping off old data)?

(Sorry, just saw this line about the workload.)

Redis? 

If you have a high write rate and are discarding 100% of data after a TTL, 
perhaps a data-store with immutable data files that reconciles row fragments on 
read is not ideal for your use case?

https://issues.apache.org/jira/browse/CASSANDRA-6654

is probably also worth a read...

=Rob
