The problem here is the size and scope of the data—it's essentially a primary key built from the ID and the date, with several large pieces of information associated with each key. The main issues with the various key/value stores are a) the inability to do range queries, and b) the size limitations on values. Right now we're hovering around 1TB a day of data in terms of Cassandra's load, and that figure includes compression.
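For concreteness, a schema along these lines (table and column names, the TTL value, and the blob type are illustrative, not our actual ones) might look like this—the date as a clustering column is what gives us the range queries the key/value stores couldn't:

```sql
-- Hypothetical sketch of the layout described above: ID + date as the
-- primary key, large associated values, rows expiring via a table-level TTL.
CREATE TABLE audit_data (
    id      text,
    day     timestamp,
    payload blob,               -- the "several large pieces of information"
    PRIMARY KEY (id, day)       -- id = partition key, day = clustering column
) WITH default_time_to_live = 2592000   -- e.g. 30 days; rows become tombstones after this
  AND compaction = {'class': 'SizeTieredCompactionStrategy'};

-- Range query over dates within a single ID, which plain K/V stores can't do:
SELECT payload FROM audit_data
 WHERE id = 'some-id'
   AND day >= '2014-06-01' AND day < '2014-07-01';
```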
What kind of overhead should I expect for compaction, in terms of size? In this use case, the primary use for compaction is more or less to clean up tombstones for expired TTLs.

Thanks,
Andrew

On Mon, Jul 7, 2014 at 3:45 PM, Robert Coli <rc...@eventbrite.com> wrote:
> On Mon, Jul 7, 2014 at 9:52 AM, Redmumba <redmu...@gmail.com> wrote:
>> Would adjusting the maximum sstables before a compaction is performed help
>> this situation? I am currently using the default values provided by
>> SizeTieredCompactionStrategy in C* 2.0.6. Or is there a better option for a
>> continuous-write operation (with TTLs for dropping off old data)?
>
> (Sorry, just saw this line about the workload.) Redis?
>
> If you have a high write rate and are discarding 100% of data after a TTL,
> perhaps a data-store with immutable data files that reconciles row fragments
> on read is not ideal for your use case?
>
> https://issues.apache.org/jira/browse/CASSANDRA-6654
>
> Is probably also worth a read...
>
> =Rob
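The "maximum sstables before a compaction" that Redmumba asks about corresponds to the STCS `min_threshold`/`max_threshold` sub-properties; the tombstone-specific knobs are `tombstone_threshold` and `tombstone_compaction_interval`. A sketch of tuning them (table name is illustrative, and the values shown are just the defaults, not a recommendation):

```sql
-- Hypothetical tuning of SizeTieredCompactionStrategy sub-properties.
ALTER TABLE audit_data WITH compaction = {
    'class'                         : 'SizeTieredCompactionStrategy',
    'min_threshold'                 : '4',     -- sstables of similar size needed to trigger a compaction
    'max_threshold'                 : '32',    -- cap on sstables merged in one compaction
    'tombstone_threshold'           : '0.2',   -- single-sstable compaction when >20% of rows are tombstones
    'tombstone_compaction_interval' : '86400'  -- seconds an sstable must age before tombstone compaction
};
```

Lowering `tombstone_threshold` (or `tombstone_compaction_interval`) makes single-sstable tombstone compactions fire sooner, which is the lever most relevant to a TTL-expiry workload like this one.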