A few comments on building a time-series store in Cassandra... Using the timestamp dimension of columns to "reuse" columns could prove quite useful. It allows simple use of batch_mutate deletes (new in 0.6) to purge old data once it falls outside the active time window.
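A minimal sketch of that windowing idea in Python. The bucket-per-row key scheme and the helper names here are my assumptions for illustration, not anything Cassandra prescribes; the returned keys are what you would hand to a single batch_mutate delete:

```python
BUCKET_SECONDS = 3600  # one row ("bucket") per hour; assumed granularity


def bucket_key(metric, ts):
    """Row key for the bucket containing timestamp ts, e.g. 'cpu:7200'."""
    return "%s:%d" % (metric, int(ts) // BUCKET_SECONDS * BUCKET_SECONDS)


def buckets_to_purge(metric, oldest_ts, now, window_seconds):
    """Bucket row keys that have slid entirely out of the active window.

    oldest_ts is the oldest timestamp you may still be retaining; every
    whole bucket older than (now - window_seconds) is eligible for one
    batched delete.
    """
    cutoff = (int(now) - window_seconds) // BUCKET_SECONDS
    first = int(oldest_ts) // BUCKET_SECONDS
    return ["%s:%d" % (metric, b * BUCKET_SECONDS)
            for b in range(first, cutoff)]
```

With a 1-hour window at t=10800, buckets 0 and 3600 are out of the window and get purged in one batch, while bucket 7200 stays live.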
Otherwise, performance-wise, deletes and "updates" are the same in Cassandra (see http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html). Data should be spread out over the ring, so load distribution stays constant regardless of time or burst peaks.

A separate location cache, using a counting/timestamped bloom filter, might be useful too, depending on your app, data structures, and throughput requirements. This should be kept outside Cassandra and in RAM (redis or even memcache would fit nicely, but a simple RPC service would be faster). Something like that would let you build a tuned sliding-window cache to ensure reads are minimized. Rinse, refactor, repeat, until fast enough and/or the job is done ...

> - Can we keep this "data window" approach, or will a high rate of
> deletes pose a problem?

Delete and "insert" are both mutations, so if you can do one, you can do the other in ~ the same time. IOW, your rate of mutations in a one-in-one-out scenario is simply 2 * insert-rate. Due to the nature of deletes, you do need to plan for storing "deleted" data until compaction, though. The compaction phase itself will probably need accounting for, but that too is predictable.

> - We need read speed. I understand writes won't be a problem, but
> there will be a lot of reads, some of them with large sets of values.
> - What role does RAM play in Cassandra under this scenario?

0.6 has improved caching for reads, but if your app truly needs high-performance reads, some kind of application-tuned cache frontend (as mentioned above) is not a bad thing. For sliding-window time series, it's hard to beat a simple bloom-filter-based cache without reaching for complexity.

> Of course we are looking at Cassandra as a possible solution
> and/or part of the solution, against / or combined with an in-memory
> DB.

It's certainly possible to decouple purging from insertion in Cassandra, but there's no generic "this is how you do it" answer.
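For what it's worth, here is one way the counting bloom filter mentioned above might look, as a rough pure-Python sketch. Sizing, hash scheme, and class name are all my assumptions, not a tuned design; in practice this would live in its own RAM-resident service (or redis) in front of Cassandra:

```python
import hashlib


class CountingBloomFilter:
    """Counting bloom filter: supports remove() as well as add(), which
    is what a sliding-window cache needs as keys expire out the back of
    the window. Sizes here are illustrative, not tuned."""

    def __init__(self, size=8192, hashes=4):
        self.size = size
        self.hashes = hashes
        self.counts = [0] * size

    def _indexes(self, key):
        # Derive k slot indexes from one digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add(self, key):
        for i in self._indexes(key):
            self.counts[i] += 1

    def remove(self, key):
        # Only remove keys previously added, or the counters skew.
        for i in self._indexes(key):
            if self.counts[i] > 0:
                self.counts[i] -= 1

    def __contains__(self, key):
        # False positives are possible; false negatives are not,
        # provided remove() is only called for keys that were added.
        return all(self.counts[i] > 0 for i in self._indexes(key))
```

Usage: before reading a row from Cassandra, check the filter; a miss means the key is definitely not in the active window, so the read can be skipped entirely. As buckets slide out of the window, remove() their keys along with the batched delete.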
This, IMHO, is a good thing though. /d