Hi Jerry,

the compaction strategy just tells Cassandra how to compact your sstables and, in the case of TWCS, when to stop compacting them any further. But of course your data can, and most likely will, live in multiple sstables.
The magic is that the replica nodes serving your request will merge the data from their sstables for you on the fly. It is an easy job, as the data in each sstable is already sorted. But be careful about the worst case: if a customer_id is inserted every hour and the data has to be kept for a year or so, a read can end up touching many sstables, which hurts read performance.

Jan

Sent from my Windows 10 Phone

From: Jerry Lam
Sent: Friday, 7 April 2017 00:30
To: user@cassandra.apache.org
Subject: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

Hi guys,

I'm a new and happy user of Cassandra. We are using Cassandra for time series data, so we chose TWCS because of its predictability and its ease of configuration.

My question is about a table with the following schema:

    CREATE TABLE IF NOT EXISTS customer_view (
        customer_id bigint,
        date_day Timestamp,
        view_id bigint,
        PRIMARY KEY (customer_id, date_day)
    ) WITH CLUSTERING ORDER BY (date_day DESC);

My understanding is that the data will be ordered by date_day within the partition, using the clustering key. However, the same customer_id can be inserted into this partition several times during the day, and TWCS says it will only compact the sstables within the window interval set in the configuration (in our case, 1 hour).

How does Cassandra guarantee the clustering key order when the same customer_id appears in several sstables? Does it need to do a merge and then a sort to find the latest view_id for the customer_id? Or is there some magic happening behind the scenes?

Best Regards,

Jerry
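
As a rough illustration of the on-the-fly merge of already-sorted sstables described in Jan's reply, here is a minimal Python sketch. It is not Cassandra's actual read path: the `Row` type, the `merge_sstables` function, and the sample values are all hypothetical, and it ignores memtables, tombstones, and per-cell timestamps. It only assumes that each sstable holds the partition's rows sorted by date_day DESC and that, for rows with the same clustering key, the most recent write wins.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator, NamedTuple, Sequence


class Row(NamedTuple):
    """Hypothetical, simplified row of one customer_id partition."""
    date_day: int   # clustering key (plain integer here for simplicity)
    view_id: int
    write_ts: int   # write timestamp, used for last-write-wins


def merge_sstables(sstables: Iterable[Sequence[Row]]) -> Iterator[Row]:
    """Lazily merge per-sstable runs that are each sorted by date_day DESC."""
    # k-way merge: no global re-sort is needed because every input is sorted.
    merged = heapq.merge(*sstables, key=lambda r: r.date_day, reverse=True)
    # Rows sharing a clustering key are now adjacent; keep the newest write.
    for _, versions in groupby(merged, key=lambda r: r.date_day):
        yield max(versions, key=lambda r: r.write_ts)


# Two hypothetical sstables from different time windows, same partition.
sstable_a = [Row(300, 7, 1000), Row(100, 1, 900)]
sstable_b = [Row(300, 9, 2000), Row(200, 4, 1500)]

result = list(merge_sstables([sstable_a, sstable_b]))
latest_view_id = result[0].view_id  # first row has the highest date_day
```

The cost of such a merge grows with the number of sstables that contain the partition. With 1-hour windows and about a year of retention, a customer_id written every hour could be spread over up to 24 × 365 = 8760 windows, which is the worst case the reply warns about.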