Hi Jerry,

The compaction strategy only tells Cassandra how to compact your sstables and, 
with TWCS, when to stop compacting them further. Your data can, and most 
likely will, still live in multiple sstables. 

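The magic is that each replica serving your request merges the data from its 
sstables (and memtables) for you on the fly; the coordinator then reconciles 
the replicas' answers. The merge is an easy job, as the data within each 
sstable is already sorted by clustering key.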
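
To illustrate the idea, here is a minimal sketch in Python. It is not 
Cassandra's actual code: the row layout, the sample data and the name 
merge_partition are assumptions made up for this example. It only shows why 
merging already-sorted runs needs no re-sort, and how the newest write wins 
when the same clustering key appears in several sstables.

```python
import heapq
from itertools import groupby

# Each hypothetical "sstable" holds rows of ONE partition (one customer_id),
# already sorted by the clustering key date_day in DESC order.
# Assumed row layout for this sketch: (date_day, write_timestamp, view_id)
sstable_a = [("2017-04-06", 100, 11), ("2017-04-04", 90, 7)]
sstable_b = [("2017-04-07", 130, 15), ("2017-04-06", 120, 12)]
sstable_c = [("2017-04-05", 95, 9)]


def merge_partition(*sstables):
    """Lazily merge sorted runs and keep the newest write per clustering key."""
    # heapq.merge does not re-sort; it only compares the current head of each run.
    merged = heapq.merge(*sstables, key=lambda row: row[0], reverse=True)
    for date_day, versions in groupby(merged, key=lambda row: row[0]):
        # The same clustering key may exist in several sstables:
        # the version with the highest write timestamp wins.
        newest = max(versions, key=lambda row: row[1])
        yield date_day, newest[2]


rows = list(merge_partition(sstable_a, sstable_b, sstable_c))
latest_view_id = rows[0][1]  # first row is the most recent date_day
```

Because the result is produced in clustering order, the latest view_id is 
simply the first row of the merged stream.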

But be careful about the worst case: if a customer_id is inserted every hour 
and the data has to be kept for a year or so, a read can end up touching many 
sstables, which decreases read performance.
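
A rough back-of-the-envelope sketch of that worst case, in Python. The 
function name and the simplifications are my assumptions: every finished 
window is taken to end up as exactly one sstable, and anything that lets 
Cassandra skip sstables on a read is ignored, so treat the result as an upper 
bound rather than a measurement.

```python
def max_sstables_touched(retention_days, window_hours, write_every_hours):
    """Upper bound on sstables holding one partition (hypothetical helper)."""
    hours_kept = retention_days * 24
    windows = hours_kept // window_hours      # one sstable per TWCS window (assumed)
    writes = hours_kept // write_every_hours  # a partition cannot be in more sstables than it has writes
    return min(windows, writes)


hourly_windows = max_sstables_touched(365, 1, 1)   # 1 hour windows, written every hour
daily_windows = max_sstables_touched(365, 24, 1)   # 1 day windows, written every hour
```

With 1 hour windows and a year of retention the bound is 8760 sstables for a 
single customer_id; with 1 day windows it drops to 365.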

Jan

From: Jerry Lam
Sent: Friday, 7 April 2017 00:30
To: user@cassandra.apache.org
Subject: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

Hi guys,

I'm a new and happy user of Cassandra. We are using Cassandra for time series 
data, so we chose TWCS because of its predictability and its ease of 
configuration.

My question concerns a table with the following schema:

CREATE TABLE IF NOT EXISTS customer_view (
    customer_id bigint,
    date_day timestamp,
    view_id bigint,
    PRIMARY KEY (customer_id, date_day)
) WITH CLUSTERING ORDER BY (date_day DESC);

What I understand is that the data will be ordered by date_day within the 
partition using the clustering key. However, the same customer_id can be 
inserted into this partition several times during the day, and TWCS says it 
will only compact the sstables within the window interval set in the 
configuration (in our case, 1 hour). 

How does Cassandra guarantee the clustering key order when the same customer_id 
appears in several sstables? Does it need to do a merge and then a sort to find 
out the latest view_id for the customer_id? Or is there some magic happening 
behind the scenes?

Best Regards,

Jerry
