Hi Jan,

Thank you for the clarification and for sharing your knowledge.

A follow-up question: does Cassandra need to read all sstables for customer_id = 1L if my query is:

select view_id from customer_view where customer_id = 1L limit 1

Since date_day is the clustering key and is sorted in descending order, I'm assuming that the above query will return the latest view_id for customer_id 1L. Since I'm using TWCS, is Cassandra smart enough to query only the latest sstable that contains the partition key (customer_id = 1L), or does it have to go through the entire merge process?

Thank you,

Jerry

On Fri, Apr 7, 2017 at 2:08 AM, <j.kes...@enercast.de> wrote:

> Hi Jerry,
>
> the compaction strategy just tells Cassandra how to compact your sstables
> and, with TWCS, when to stop compacting further. But of course your data can
> and most likely will live in multiple sstables.
>
> The magic that happens is that the coordinator node for your request will
> merge the data for you on the fly. It is an easy job, as your data per
> sstable is already sorted.
>
> But be careful that you do not end up with a worst case. If a customer_id is
> inserted every hour, you can end up reading many sstables, which decreases
> read performance if the data has to be kept for a year or so.
>
> Jan
>
> Sent from my Windows 10 Phone
>
> *From: *Jerry Lam <chiling...@gmail.com>
> *Sent: *Friday, 7 April 2017 00:30
> *To: *user@cassandra.apache.org
> *Subject: *How does clustering key works with TimeWindowCompactionStrategy (TWCS)
>
> Hi guys,
>
> I'm a new and happy user of Cassandra. We are using Cassandra for time
> series data, so we chose TWCS because of its predictability and its ease of
> configuration.
>
> My question is about a table with the following schema:
>
> CREATE TABLE IF NOT EXISTS customer_view (
>   customer_id bigint,
>   date_day Timestamp,
>   view_id bigint,
>   PRIMARY KEY (customer_id, date_day)
> ) WITH CLUSTERING ORDER BY (date_day DESC);
>
> What I understand is that the data will be ordered by date_day within the
> partition using the clustering key. However, the same customer_id can be
> inserted into this partition several times during the day, and TWCS says
> it will only compact the sstables within the window interval set in the
> configuration (in our case, 1 hour).
>
> How does Cassandra guarantee the clustering key order when the same
> customer_id appears in several sstables? Does it need to do a merge and
> then sort to find out the latest view_id for the customer_id? Or is there
> some magic happening behind the scenes?
>
> Best Regards,
>
> Jerry
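
For readers following the thread, the merge of already-sorted sstable data discussed above can be pictured with a small Python sketch. This is only a conceptual model, not Cassandra's read path: the three lists are hypothetical stand-ins for sstables holding rows of one partition, the (date_day, view_id) values are made up, and read_partition is an illustrative helper name, not a Cassandra API. It assumes each clustering key appears in only one run, so no timestamp reconciliation is shown.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for three sstables holding rows of the
# same partition (customer_id = 1). Each run is already sorted by date_day
# descending, mirroring CLUSTERING ORDER BY (date_day DESC).
# Rows are (date_day, view_id); the values are invented for illustration.
sstable_hour_1 = [("2017-04-07T01:40", 103), ("2017-04-07T01:05", 101)]
sstable_hour_2 = [("2017-04-07T02:30", 204)]
sstable_hour_3 = [("2017-04-07T03:55", 309), ("2017-04-07T03:10", 305)]


def read_partition(runs, limit):
    """Lazily merge sorted runs and return the first `limit` rows."""
    # heapq.merge performs a k-way merge without re-sorting the data;
    # reverse=True because every run is ordered from newest to oldest.
    merged = heapq.merge(*runs, key=lambda row: row[0], reverse=True)
    # islice stops consuming the merged stream once `limit` rows are taken.
    return list(islice(merged, limit))


latest = read_partition([sstable_hour_1, sstable_hour_2, sstable_hour_3], 1)
print(latest)  # [('2017-04-07T03:55', 309)]
```

Note that even with a limit of 1, this merge still has to look at the first row of every run before it can tell which one is newest. Whether Cassandra can avoid touching older sstables for such a query is exactly the open question in this thread, and the sketch does not answer it.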