Hey Jerry - very happy to hear the post answered your questions. Alex wrote another great post on TWCS you might find useful, since you're using it: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
On Fri, Apr 7, 2017 at 8:20 AM Jerry Lam <chiling...@gmail.com> wrote:

> Hi Jon,
>
> This Cassandra community is very helpful!!! Thanks for sharing this blog post with me. It answers all my questions related to TWCS with a clustering key and the LIMIT clause!
>
> Best Regards,
>
> Jerry
>
> On Fri, Apr 7, 2017 at 10:30 AM, Jon Haddad <jonathan.had...@gmail.com> wrote:
>
> Alex Dejanovski wrote a good post on how the LIMIT clause works and why it doesn’t (until 3.4) work the way you think it would.
>
> http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
>
> On Apr 7, 2017, at 7:23 AM, Jerry Lam <chiling...@gmail.com> wrote:
>
> Hi Jan,
>
> Thank you for the clarification and knowledge sharing.
>
> A follow-up question: does Cassandra need to read all sstables for customer_id = 1L if my query is:
>
>     select view_id from customer_view where customer_id = 1L limit 1
>
> Since date_day is the clustering key and it is sorted in descending order, I'm assuming that the above query will return the latest view_id for customer_id 1L.
>
> Since I'm using TWCS, is Cassandra smart enough to query only the latest sstable that matches the partition key (customer_id = 1L), or does it have to go through the entire merge process?
>
> Thank you,
>
> Jerry
>
> On Fri, Apr 7, 2017 at 2:08 AM, <j.kes...@enercast.de> wrote:
>
> Hi Jerry,
>
> the compaction strategy just tells Cassandra how to compact your sstables and, with TWCS, when to stop compacting further. But of course your data can, and most likely will, live in multiple sstables.
>
> The magic that happens is that the coordinator node for your request will merge the data for you on the fly. It is an easy job, as your data per sstable is already sorted.
>
> But be careful about the worst case: if a customer_id is inserted every hour, you can end up reading many sstables, which decreases read performance if the data is to be kept for a year or so.
>
> Jan
>
> Sent from my Windows 10 Phone
>
> From: Jerry Lam <chiling...@gmail.com>
> Sent: Friday, April 7, 2017 00:30
> To: user@cassandra.apache.org
> Subject: How does clustering key works with TimeWindowCompactionStrategy (TWCS)
>
> Hi guys,
>
> I'm a new and happy user of Cassandra. We are using Cassandra for time series data, so we chose TWCS because of its predictability and its ease of configuration.
>
> My question is this. We have a table with the following schema:
>
>     CREATE TABLE IF NOT EXISTS customer_view (
>         customer_id bigint,
>         date_day Timestamp,
>         view_id bigint,
>         PRIMARY KEY (customer_id, date_day)
>     ) WITH CLUSTERING ORDER BY (date_day DESC)
>
> What I understand is that the data will be ordered by date_day within the partition using the clustering key. However, the same customer_id can be inserted into this partition several times during the day, and TWCS says it will only compact the sstables within the window interval set in the configuration (in our case, 1 hour).
>
> How does Cassandra guarantee the clustering key order when the same customer_id appears in several sstables? Does it need to do a merge and then sort to find the latest view_id for the customer_id? Or is there some magic happening behind the scenes?
>
> Best Regards,
>
> Jerry
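
For illustration, the on-the-fly merge described in this thread can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Cassandra's actual read path: each "sstable" is a hypothetical in-memory list of (date_day, view_id) rows for one partition, already sorted by date_day descending, and the sketch ignores tombstones, overwrites of the same clustering key, and any sstable-skipping optimizations. The names read_partition, sstable_a, sstable_b and sstable_c are made up for the example.

```python
import heapq
from itertools import islice

# Hypothetical stand-ins for three sstables holding rows of the same
# partition (customer_id = 1). Each list is already sorted by date_day
# in descending order, matching CLUSTERING ORDER BY (date_day DESC).
sstable_a = [("2017-04-06T09:00", 101), ("2017-04-06T08:00", 100)]
sstable_b = [("2017-04-06T11:00", 103), ("2017-04-06T10:00", 102)]
sstable_c = [("2017-04-06T12:00", 104)]


def read_partition(sstables, limit):
    """Merge descending-sorted runs lazily and stop after `limit` rows."""
    # heapq.merge never re-sorts the inputs; it only compares the current
    # head of each run, which is why already-sorted sstables merge cheaply.
    merged = heapq.merge(*sstables, key=lambda row: row[0], reverse=True)
    return list(islice(merged, limit))


print(read_partition([sstable_a, sstable_b, sstable_c], 1))
# -> [('2017-04-06T12:00', 104)]
```

Note that even with limit 1 the merge has to look at the first row of every run before it can tell which one is the newest, which is the cost Jan warns about when a partition is spread across many sstables.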