Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

Jonathan Haddad Fri, 07 Apr 2017 08:27:53 -0700

Hey Jerry - very happy to hear the post answered your questions.  Alex
wrote another great post on TWCS you might find useful, since you're using
it: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html




On Fri, Apr 7, 2017 at 8:20 AM Jerry Lam <chiling...@gmail.com> wrote:

> Hi Jon,
>
> This Cassandra community is very helpful!!! Thanks for sharing this
> blogpost with me. It answers all my questions related to TWCS with
> clustering key and limit clause!
>
> Best Regards,
>
> Jerry
>
>
>
> On Fri, Apr 7, 2017 at 10:30 AM, Jon Haddad <jonathan.had...@gmail.com>
> wrote:
>
> Alex Dejanovski wrote a good post on how the LIMIT clause works and why it
> doesn’t (until 3.4) work the way you think it would.
>
>
> http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
>
> On Apr 7, 2017, at 7:23 AM, Jerry Lam <chiling...@gmail.com> wrote:
>
> Hi Jan,
>
> Thank you for the clarification and knowledge sharing.
>
> A follow-up question is:
>
> Does Cassandra need to read all sstables for customer_id = 1L if my query
> is:
>
> select view_id from customer_view where customer_id = 1L limit 1
>
> Since date_day is the clustering key and is sorted in descending order,
> I'm assuming that the above query will return the latest view_id for
> customer_id 1L.
>
> Since I'm using TWCS, is Cassandra smart enough to query only the
> latest sstable that matches the partition key (customer_id = 1L), or does
> it have to go through the entire merge process?
>
> Thank you,
>
> Jerry
>
>
> On Fri, Apr 7, 2017 at 2:08 AM, <j.kes...@enercast.de> wrote:
>
> Hi Jerry,
>
>
>
> The compaction strategy just tells Cassandra how to compact your sstables
> and, with TWCS, when to stop compacting further. But of course your data
> can, and most likely will, live in multiple sstables.
>
>
>
> The magic that happens is that the coordinator node for your request will
> merge the data for you on the fly. It is an easy job, as your data per
> sstable is already sorted.
>
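The on-the-fly merge described above can be pictured as a k-way merge of runs that are already sorted. The following is a minimal sketch in Python, not Cassandra's actual read path: the three lists are hypothetical stand-ins for sstables holding rows of one partition, and the sketch ignores tombstones and the reconciliation by write timestamp that applies when the same clustering key appears in more than one sstable.

```python
import heapq
from datetime import datetime

# Hypothetical stand-ins for sstables: each holds (date_day, view_id) rows
# for a single customer_id, already sorted by date_day descending.
sstable_hour_1 = [(datetime(2017, 4, 7, 0, 45), 103), (datetime(2017, 4, 7, 0, 10), 101)]
sstable_hour_2 = [(datetime(2017, 4, 7, 1, 30), 107), (datetime(2017, 4, 7, 1, 5), 105)]
sstable_hour_3 = [(datetime(2017, 4, 7, 2, 20), 111)]


def merged_rows(*sstables):
    """Lazily merge descending-sorted runs into one descending stream."""
    # reverse=True tells heapq.merge that every input is sorted largest-first.
    return heapq.merge(*sstables, key=lambda row: row[0], reverse=True)


def latest_view_id(*sstables):
    """Return the view_id of the newest row, or None if there are no rows."""
    first = next(merged_rows(*sstables), None)
    return None if first is None else first[1]


rows = list(merged_rows(sstable_hour_1, sstable_hour_2, sstable_hour_3))
```

Because every input is already sorted, the merge only compares the current head of each run, so no separate sort step is needed. Taking the first row of the merged stream still requires looking at the head of every run that contains the partition.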
>
>
> But be careful about the worst case: if a customer_id is inserted every
> hour and the data should be kept for a year or so, a read can end up
> touching many sstables, which decreases read performance.
>
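To put a rough number on that worst case, here is a back-of-the-envelope sketch in Python. It assumes that each time window eventually compacts down to a single sstable, that the partition receives a write in every window, and that "a year or so" means 365 days; the function name is hypothetical, and real counts depend on flushes, repairs, and what a given read can skip.

```python
def max_sstables_for_partition(retention_days, window_hours):
    """Upper bound on sstables holding one partition: one per time window."""
    hours = retention_days * 24
    # Ceiling division: a partially filled last window still counts as one.
    return -(-hours // window_hours)


hourly = max_sstables_for_partition(365, 1)   # 1-hour windows, as in this thread
daily = max_sstables_for_partition(365, 24)   # 1-day windows, for comparison
```

With 1-hour windows the bound is 8760 sstables for a partition written every hour over a year, against 365 with 1-day windows.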
>
>
> Jan
>
>
>
> Sent from my Windows 10 Phone
>
>
>
> *From: *Jerry Lam <chiling...@gmail.com>
> *Sent: *Friday, 7 April 2017 00:30
> *To: *user@cassandra.apache.org
> *Subject: *How does clustering key works with
> TimeWindowCompactionStrategy (TWCS)
>
>
>
> Hi guys,
>
>
>
> I'm a new and happy user of Cassandra. We are using Cassandra for time
> series data, so we chose TWCS because of its predictability and its ease of
> configuration.
>
>
>
> My question concerns a table with the following schema:
>
>
>
> CREATE TABLE IF NOT EXISTS customer_view (
>     customer_id bigint,
>     date_day timestamp,
>     view_id bigint,
>     PRIMARY KEY (customer_id, date_day)
> ) WITH CLUSTERING ORDER BY (date_day DESC);
>
>
>
> What I understand is that the data will be ordered by date_day within the
> partition using the clustering key. However, the same customer_id can be
> inserted into this partition several times during the day, and TWCS
> will only compact the sstables within the window interval set in the
> configuration (1 hour in our case).
>
>
>
> How does Cassandra guarantee the clustering key order when the same
> customer_id appears in several sstables? Does it need to do a merge and
> then sort to find the latest view_id for the customer_id? Or is there
> some magic happening behind the scenes?
>
>
>
> Best Regards,
>
>
>
> Jerry
>
>
>
>
>
>
>
