Hi Jan,

Thank you for the clarification and knowledge sharing.

A follow-up question: does Cassandra need to read all sstables for
customer_id = 1L if my query is:

select view_id from customer_view where customer_id = 1L limit 1

Since date_day is the clustering key and it is sorted in descending order,
I'm assuming that the above query will return the latest view_id for
customer_id 1L.

Since I'm using TWCS, is Cassandra smart enough to query only the latest
sstable that matches the partition key (customer_id = 1L), or does it have
to go through the entire merge process?
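
As a rough illustration of the merge being asked about, here is a minimal
Python sketch. It is a conceptual model only, not Cassandra's actual read
path: the lists standing in for sstables, the sample rows, and the helper
names merged_rows and latest are all hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical stand-ins for sstables: each list holds (date_day, view_id)
# rows for one partition, already sorted by date_day in descending order.
sstable_hour_1 = [("2017-04-06T09:00", 101), ("2017-04-06T08:30", 100)]
sstable_hour_2 = [("2017-04-06T10:45", 103), ("2017-04-06T10:05", 102)]
sstable_hour_3 = [("2017-04-06T11:20", 104)]


def merged_rows(*runs):
    """Lazily merge descending-sorted runs into one descending stream."""
    return heapq.merge(*runs, key=lambda row: row[0], reverse=True)


def latest(limit, *runs):
    """Return the first `limit` rows of the merged stream (like LIMIT n)."""
    return list(islice(merged_rows(*runs), limit))


print(latest(1, sstable_hour_1, sstable_hour_2, sstable_hour_3))
```

In this model the merge is lazy, so a limit of 1 stops after the first row,
but it still has to inspect the first row of every run that holds the
partition. Whether Cassandra can skip some sstables outright is the open
question here.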

Thank you,

Jerry


On Fri, Apr 7, 2017 at 2:08 AM, <j.kes...@enercast.de> wrote:

> Hi Jerry,
>
>
>
> the compaction strategy just tells Cassandra how to compact your sstables
> and with TWCS when to stop compacting further. But of course your data can
> and most likely will live in multiple sstables.
>
>
>
> The magic that happens is that the coordinator node for your request will
> merge the data for you on the fly. It is an easy job, as your data per
> sstable is already sorted.
>
>
>
> But be careful about the worst case: if a customer_id is inserted every
> hour and the data is kept for a year or so, a read can end up touching many
> sstables, which decreases read performance.
>
>
>
> Jan
>
>
>
> *From: *Jerry Lam <chiling...@gmail.com>
> *Sent: *Friday, April 7, 2017 00:30
> *To: *user@cassandra.apache.org
> *Subject: *How does clustering key works with
> TimeWindowCompactionStrategy (TWCS)
>
>
>
> Hi guys,
>
>
>
> I'm a new and happy user of Cassandra. We are using Cassandra for time
> series data, so we chose TWCS because of its predictability and its ease of
> configuration.
>
>
>
> My question concerns a table with the following schema:
>
>
>
> CREATE TABLE IF NOT EXISTS customer_view (
>     customer_id bigint,
>     date_day timestamp,
>     view_id bigint,
>     PRIMARY KEY (customer_id, date_day)
> ) WITH CLUSTERING ORDER BY (date_day DESC);
>
>
>
> What I understand is that the data will be ordered by date_day within the
> partition using the clustering key. However, the same customer_id can be
> inserted into this partition several times during the day, and TWCS says
> it will only compact the sstables within the window interval set in the
> configuration (in our case, 1 hour).
>
>
>
> How does Cassandra guarantee the clustering key order when the same
> customer_id appears in several sstables? Does it need to do a merge and
> then sort to find out the latest view_id for the customer_id? Or is there
> some magic happening behind the scenes?
>
>
>
> Best Regards,
>
>
>
> Jerry
>
>
>
