Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

Jerry Lam Fri, 07 Apr 2017 08:20:43 -0700

Hi Jon,

This Cassandra community is very helpful!!! Thanks for sharing this
blog post with me. It answers all my questions about TWCS with a
clustering key and the LIMIT clause!


Best Regards,

Jerry



On Fri, Apr 7, 2017 at 10:30 AM, Jon Haddad <jonathan.had...@gmail.com>
wrote:

> Alex Dejanovski wrote a good post on how the LIMIT clause works and why it
> doesn’t (until 3.4) work the way you think it would.
>
> http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
>
> On Apr 7, 2017, at 7:23 AM, Jerry Lam <chiling...@gmail.com> wrote:
>
> Hi Jan,
>
> Thank you for the clarification and knowledge sharing.
>
> A follow-up question is:
>
> Does Cassandra need to read all sstables for customer_id = 1L if my query
> is:
>
> select view_id from customer_view where customer_id = 1L limit 1
>
> Since date_day is the clustering key and is sorted in descending order,
> I'm assuming that the above query will return the latest view_id for
> customer_id 1L.
>
> Since I'm using TWCS, is Cassandra smart enough to query only the latest
> sstable that matches the partition key (customer_id = 1L), or does it have
> to go through the entire merge process?
>
> Thank you,
>
> Jerry
>
>
> On Fri, Apr 7, 2017 at 2:08 AM, <j.kes...@enercast.de> wrote:
>
>> Hi Jerry,
>>
>>
>>
>> The compaction strategy only tells Cassandra how to compact your sstables
>> and, with TWCS, when to stop compacting further. Your data can, and most
>> likely will, live in multiple sstables.
>>
>>
>>
>> The magic that happens is that the coordinator node for your request will
>> merge the data for you on the fly. It is an easy job, as the data in each
>> sstable is already sorted.
>>
>>
>>
>> But be careful about the worst case: if a customer_id is inserted every
>> hour, a read can end up touching many sstables, which hurts read
>> performance if the data is kept for a year or so.
>>
>>
>>
>> Jan
>>
>>
>>
>> Sent from my Windows 10 Phone
>>
>>
>>
>> *From: *Jerry Lam <chiling...@gmail.com>
>> *Sent: *Friday, 7 April 2017 00:30
>> *To: *user@cassandra.apache.org
>> *Subject: *How does clustering key works with
>> TimeWindowCompactionStrategy (TWCS)
>>
>>
>>
>> Hi guys,
>>
>>
>>
>> I'm a new and happy user of Cassandra. We are using Cassandra for time
>> series data, so we chose TWCS because of its predictability and its ease
>> of configuration.
>>
>>
>>
>> My question is about a table with the following schema:
>>
>>
>>
>> CREATE TABLE IF NOT EXISTS customer_view (
>>     customer_id bigint,
>>     date_day timestamp,
>>     view_id bigint,
>>     PRIMARY KEY (customer_id, date_day)
>> ) WITH CLUSTERING ORDER BY (date_day DESC);
>>
>>
>>
>> My understanding is that the data will be ordered by date_day within the
>> partition using the clustering key. However, the same customer_id can be
>> inserted into this partition several times during the day, and TWCS will
>> only compact the sstables within the window interval set in the
>> configuration (in our case, 1 hour).
>>
>>
>>
>> How does Cassandra guarantee the clustering key order when the same
>> customer_id appears in several sstables? Does it need to do a merge and
>> then sort to find out the latest view_id for the customer_id? Or is there
>> some magic happening behind the scenes?
>>
>>
>>
>> Best Regards,
>>
>>
>>
>> Jerry
>>
>>
>>
>
>
>
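
The on-the-fly merge Jan describes can be pictured as a k-way merge over
runs that are each already sorted by the clustering key. Below is a minimal
Python sketch of that idea, not Cassandra's actual read path. The row
layout (date_day, write_time, view_id), the three in-memory "sstables",
and the read_partition helper are all assumptions made up for
illustration. Rows that share a date_day are reconciled by keeping the
one with the highest write_time, and the LIMIT is modelled by stopping the
lazy merge early.

```python
import heapq
from itertools import groupby, islice

# Each hypothetical "sstable" holds rows for one partition (customer_id = 1),
# already sorted by the clustering key date_day in descending order.
# Row layout (an assumption for this sketch): (date_day, write_time, view_id).
sstable_hour_1 = [("2017-04-07T00:50", 105, 11), ("2017-04-07T00:10", 101, 10)]
sstable_hour_2 = [("2017-04-07T01:40", 205, 21), ("2017-04-07T01:05", 201, 20)]
# A late rewrite of an older row lands in a newer sstable.
sstable_hour_3 = [("2017-04-07T02:30", 301, 30), ("2017-04-07T00:50", 302, 12)]


def read_partition(sstables, limit=None):
    """Merge pre-sorted runs and keep the newest write per clustering key."""
    # heapq.merge streams the runs lazily; reverse=True expects each input
    # to be sorted from largest to smallest key, which matches DESC order.
    merged = heapq.merge(*sstables, key=lambda row: row[0], reverse=True)
    # Rows with the same date_day come out adjacent, so reconcile them here.
    reconciled = (
        max(rows, key=lambda row: row[1])
        for _, rows in groupby(merged, key=lambda row: row[0])
    )
    # islice with limit=None yields every row; otherwise it stops early.
    return list(islice(reconciled, limit))


all_sstables = [sstable_hour_1, sstable_hour_2, sstable_hour_3]
latest = read_partition(all_sstables, limit=1)
print(latest)
```

Even with limit=1, every run still has to supply its first row before the
merge can decide which one is the latest, which is why the number of
sstables holding the partition matters. Assuming one sstable per 1-hour
window and a customer_id written every hour, a year of retention would
mean up to 24 * 365 = 8760 runs for a single partition read.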
