Re: The changing clustering key

Monmohan Singh Thu, 06 Apr 2017 19:03:35 -0700

thanks for the pointer. Let me readup more on Materialized views and see if
that helps solve our problem. I do know that its not supported in our
current version 2.2.x but I can explore moving to Cassandra 3.


On Fri, 7 Apr 2017 at 04:22 Eric Stevens <migh...@gmail.com> wrote:

> Just curious if you've looked at materialized views.  Something like:
>
> CREATE MATERIALIZED VIEW users_by_mod_date AS
>        SELECT dept_id,mod_date,user_id,user_name FROM users
>        WHERE mod_date IS NOT NULL
>        PRIMARY KEY (dept_id,mod_date,user_id)
>        WITH CLUSTERING ORDER BY (mod_date desc)
>
> On Thu, Apr 6, 2017 at 5:54 AM Monmohan Singh <monmo...@gmail.com> wrote:
>
> Dear Cassandra experts,
> I have a data modeling question for cases where data needs to be sorted by
> keys which can be modified.
> So , say we have a user table
> {
>    dept_id text,
>    user_id text,
>    user_name text,
>    mod_date timestamp
>    PRIMARY KEY (dept_id,user_id)
> }
> Now I can query cassandra to get all users by a dept_id
> What if I wanted to query to get all users in a dept, sorted by mod_date.
> So, one way would be to
> {
>    dept_id text,
>    user_id text,
>    mod_date timestamp,
>    user_name text,
>    PRIMARY KEY (dept_id,user_id, mod_date)
> }
> But, mod_date changes every time user name is updated. So it can't be part
> of clustering key.
>
> Attempt 1:  Don't update the row but instead create new record for every
> update. So, say the record for user foo is like below
> {'dept_id1','user_id1',TimeStamp1','foo'} and then the name was changed to
> 'bar' and then to 'baz' . In that case we add another row to table, so the
> table data would look like
>
> {'dept_id1','user_id1',TimeStamp3','baz'}
> {'dept_id1','user_id1',TimeStamp2','bar'}
> {'dept_id1','user_id1',TimeStamp1','foo'}
>
> Now we can get all users in a dept, sorted by mod_date but it presents a
> different problem. The data returned is duplicated.
>
> Attempt 2 : Add another column to identify the head record much like a
> linked list
> {
>    dept_id text,
>    user_id text,
>    mod_date timestamp,
>    user_name text,
>    next_record text
>    PRIMARY KEY (user_id,user_id, mod_date)
> }
> Every time an update happens it adds a row and also adds the PK of new
> record except in the latest record.
>
> {'dept_id1','user_id1',TimeStamp3','baz','HEAD'}
> {'dept_id1','user_id1',TimeStamp2','bar','dept_id1#user_id1#TimeStamp3'}
> {'dept_id1','user_id1',TimeStamp1','foo','dept_id1#user_id1#TimeStamp2'}
> and also add a secondary index to 'next_record' column.
>
> Now I can support get all users in a dept, sorted by mod_date by
> SELECT * from USERS where dept_id=':dept' AND next_record='HEAD' order by
> mod_date.
>
> But it looks fairly involved solution and perhaps I am missing something ,
> a simpler solution ..
>
> The other option is delete and insert but for high frequency changes I
> think Cassandra has issues with tombstones.
>
> Thanks for helping on this.
> Regards
> Monmohan
>
>

Re: The changing clustering key

Reply via email to