Just curious if you've looked at materialized views. Something like: CREATE MATERIALIZED VIEW users_by_mod_date AS SELECT dept_id,mod_date,user_id,user_name FROM users WHERE mod_date IS NOT NULL PRIMARY KEY (dept_id,mod_date,user_id) WITH CLUSTERING ORDER BY (mod_date desc)
On Thu, Apr 6, 2017 at 5:54 AM Monmohan Singh <monmo...@gmail.com> wrote: > Dear Cassandra experts, > I have a data modeling question for cases where data needs to be sorted by > keys which can be modified. > So , say we have a user table > { > dept_id text, > user_id text, > user_name text, > mod_date timestamp > PRIMARY KEY (dept_id,user_id) > } > Now I can query cassandra to get all users by a dept_id > What if I wanted to query to get all users in a dept, sorted by mod_date. > So, one way would be to > { > dept_id text, > user_id text, > mod_date timestamp, > user_name text, > PRIMARY KEY (dept_id,user_id, mod_date) > } > But, mod_date changes every time user name is updated. So it can't be part > of clustering key. > > Attempt 1: Don't update the row but instead create new record for every > update. So, say the record for user foo is like below > {'dept_id1','user_id1',TimeStamp1','foo'} and then the name was changed to > 'bar' and then to 'baz' . In that case we add another row to table, so the > table data would look like > > {'dept_id1','user_id1',TimeStamp3','baz'} > {'dept_id1','user_id1',TimeStamp2','bar'} > {'dept_id1','user_id1',TimeStamp1','foo'} > > Now we can get all users in a dept, sorted by mod_date but it presents a > different problem. The data returned is duplicated. > > Attempt 2 : Add another column to identify the head record much like a > linked list > { > dept_id text, > user_id text, > mod_date timestamp, > user_name text, > next_record text > PRIMARY KEY (user_id,user_id, mod_date) > } > Every time an update happens it adds a row and also adds the PK of new > record except in the latest record. > > {'dept_id1','user_id1',TimeStamp3','baz','HEAD'} > {'dept_id1','user_id1',TimeStamp2','bar','dept_id1#user_id1#TimeStamp3'} > {'dept_id1','user_id1',TimeStamp1','foo','dept_id1#user_id1#TimeStamp2'} > and also add a secondary index to 'next_record' column. > > Now I can support get all users in a dept, sorted by mod_date by > SELECT * from USERS where dept_id=':dept' AND next_record='HEAD' order by > mod_date. > > But it looks fairly involved solution and perhaps I am missing something , > a simpler solution .. > > The other option is delete and insert but for high frequency changes I > think Cassandra has issues with tombstones. > > Thanks for helping on this. > Regards > Monmohan > >