Dear Cassandra experts,
I have a data modeling question for cases where data needs to be sorted by
keys which can be modified.
So , say we have a user table
{
   dept_id text,
   user_id text,
   user_name text,
   mod_date timestamp
   PRIMARY KEY (dept_id,user_id)
}
Now I can query cassandra to get all users by a dept_id
What if I wanted to query to get all users in a dept, sorted by mod_date.
So, one way would be to
{
   dept_id text,
   user_id text,
   mod_date timestamp,
   user_name text,
   PRIMARY KEY (dept_id,user_id, mod_date)
}
But, mod_date changes every time user name is updated. So it can't be part
of clustering key.

Attempt 1:  Don't update the row but instead create new record for every
update. So, say the record for user foo is like below
{'dept_id1','user_id1',TimeStamp1','foo'} and then the name was changed to
'bar' and then to 'baz' . In that case we add another row to table, so the
table data would look like

{'dept_id1','user_id1',TimeStamp3','baz'}
{'dept_id1','user_id1',TimeStamp2','bar'}
{'dept_id1','user_id1',TimeStamp1','foo'}

Now we can get all users in a dept, sorted by mod_date but it presents a
different problem. The data returned is duplicated.

Attempt 2 : Add another column to identify the head record much like a
linked list
{
   dept_id text,
   user_id text,
   mod_date timestamp,
   user_name text,
   next_record text
   PRIMARY KEY (user_id,user_id, mod_date)
}
Every time an update happens it adds a row and also adds the PK of new
record except in the latest record.

{'dept_id1','user_id1',TimeStamp3','baz','HEAD'}
{'dept_id1','user_id1',TimeStamp2','bar','dept_id1#user_id1#TimeStamp3'}
{'dept_id1','user_id1',TimeStamp1','foo','dept_id1#user_id1#TimeStamp2'}
and also add a secondary index to 'next_record' column.

Now I can support get all users in a dept, sorted by mod_date by
SELECT * from USERS where dept_id=':dept' AND next_record='HEAD' order by
mod_date.

But it looks fairly involved solution and perhaps I am missing something ,
a simpler solution ..

The other option is delete and insert but for high frequency changes I
think Cassandra has issues with tombstones.

Thanks for helping on this.
Regards
Monmohan

Reply via email to