Hi,

your primary goal is to fetch a user by dept_id and user_id and additionally 
keep versions of the user data?

{
   dept_id text,
   user_id text,
   mod_date timestamp,
   user_name text,
   PRIMARY KEY ((dept_id,user_id), mod_date)
   WITH CLUSTERING ORDER BY (mod_date DESC);
}

There is a difference between partition key and cluster keys. My suggestion 
will end up with all versions of a particular (dept_id,user_id) on a partition 
(say node) and all versions of your data on that portion in descending order by 
mod_date. 

For a normal loopkup you do not need to know mod_date, a simple SELECT * FROM 
users WHERE dept_id=foo and user_id=bar LIMIT 1 will do.

http://datascale.io/cassandra-partitioning-and-clustering-keys-explained/



Gesendet von meinem Windows 10 Phone

Von: Monmohan Singh
Gesendet: Donnerstag, 6. April 2017 13:54
An: user@cassandra.apache.org
Betreff: The changing clustering key

Dear Cassandra experts,
I have a data modeling question for cases where data needs to be sorted by keys 
which can be modified.
So , say we have a user table
{
   dept_id text,
   user_id text,
   user_name text,
   mod_date timestamp
   PRIMARY KEY (dept_id,user_id)
}
Now I can query cassandra to get all users by a dept_id
What if I wanted to query to get all users in a dept, sorted by mod_date.
So, one way would be to
{
   dept_id text,
   user_id text,
   mod_date timestamp,
   user_name text,
   PRIMARY KEY (dept_id,user_id, mod_date)
}
But, mod_date changes every time user name is updated. So it can't be part of 
clustering key.

Attempt 1:  Don't update the row but instead create new record for every 
update. So, say the record for user foo is like below
{'dept_id1','user_id1',TimeStamp1','foo'} and then the name was changed to 
'bar' and then to 'baz' . In that case we add another row to table, so the 
table data would look like

{'dept_id1','user_id1',TimeStamp3','baz'}
{'dept_id1','user_id1',TimeStamp2','bar'}
{'dept_id1','user_id1',TimeStamp1','foo'}

Now we can get all users in a dept, sorted by mod_date but it presents a 
different problem. The data returned is duplicated. 

Attempt 2 : Add another column to identify the head record much like a linked 
list
{
   dept_id text,
   user_id text,
   mod_date timestamp,
   user_name text,
   next_record text
   PRIMARY KEY (user_id,user_id, mod_date)
}
Every time an update happens it adds a row and also adds the PK of new record 
except in the latest record.

{'dept_id1','user_id1',TimeStamp3','baz','HEAD'}
{'dept_id1','user_id1',TimeStamp2','bar','dept_id1#user_id1#TimeStamp3'}
{'dept_id1','user_id1',TimeStamp1','foo','dept_id1#user_id1#TimeStamp2'}
and also add a secondary index to 'next_record' column.

Now I can support get all users in a dept, sorted by mod_date by
SELECT * from USERS where dept_id=':dept' AND next_record='HEAD' order by 
mod_date.

But it looks fairly involved solution and perhaps I am missing something , a 
simpler solution ..

The other option is delete and insert but for high frequency changes I think 
Cassandra has issues with tombstones.

Thanks for helping on this.
Regards
Monmohan


Reply via email to