It seems that we are mostly talking about writing and reading keys into/from a 
Cassandra cluster. I'm wondering how you have successfully dealt with 
deleting/expiring keys in Cassandra. A typical example: you want to delete 
keys that haven't been modified within a certain time period (i.e., old keys). 
Here are my thoughts:

 

1) If you use the order-preserving partitioner, you need to periodically 
iterate through all keys and check their last-modified time to decide whether 
each key should be deleted. When you have hundreds of millions of keys under 
high read/write traffic, iterating over all keys in all clusters is very time- 
and resource-consuming.

2) If you use the random partitioner, you'll need to keep a list of ALL keys 
somewhere, keep it updated over time, and go through it periodically to delete 
expired items. Again, when you have hundreds of millions of keys, maintaining 
such a large, dynamic key list along with expiration times is no trivial 
amount of work.

3) Once keys are deleted, do you have to wait until the next GC cycle to clean 
them from disk or memory (assuming you don't run cleanup manually)? What is 
Cassandra's strategy for handling deleted items (notifying other replica 
nodes, cleaning up memory/disk, defragmenting/rebuilding data files, 
rebuilding bloom filters, etc.)? I'm asking because if keys refresh very 
quickly (i.e., high-volume reads/writes with fairly short expirations), how 
will the data files grow, and how does that affect system performance?

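For point 2, the external key list could be kept as a min-heap ordered by expiration time, so each periodic pass pops only the already-expired entries instead of walking the whole list; a modified key simply pushes a new entry and the stale one is skipped lazily. A sketch of that idea (all class and method names here are hypothetical):

```python
import heapq

class ExpirationIndex:
    """Tracks key expiration times in a min-heap for cheap periodic sweeps."""

    def __init__(self):
        self._heap = []     # (expires_at, key), ordered by soonest expiry
        self._latest = {}   # key -> its current expires_at

    def touch(self, key, expires_at):
        """Record a write: push a fresh heap entry, remember the latest one."""
        self._latest[key] = expires_at
        heapq.heappush(self._heap, (expires_at, key))

    def pop_expired(self, now):
        """Return and forget every key whose latest expiry is <= now."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expires_at, key = heapq.heappop(self._heap)
            if self._latest.get(key) == expires_at:  # skip stale entries
                del self._latest[key]
                expired.append(key)
        return expired

idx = ExpirationIndex()
idx.touch("a", 10)
idx.touch("b", 20)
idx.touch("a", 30)                # "a" was modified, pushing its expiry out
print(idx.pop_expired(now=25))    # -> ['b']
```

Each sweep is proportional to the number of expired entries rather than the total key count, though the index itself still has to be kept durable and consistent with the cluster, which is the non-trivial part.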
 

So what's your opinion on handling the above cases to expire keys? I'm trying 
to decide whether we can use Cassandra for high-traffic read-only, write-only, 
or mixed read/write workloads.

 

Thanks,

 

-Weijun
