It seems that we are mostly talking about writing and reading keys into/from a Cassandra cluster. I'm wondering how you have successfully dealt with deleting/expiring keys in Cassandra. A typical example is wanting to delete keys that haven't been modified in a certain time period (i.e., old keys). Here are my thoughts:
1) If you use the order-preserving partitioner, you need to periodically iterate through all keys and check their last-modified time to decide whether each key should be deleted. When you have hundreds of millions of keys under high read/write traffic, iterating over all keys in all clusters is very time- and resource-consuming.

2) If you use the random partitioner, you'll need to keep a list of ALL keys somewhere, keep it up to date over time, and then go through it periodically to delete expired items. Again, with hundreds of millions of keys, maintaining such a big, dynamic key list along with expiration times is not trivial.

3) Once keys are deleted, do you have to wait until the next GC to remove them from disk and memory (assuming you don't run cleanup manually)? What is Cassandra's strategy for handling deleted items (notifying replica nodes, cleaning up memory/disk, defragmenting/rebuilding data files, rebuilding Bloom filters, etc.)? I'm asking because, if keys turn over very fast (i.e., high-volume reads/writes with fairly short expiration times), how will the data files grow and how does that impact system performance?

So what is your opinion on how to handle the above cases and expire keys? I'm trying to decide whether we can use Cassandra for high-traffic read-only, write-only, or mixed read/write workloads.

Thanks,
-Weijun
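P.S. To make point 2 concrete, here is a minimal sketch of the kind of bookkeeping I mean: a time-bucketed expiration index, so the periodic job only scans the buckets that have elapsed instead of the full key space. Everything here is hypothetical; a plain dict stands in for the actual Cassandra column family, and this is not a real Cassandra client API.

```python
import time
from collections import defaultdict

BUCKET_SECONDS = 3600  # index granularity: one bucket per hour (hypothetical choice)


class ExpirationIndex:
    """On each write, record the key under the bucket covering its expiration
    time. The periodic expiry job then scans only elapsed buckets, never the
    full key space."""

    def __init__(self):
        self.buckets = defaultdict(set)  # bucket number -> keys expiring in that hour
        self.store = {}                  # stand-in for the real column family

    def put(self, key, value, ttl_seconds):
        expire_at = time.time() + ttl_seconds
        bucket = int(expire_at // BUCKET_SECONDS)
        self.buckets[bucket].add(key)
        # store the expiration alongside the value so re-written keys
        # are not deleted prematurely by a stale bucket entry
        self.store[key] = (value, expire_at)

    def expire(self, now=None):
        """Delete every key whose bucket has fully elapsed. Run periodically."""
        now = time.time() if now is None else now
        current = int(now // BUCKET_SECONDS)
        deleted = []
        for bucket in [b for b in self.buckets if b < current]:
            for key in self.buckets.pop(bucket):
                entry = self.store.get(key)
                # re-check the stored expiration: a later write may have
                # extended this key's lifetime past the old bucket
                if entry is not None and entry[1] <= now:
                    del self.store[key]
                    deleted.append(key)
        return deleted
```

Even with bucketing, the index itself still has to hold an entry per live key, so at hundreds of millions of keys it becomes a sizable data set in its own right, which is exactly the maintenance burden I'm worried about above.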