Tombstones have to be created. SSTables are immutable, so data cannot be 
deleted in place; instead, a delete writes a tombstone. The value you deleted 
is only physically removed during compaction.
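To make that concrete, here is a minimal sketch (toy Python, not Cassandra code) of why an immutable, log-structured store needs tombstones: a delete cannot touch the old SSTable, so it is recorded as a marker in a newer one, and the value only disappears when the tables are merged at compaction:

```python
old_sstable = {"k1": "v1", "k2": "v2"}   # flushed earlier; immutable
new_sstable = {"k1": "TOMBSTONE"}        # DELETE k1 recorded as a marker

def read(key):
    # newest table wins; a tombstone shadows the older value
    if key in new_sstable:
        return None if new_sstable[key] == "TOMBSTONE" else new_sstable[key]
    return old_sstable.get(key)

def compact(*tables):
    # merge oldest-first so newer entries win, then drop tombstones:
    # only here is the deleted value physically gone
    merged = {}
    for table in tables:
        merged.update(table)
    return {k: v for k, v in merged.items() if v != "TOMBSTONE"}

assert read("k1") is None                              # deleted, marker still on disk
assert compact(old_sstable, new_sstable) == {"k2": "v2"}
```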

My workload sounds similar to yours in some respects, and I was able to get C* 
working for me. I have large chunks of data which I periodically replace. I 
write the new data, update a reference, and then delete the old data. I 
designed my schema to be tombstone-friendly, and C* works great. For some of my 
tables I am able to delete entire partitions. Because of the reference that I 
updated, I never try to access the old data, and therefore the tombstones for 
these partitions are never read. The old data simply has to wait for 
compaction. Other tables require deleting records within partitions. These 
tombstones do get read, so there are performance implications. I was able to 
design my schema so that no partition ever has more than a few tombstones (one 
for each generation of deleted data, which is usually no more than one).
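The write-new / update-reference / delete-old pattern above can be sketched in miniature (plain Python dicts stand in for tables; the names publish, store, and the generation numbers are hypothetical):

```python
store = {}          # (dataset, generation) partition key -> rows
reference = {}      # dataset name -> current generation

def publish(dataset, generation, rows):
    store[(dataset, generation)] = rows   # 1. write the new data
    old = reference.get(dataset)
    reference[dataset] = generation       # 2. update the reference
    if old is not None:
        del store[(dataset, old)]         # 3. delete the whole old partition;
                                          #    readers follow the reference, so
                                          #    they never read its tombstone

publish("events", 1, ["a", "b"])
publish("events", 2, ["c"])
assert reference["events"] == 2
assert ("events", 1) not in store
```

The key design point is that the tombstone for the old partition still exists until compaction, but because no read path ever touches that partition, it costs nothing at query time.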

Hope this helps.

Robert

On Dec 16, 2014, at 8:22 AM, Ian Rose <ianr...@fullstory.com> wrote:

Howdy all,

Our use of Cassandra unfortunately involves lots of deletes.  Yes, I know 
that C* is not well suited to this kind of workload, but that's where we are, 
and before I go looking for an entirely new data layer I would rather explore 
whether C* could be tuned to work well for us.

However, deletions are never driven by users in our app - deletions always 
occur by backend processes to "clean up" data after it has been processed, and 
thus they do not need to be 100% available.  So this made me think, what if I 
did the following?

  *   gc_grace_seconds = 0, which ensures that tombstones are never created
  *   replication factor = 3
  *   for writes that are inserts, consistency = QUORUM, which ensures that 
writes can proceed even if 1 replica is slow/down
  *   for deletes, consistency = ALL, which ensures that when we delete a 
record it disappears entirely (no need for tombstones)
  *   for reads, consistency = QUORUM
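For what it's worth, the availability math behind those choices can be checked directly (a toy helper, not driver code; the QUORUM formula matches Cassandra's rf/2 + 1 rule):

```python
def replicas_required(rf, level):
    # toy mirror of Cassandra's consistency rules (QUORUM = rf // 2 + 1)
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

RF = 3

# QUORUM inserts and reads need 2 of 3 replicas, so one node can be slow/down
assert replicas_required(RF, "QUORUM") == 2

# ALL deletes need every replica, so a delete blocks if any replica is down
assert replicas_required(RF, "ALL") == 3

# QUORUM reads and QUORUM writes overlap (2 + 2 > 3), so reads see the latest write
assert 2 * replicas_required(RF, "QUORUM") > RF
```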

Also, I should clarify that our data is essentially append-only, so I don't need 
to worry about inconsistencies created by partial updates (e.g. value gets 
changed on one machine but not another).  Sometimes there will be duplicate 
writes, but I think that should be fine since the value is always identical.

Any red flags with this approach?  Has anyone tried it and have experiences to 
share?  Also, I *think* that this means that I don't need to run repairs, which 
from an ops perspective is great.

Thanks, as always,
- Ian

