I'm in the process of doing my second major data purge from a Cassandra
system. Almost all of my purging is done via row tombstones. While doing
this a second time, and while trying to cajole compaction (2.1.x,
LeveledCompactionStrategy) into goddamn actually compacting the data, I've
been wondering why there isn't a separate set of sstable infrastructure
set up for row deletion tombstones.
I'm imagining that row tombstones would be written to separate sstables
from mainline data updates/appends and range/column tombstones.
With row tombstones in their own sstables, the compaction system could
preferentially merge / process them when compacting sstables.
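
To make the idea concrete, here is a rough Python sketch of what
"preferentially merge" might look like. Everything in it is hypothetical:
the dict-based sstables, pick_next, and compact are stand-ins I made up
for illustration, not Cassandra's actual structures or compaction code,
and it ignores gc_grace_seconds and repair entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins, not Cassandra's real on-disk format.
# A data sstable maps row key -> list of (column, value, write_timestamp).
DataSSTable = Dict[str, List[Tuple[str, str, int]]]
# A row-tombstone sstable maps row key -> deletion timestamp.
TombstoneSSTable = Dict[str, int]


def shadowed_keys(data: DataSSTable, tombstones: TombstoneSSTable) -> int:
    """Count the row keys in `data` that also have a row tombstone."""
    return sum(1 for key in data if key in tombstones)


def pick_next(candidates: Dict[str, DataSSTable],
              tombstones: TombstoneSSTable) -> str:
    """Name the data sstable holding the most tombstoned rows."""
    return max(candidates,
               key=lambda name: shadowed_keys(candidates[name], tombstones))


def compact(data: DataSSTable, tombstones: TombstoneSSTable) -> DataSSTable:
    """Rewrite `data`, dropping cells written at or before their row tombstone."""
    out: DataSSTable = {}
    for key, cells in data.items():
        deleted_at = tombstones.get(key)
        # Cells newer than the tombstone survive; so do rows with no tombstone.
        live = [c for c in cells if deleted_at is None or c[2] > deleted_at]
        if live:
            out[key] = live
    return out
```

The point of pick_next is only that a small, dedicated tombstone sstable
makes it cheap to ask "which data sstable would shrink the most if I
compacted it now?".
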
Granted, this would add another sstable to check via the bloom filters on
reads. I had visions of short-circuiting the lookups to other sstables if
a row tombstone was present in one of the special row tombstone sstables.
But that would only be possible if there was the notion of a "super row
tombstone" that permanently deleted a rowkey and invalidated all future
writes to it. It would behave like a tombstone with a mistakenly huge
timestamp, which becomes a sneaky permanent tombstone, except this one
would be intentional. There could be a special operation / statement to
undo this permanent tombstone, and since the row tombstones would be in
their own dedicated sstables, they could be processed and compacted more
quickly, with prioritization by the compactor.
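
Here is a rough Python sketch of the read-path difference I have in mind.
Again, all of it is hypothetical: read_row, the super_tombstones set, and
the dict-based sstables are made up for illustration and are not
Cassandra's real read path. The only real behavior it leans on is that an
ordinary row tombstone shadows cells written at or before its timestamp,
so later writes still have to be looked up.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[str, str, int]  # (column, value, write_timestamp)


def read_row(key: str,
             data_sstables: List[Dict[str, List[Cell]]],
             row_tombstones: Dict[str, int],
             super_tombstones: Set[str]) -> Tuple[Dict[str, str], int]:
    """Return (live columns, number of data sstables consulted)."""
    # Hypothetical "super row tombstone": deletes the row regardless of
    # timestamp, so no data sstable needs to be read at all.
    if key in super_tombstones:
        return {}, 0

    deleted_at = row_tombstones.get(key)
    newest: Dict[str, Tuple[str, int]] = {}
    consulted = 0
    for sstable in data_sstables:
        consulted += 1
        for column, value, ts in sstable.get(key, []):
            # An ordinary tombstone only shadows cells at or before it.
            if deleted_at is not None and ts <= deleted_at:
                continue
            # Last write wins per column.
            if column not in newest or ts > newest[column][1]:
                newest[column] = (value, ts)
    return {col: val for col, (val, _) in newest.items()}, consulted
```

With an ordinary tombstone at timestamp 100, a cell written at 200 still
comes back and every data sstable gets consulted; with the key in
super_tombstones, the read returns empty without touching any of them.
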
I'm thinking there must be something I am forgetting in the
read/write/compaction paths that invalidates this.