You'll likely run into compaction headaches. If your purges/archives are time-bucketed, I'd implement a time-bucketing strategy using rotating tables, each dedicated to a single time period, so that when an entire table is ready for archiving you just snapshot its sstables and then TRUNCATE/nuke that bucket's table.
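For what it's worth, here is a rough sketch of the table-targeting part. It assumes monthly buckets and a made-up base table name `events`; the function names and naming scheme are purely illustrative, not from any real schema or driver API, and datetimes are expected to be timezone-aware.

```python
from datetime import datetime, timezone

# Hypothetical base name; each bucket table would be e.g. events_2018_03.
BUCKET_PREFIX = "events"


def _table_name(year: int, month: int) -> str:
    """Build the bucket table name for one calendar month."""
    return f"{BUCKET_PREFIX}_{year:04d}_{month:02d}"


def bucket_table(ts: datetime) -> str:
    """Return the table an insert with timestamp `ts` should target.

    `ts` must be timezone-aware; it is normalized to UTC so that every
    client agrees on where the bucket boundaries fall.
    """
    ts = ts.astimezone(timezone.utc)
    return _table_name(ts.year, ts.month)


def tables_for_range(start: datetime, end: datetime) -> list[str]:
    """Return every bucket table a query over [start, end] has to read,
    oldest first. The caller runs one query per table and merges results."""
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    if end < start:
        raise ValueError("end must not be earlier than start")

    tables = []
    year, month = start.year, start.month
    # Walk month by month until we pass the bucket containing `end`.
    while (year, month) <= (end.year, end.month):
        tables.append(_table_name(year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return tables
```

Writes go to `bucket_table(write_time)`; reads fan out over `tables_for_range(start, end)` and merge client-side. Archiving then only ever touches the oldest table: snapshot it, then TRUNCATE it.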
Queries that span buckets, and working out which table to target on inserts, are a major pain in the ass, but at scale you'll probably want to consider doing something like this.

On Wed, Mar 7, 2018 at 8:19 PM, kurt greaves <k...@instaclustr.com> wrote:

> The important point to consider is whether you are deleting old data or
> recently written data. How old/recent depends on your write rate to the
> cluster and there's no real formula. Basically you want to avoid deleting a
> lot of old data all at once because the tombstones will end up in new
> SSTables and the data to be deleted will live in higher levels (LCS) or
> large SSTables (STCS), which won't get compacted together for a long time.
> In this case it makes no difference if you do a big purge or if you break
> it up, because at the end of the day if your big purge is just old data,
> all the tombstones will have to stick around for a while until they make it
> to the higher levels/bigger SSTables.
>
> If you have to purge large amounts of old data, the easiest way is to
> 1. Make sure you have at least 50% disk free (for large/major compactions)
> and/or
> 2. Use garbagecollect compactions (3.10+)