It's possible you'll run into compaction headaches. Likely, actually.

If your purges/archives are time-bucketed, I'd implement a time-bucketing
strategy using rotating tables, each dedicated to a time period, so that when an
entire table is ready for archiving you just snapshot its SSTables and then
TRUNCATE/nuke that bucket's table.
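
For the rotate/archive step, something along these lines is what I have in mind.
Just a sketch: "myks" and "events_2018_01" are made-up names, and I'm assuming
plain nodetool snapshot plus a cqlsh TRUNCATE rather than any particular tooling:

    import subprocess

    KEYSPACE = "myks"  # hypothetical keyspace name

    def archive_bucket(table: str, tag: str) -> None:
        """Snapshot an expired bucket table's SSTables, then empty it."""
        # nodetool snapshot hard-links the table's SSTables under the given tag
        # so they can be copied off to archive storage; it only acts on the
        # local node, so run it on every node (or drive it from ops tooling).
        subprocess.run(["nodetool", "snapshot", "-t", tag, "-cf", table, KEYSPACE],
                       check=True)
        # Once the snapshot is safely archived, empty the table in one cheap
        # operation instead of writing millions of tombstones.
        subprocess.run(["cqlsh", "-e", f"TRUNCATE {KEYSPACE}.{table};"],
                       check=True)

    # e.g. archive_bucket("events_2018_01", "archive_2018_01")

Worth remembering that TRUNCATE itself triggers a snapshot on each node when
auto_snapshot is enabled, so depending on how you ship the files off-box that
may already give you the copy you need.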

Queries that span buckets and calculating which table to target on inserts are
a major pain in the ass, but at scale you'll probably want to consider doing
something like this.
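
To make the routing part concrete, here's a minimal sketch, assuming one table
per calendar month and a made-up "events_" table prefix; the range query has to
fan out to every bucket it touches and merge results client-side:

    from datetime import datetime, timezone

    TABLE_PREFIX = "events_"  # hypothetical base table name
    BUCKET_FMT = "%Y_%m"      # one table per calendar month (assumption)

    def bucket_table(ts: datetime) -> str:
        """Name of the time-bucket table an insert with this timestamp targets."""
        return TABLE_PREFIX + ts.strftime(BUCKET_FMT)

    def tables_for_range(start: datetime, end: datetime) -> list:
        """All bucket tables a query spanning [start, end] has to hit."""
        tables = []
        year, month = start.year, start.month
        while (year, month) <= (end.year, end.month):
            tables.append(f"{TABLE_PREFIX}{year:04d}_{month:02d}")
            month += 1
            if month > 12:
                year, month = year + 1, 1
        return tables

    # An insert at 2018-03-07 targets events_2018_03; a query covering Jan-Mar
    # 2018 has to be issued against three tables and merged by the client.
    now = datetime(2018, 3, 7, tzinfo=timezone.utc)
    print(bucket_table(now))                                   # events_2018_03
    print(tables_for_range(datetime(2018, 1, 15), datetime(2018, 3, 7)))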

On Wed, Mar 7, 2018 at 8:19 PM, kurt greaves <k...@instaclustr.com> wrote:

> The important point to consider is whether you are deleting old data or
> recently written data. How old/recent depends on your write rate to the
> cluster and there's no real formula. Basically you want to avoid deleting a
> lot of old data all at once because the tombstones will end up in new
> SSTables and the data to be deleted will live in higher levels (LCS) or
> large SSTables (STCS), which won't get compacted together for a long time.
> In this case it makes no difference if you do a big purge or if you break
> it up, because at the end of the day if your big purge is just old data,
> all the tombstones will have to stick around for a while until they make it
> to the higher levels/bigger SSTables.
>
> If you have to purge large amounts of old data, the easiest way is to 1.
> Make sure you have at least 50% disk free (for large/major compactions)
> and/or 2. Use garbagecollect compactions (3.10+)
>
>
