We have a scheduler app here at SmartThings that tracks tasks to be
executed, each scheduled for a specific second.

Each of these is TTL'd so that it expires once the second it was
registered for has passed.
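As a minimal sketch, assuming a task's TTL is the whole number of seconds
remaining until the end of the second it is scheduled for, the TTL at write
time could be computed as below. `ttl_for_task` is a hypothetical helper,
not part of any driver API; its result would go into CQL's `USING TTL`
clause.

```python
import math
from datetime import datetime, timedelta, timezone


def ttl_for_task(scheduled_at: datetime, now: datetime) -> int:
    """Whole seconds until the end of the task's scheduled second (hypothetical helper)."""
    # Truncate to the scheduled second, then keep the row through that second.
    expires_at = scheduled_at.replace(microsecond=0) + timedelta(seconds=1)
    remaining = (expires_at - now).total_seconds()
    # TTLs are whole seconds; round up, and clamp to at least 1 so an overdue
    # task still gets a short positive TTL.
    return max(1, math.ceil(remaining))


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(ttl_for_task(now + timedelta(seconds=10, milliseconds=500), now))  # prints 11
```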

If the scheduling window were sufficiently small (say, one day), we could
probably use a time window compaction strategy for this. But per the
contract, the window covers one to two years' worth of ad hoc event
registrations.

Thus, because events are registered at different times, their TTLs run out
at different times, and each sstable ends up with intermingled data rather
than data that all expires in roughly the same period. If it were written
that way, compaction would be relatively easy, since the entire sstable
would expire (tombstone) at roughly the same time.
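To illustrate why the intermingling hurts, here is a small simulation under
assumed numbers (100,000 events, a two-year window, 1,000 rows per flushed
sstable, one-day buckets; none of these come from the real system). It
counts how many sstables are fully expired, and so could be dropped whole,
when rows land in sstables in arrival order versus grouped by expiry day.
All function names are illustrative.

```python
import random

DAY = 86_400
HORIZON = 2 * 365 * DAY  # assumed two-year scheduling window, in seconds


def make_expiries(n: int, seed: int = 0) -> list[int]:
    """Expiry offsets in arrival order; each event expires at a random point in the window."""
    rng = random.Random(seed)
    return [rng.randrange(HORIZON) for _ in range(n)]


def by_arrival(expiries: list[int], rows_per_sstable: int) -> list[list[int]]:
    """Consecutive arrivals share an sstable, as with ordinary memtable flushes."""
    return [expiries[i:i + rows_per_sstable]
            for i in range(0, len(expiries), rows_per_sstable)]


def by_expiry_window(expiries: list[int], window: int) -> list[list[int]]:
    """Rows grouped so that each sstable only holds rows expiring in one window."""
    buckets: dict[int, list[int]] = {}
    for e in expiries:
        buckets.setdefault(e // window, []).append(e)
    return [buckets[k] for k in sorted(buckets)]


def fully_expired(sstables: list[list[int]], now: int) -> int:
    """Number of sstables in which every row expired before `now`."""
    return sum(1 for rows in sstables if max(rows) < now)


expiries = make_expiries(100_000)
now = 180 * DAY  # six months into the window
print("arrival order:", fully_expired(by_arrival(expiries, 1_000), now))
print("expiry windows:", fully_expired(by_expiry_window(expiries, DAY), now))
```

With these assumed numbers, the arrival-order layout has no fully expired
sstables after six months, because each one almost certainly holds rows
expiring much later, while the windowed layout has one droppable sstable per
elapsed day (180).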

We could approximate this with sharded tables, one per time period,
rotating the shards into duty and truncating each one as it is recycled.
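A minimal sketch of the rotating-shard idea, assuming monthly shards and a
24-month scheduling horizon, which gives 26 physical tables: 25 that can
receive writes plus one that is free to truncate. The table naming
(`tasks_shard_NN`) and all helpers are hypothetical.

```python
from datetime import datetime

HORIZON_MONTHS = 24  # assumption: tasks are scheduled at most 24 months ahead
NUM_SHARDS = HORIZON_MONTHS + 2  # current month + horizon + one shard being truncated


def month_index(ts: datetime) -> int:
    """Consecutive calendar months map to consecutive integers."""
    return ts.year * 12 + (ts.month - 1)


def shard_for(scheduled_at: datetime) -> int:
    """Shard holding every task scheduled in the same month as `scheduled_at`."""
    return month_index(scheduled_at) % NUM_SHARDS


def table_name(shard: int) -> str:
    """Hypothetical naming scheme for the rotating tables."""
    return f"tasks_shard_{shard:02d}"


def shards_accepting_writes(now: datetime) -> set[int]:
    """Shards that may receive new tasks this month (current month through the horizon)."""
    start = month_index(now)
    return {(start + k) % NUM_SHARDS for k in range(HORIZON_MONTHS + 1)}


def shard_to_truncate(now: datetime) -> int:
    """Last month's shard: its tasks have all passed, and it is reused starting next month."""
    return (month_index(now) - 1) % NUM_SHARDS


now = datetime(2024, 1, 15)
print(table_name(shard_for(now)), table_name(shard_to_truncate(now)))
# prints: tasks_shard_04 tasks_shard_03
```

The truncation has to finish before the month ends: once the next month
begins, tasks scheduled 24 months out from it map onto that same shard.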

A more elegant approach would be a custom compaction strategy that
"windows" the data into clustered sstables, each of which could be
compacted with other sstables from the same time bucket.
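The bucketing such a strategy would do can be sketched over sstable metadata
alone. This assumes each sstable's earliest and latest row expiration are
known, uses one-day windows, and treats an sstable as belonging to a window
only if all of its rows expire inside it. `SSTableMeta`, `plan_compactions`,
and `min_group` are hypothetical names, not Cassandra's compaction API.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW = 86_400  # assumed bucket width: one day, in seconds


@dataclass(frozen=True)
class SSTableMeta:
    name: str
    min_expiry: int  # earliest row expiration in the sstable (epoch seconds)
    max_expiry: int  # latest row expiration in the sstable (epoch seconds)


def plan_compactions(sstables: list[SSTableMeta], now: int, min_group: int = 2):
    """Split sstables into whole-file drops and same-window compaction groups."""
    drop = [t.name for t in sstables if t.max_expiry < now]
    windows: dict[int, list[str]] = defaultdict(list)
    for t in sstables:
        if t.max_expiry < now:
            continue  # already handled as a whole-file drop
        first, last = t.min_expiry // WINDOW, t.max_expiry // WINDOW
        if first == last:
            # Every row expires in the same window, so it can merge with its peers.
            windows[first].append(t.name)
        # Sstables spanning several windows are left alone in this sketch;
        # avoiding them is the point of windowing the data in the first place.
    groups = [sorted(names) for _, names in sorted(windows.items())
              if len(names) >= min_group]
    return drop, groups


tables = [
    SSTableMeta("a", 0, 3_600),                      # window 0, expired at `now`
    SSTableMeta("b", WINDOW + 10, WINDOW + 500),     # window 1
    SSTableMeta("c", WINDOW + 900, 2 * WINDOW - 1),  # window 1
    SSTableMeta("d", 2 * WINDOW, 9 * WINDOW),        # spans windows, skipped
]
print(plan_compactions(tables, now=WINDOW))  # prints (['a'], [['b', 'c']])
```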

This would require visibility into the row key when the memtable data is
flushed to sstables. Is that even possible with compaction strategies? We
would require that the time-based data be part of the row key, as a
mandatory component if the row key is composite.
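Conceptually, the flush-time behavior being asked about could look like the
sketch below: a memtable's partitions are split into one output per time
window, using a time component that the schema requires as the first part of
a composite row key. Since each row's TTL runs out right after its scheduled
second, windowing by that time also groups rows by expiry. The key layout
`(epoch_seconds, task_id)`, `split_flush`, and the one-day window are
illustrative assumptions, not how Cassandra's flush or compaction interfaces
actually work.

```python
from collections import defaultdict

WINDOW = 86_400  # assumed window width: one day, in seconds


def time_component(row_key) -> int:
    """Return the required leading time part (epoch seconds) of a composite row key."""
    if not (isinstance(row_key, tuple) and row_key and isinstance(row_key[0], int)):
        raise ValueError(f"row key {row_key!r} lacks the required time component")
    return row_key[0]


def split_flush(memtable: dict) -> dict:
    """Group memtable partitions by time window; each group would become its own sstable."""
    outputs = defaultdict(dict)
    for row_key, partition in memtable.items():
        outputs[time_component(row_key) // WINDOW][row_key] = partition
    return dict(outputs)


memtable = {
    (0, "task-1"): "payload-1",
    (WINDOW + 5, "task-2"): "payload-2",
    (30, "task-3"): "payload-3",
}
print(sorted(split_flush(memtable)))  # prints [0, 1]
```

Rejecting keys without the time component at flush time is one way to make
the schema requirement enforceable rather than a convention.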
