[
https://issues.apache.org/jira/browse/CASSANDRA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksey Yeschenko resolved CASSANDRA-6909.
------------------------------------------
Resolution: Duplicate
CASSANDRA-5546 is going to mostly address the problem from a different angle -
closing the ticket as a dup of that.
> A way to expire columns without converting to tombstones
> --------------------------------------------------------
>
> Key: CASSANDRA-6909
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6909
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Bartłomiej Romański
>
> Imagine the following scenario.
> - You need to store some data knowing that you will need it only for a
> limited time (say 7 days).
> - After that you just don't care. You don't need it returned by queries,
> but if it is returned, that's not a problem at all - you won't look at it
> anyway.
> - Your records are small. Row keys and column names are even longer than
> the actual values (e.g. ints vs strings).
> - You reuse rows. You add some new columns to most of the rows every day or
> two. This means that columns expire often, but rows usually don't.
> - You generate a lot of data and want to make sure that expired records do
> not consume disk space for too long.
> The current TTL feature does not handle this situation well. When compaction
> finally decides that a given sstable is worth compacting, it won't simply
> get rid of expired columns. Instead, it will transform them into tombstones.
> For small values, that's no saving at all.
> Even if you set the grace period to 0, tombstones cannot be removed right
> away, because some other sstable can still contain values that should be
> "covered" by the tombstone.
> You can get rid of a tombstone in only two cases:
> - it's a major compaction (this never happens with LCS, and requires a lot
> of free space with STCS)
> - bloom filters tell you that no other sstable contains this row key
> The second case is uncommon if your rows typically have multiple columns
> that were not written all at once. There's a good chance your row is spread
> across multiple sstables, and new ones are generated from time to time.
> There's very little chance they'll all meet in one compaction at some point.
> Funnily enough, bloom filters return true if there's a tombstone for the
> given row in the given sstable. So you won't remove tombstones during
> compaction, because there's some other tombstone in another sstable for
> that row :/
> After a while, you end up with a lot of tombstones (the majority of your
> data) and can do nothing about it.
> Now imagine that Cassandra knows that we just don't care about data older
> than 7 days.
> Firstly, it can simply drop such columns during compaction (without
> converting them to tombstones or anything like that).
> Secondly, if it detects an sstable older than 7 days, it can safely remove
> it entirely (it cannot contain any live data).
> Together, these two rules *guarantee* that your data will be removed within
> 14 days (2xTTL). If a compaction runs after 7 days, expired data will be
> removed. If not, the whole sstable will be removed after another 7 days.
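The 2xTTL bound from the two rules above can be sanity-checked with a small simulation (a Python sketch, not Cassandra code; the daily flush and the random compaction cadence are arbitrary assumptions for illustration):

```python
import random

TTL = 7  # days; per-CF TTL from the scenario above
random.seed(1)

# Each sstable is (created_at, [cell write times]). The two proposed rules:
#  1. compaction drops cells older than TTL instead of writing tombstones
#  2. any sstable created more than TTL ago is deleted outright
def compact(now, sstables):
    # Merge all sstables, keeping only unexpired cells (rule 1).
    live = [t for _, cells in sstables for t in cells if now - t < TTL]
    return (now, live)

def drop_expired_tables(now, sstables):
    # Delete whole sstables older than TTL (rule 2).
    return [s for s in sstables if now - s[0] < TTL]

sstables = []
for day in range(365):
    sstables.append((day, [day]))      # one flush per day
    if random.random() < 0.3:          # occasional full compaction
        sstables = [compact(day, sstables)]
    sstables = drop_expired_tables(day, sstables)
    oldest = min((t for _, cells in sstables for t in cells), default=day)
    assert day - oldest < 2 * TTL      # the 2xTTL guarantee holds
```

The invariant holds by construction: a cell written at time t can only survive inside an sstable created at some c with c - t < TTL, and that sstable is itself deleted once the clock passes c + TTL.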
> That's what I expected from CASSANDRA-3974, but it turned out to be just a
> trivial, frontend feature.
> I suggest rethinking this mechanism. I don't believe it's a common scenario
> that someone who sets a TTL for a whole CF needs the strong guarantee that
> data will never reappear in the future due to consistency issues (which is
> why we need this whole mess with tombstones).
> I believe the common case with a per-CF TTL is that you just want an
> efficient way to recover your disk space (and to improve read performance by
> having fewer sstables and less data in general).
> To work around this, we currently stop Cassandra periodically, remove the
> sstables that are too old, and start it back up. This works OK, but does not
> solve the problem fully (if a tombstone is often rewritten by compactions,
> we will never remove it).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)