[ 
https://issues.apache.org/jira/browse/CASSANDRA-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko resolved CASSANDRA-6909.
------------------------------------------
    Resolution: Duplicate

CASSANDRA-5546 is going to mostly address the problem from a different angle - 
closing the ticket as a dup of that.

> A way to expire columns without converting to tombstones
> --------------------------------------------------------
>
>                 Key: CASSANDRA-6909
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6909
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Bartłomiej Romański
>
> Imagine the following scenario.
> - You need to store some data knowing that you will need it only for a
> limited time (say 7 days).
> - After that you just don't care. You don't need it to be returned in
> queries, but if it is returned that's not a problem at all - you won't
> look at it anyway.
> - Your records are small. Row keys and column names are even longer than the
> actual values (e.g. ints vs strings).
> - You reuse rows. You add some new columns to most of the rows every day or
> two. This means that columns expire often, but rows usually do not.
> - You generate a lot of data and want to make sure that expired records do
> not consume disk space for too long.
> The current TTL feature does not handle that situation well. When compaction
> finally decides that a given sstable is worth compacting, it won't simply
> get rid of expired columns. Instead it will transform them into tombstones.
> For small values that saves nothing at all.
> Even if you set the grace period to 0, tombstones cannot be removed too early
> because some other sstable can still have values that should be "covered" by
> this tombstone.
> You can get rid of a tombstone only in two cases:
> - it's a major compaction (never happens with LCS, requires a lot of space in
> STCS)
> - bloom filters tell you that there are no other sstables with this row key
> The second case is not common if you usually have multiple columns in a
> single row that were not written at once. There's a good chance you'll have
> your row spread across multiple sstables. And from time to time new ones are
> generated. There's very little chance they'll all meet in one compaction at
> some point.
> What's funny, bloom filters return true if there's a tombstone for the given
> row in the given sstable. So you won't remove tombstones during compaction,
> because there's some other tombstone in another sstable for that row :/
> After a while, you end up with a lot of tombstones (the majority of your data)
> and can do nothing about that.
> Now imagine that Cassandra knows that we just don't care about data older than
> 7 days.
> Firstly, it can simply drop such columns during compactions (without
> converting them to tombstones or anything like that).
> Secondly, if it detects an sstable older than 7 days it can safely remove it
> entirely (it cannot contain any active data).
> These two *guarantee* that your data will be removed after 14 days (2xTTL). If
> we do a compaction after 7 days, expired data will be removed. If we do not,
> the whole sstable will be removed after another 7 days.
> That's what I expected from CASSANDRA-3974, but it turned out to be just a
> trivial frontend feature.
> I suggest rethinking this mechanism. I don't believe that it's a common
> scenario that someone who sets a TTL for a whole CF needs all these strong
> guarantees that data will not reappear in the future in case of some issues
> with consistency (that's why we need this whole mess with tombstones).
> I believe the common case with a per-CF TTL is that you just want an efficient
> way to recover your disk space (and improve read performance by having fewer
> sstables and less data in general).
> To work around this we currently periodically stop Cassandra, simply remove
> sstables that are too old, and start it again. This works OK, but does not
> solve the problem fully (if a tombstone is rewritten by compactions often, we
> will never remove it).
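
For readers following the proposal in the quoted description, the two rules
(drop expired columns outright during compaction, and delete any sstable older
than the TTL) can be sketched roughly as below. This is only an illustrative
model, not Cassandra code: the Column and SSTable classes, the function names,
and the use of plain integer timestamps are all assumptions made for the
example.

```python
from dataclasses import dataclass, field

DAY = 24 * 3600
TTL = 7 * DAY  # per-CF TTL from the scenario (7 days); hypothetical constant


@dataclass(frozen=True)
class Column:
    row_key: str
    name: str
    written_at: int  # write time in seconds (hypothetical representation)


@dataclass
class SSTable:
    created_at: int  # time the sstable was flushed or produced by compaction
    columns: list = field(default_factory=list)


def drop_old_sstables(sstables, now, ttl=TTL):
    """Rule 2: an sstable older than the TTL cannot hold live data, so drop it whole."""
    return [s for s in sstables if now - s.created_at < ttl]


def compact(sstables, now, ttl=TTL):
    """Rule 1: merge sstables, discarding expired columns without writing tombstones."""
    live = [c for s in sstables for c in s.columns if now - c.written_at < ttl]
    return SSTable(created_at=now, columns=live)


old = SSTable(created_at=0, columns=[Column("r", "a", 0)])
mixed = SSTable(created_at=5 * DAY,
                columns=[Column("r", "b", 1 * DAY), Column("r", "c", 5 * DAY)])

# On day 8 the first sstable is past the TTL and can be removed as a whole;
# compacting both keeps only the column written on day 5.
kept = drop_old_sstables([old, mixed], now=8 * DAY)
merged = compact([old, mixed], now=8 * DAY)
```

Under this model a column written at time t is gone by t + 2xTTL at the
latest: either a compaction after t + TTL drops it, or the sstable holding it,
which was created before t + TTL, ages out after another TTL.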



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
