[jira] [Commented] (CASSANDRA-8243) DTCS can leave time-overlaps, limiting ability to expire entire SSTables

Marcus Eriksson (JIRA) Thu, 13 Nov 2014 04:28:06 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209684#comment-14209684 ]


Marcus Eriksson commented on CASSANDRA-8243:
--------------------------------------------

The problem with this is that we might drop tombstones that actually cover 
data in other sstables, even though that data is also expired. 

I don't see any reason why this would make a difference to users, but I'm 
going to raise the [~slebresne] flag here, as he said back in CASSANDRA-5228 
that we must account for the timestamps of candidates that cover data in other 
sstables (see the code in the comment from Mar 21st).
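
To make the concern concrete, here is a toy last-write-wins model. It is only
a sketch and not Cassandra's read path: the Cell class and the read() helper
are hypothetical names invented for this illustration. It shows that dropping
a tombstone resurrects the older data it covers unless that data has itself
expired by TTL, which is the case discussed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one version of a cell in some sstable."""
    timestamp: int                    # write timestamp (last write wins)
    value: Optional[str]              # None marks a tombstone
    expires_at: Optional[int] = None  # absolute expiry time for TTL'd cells


def read(versions: List[Cell], now: int) -> Optional[str]:
    """Return the value a reader would see, or None if nothing is visible."""
    if not versions:
        return None
    newest = max(versions, key=lambda c: c.timestamp)
    if newest.value is None:
        return None  # the newest version is a tombstone
    if newest.expires_at is not None and newest.expires_at <= now:
        return None  # the newest version has expired by TTL
    return newest.value


tombstone = Cell(timestamp=10, value=None)           # lives in sstable A
ttl_data = Cell(timestamp=5, value="v", expires_at=100)  # lives in sstable B
plain_data = Cell(timestamp=5, value="v")            # same, but without a TTL

now = 200
# With a TTL on the covered data, dropping sstable A changes nothing:
print(read([ttl_data, tombstone], now), read([ttl_data], now))
# Without a TTL, dropping sstable A brings the old value back:
print(read([plain_data, tombstone], now), read([plain_data], now))
```

In this model the first line prints None twice, while the second prints None
followed by v, so the difference is only visible when the covered data is not
expired.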

> DTCS can leave time-overlaps, limiting ability to expire entire SSTables
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8243
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Björn Hegerfors
>            Assignee: Björn Hegerfors
>            Priority: Minor
>              Labels: compaction, performance
>             Fix For: 2.0.12, 2.1.3
>
>         Attachments: cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt
>
>
> CASSANDRA-6602 (DTCS) and CASSANDRA-5228 are supposed to be a perfect match 
> for tables where every value is written with a TTL. DTCS makes sure to keep 
> old data separate from new data. So shortly after the TTL has passed, 
> Cassandra should be able to throw away the whole SSTable containing a given 
> data point.
> CASSANDRA-5228 deletes the very oldest SSTables, and only if they don't 
> overlap (in terms of timestamps) with another SSTable which cannot be deleted.
> DTCS, however, can't guarantee that SSTables won't overlap (again, in terms of 
> timestamps). In a test that I ran, every single SSTable overlapped with its 
> nearest neighbors by a very tiny amount. My reasoning for why this could 
> happen is that the dumped memtables were already overlapping from the start. 
> DTCS will never create an overlap where there is none. I suspect that this 
> happened in my case because I sent parallel writes, which must have arrived 
> out of order. This was only a local test, and out-of-order writes should be 
> much more common in a non-local setup.
> That means that the SSTable removal optimization may never get a chance to 
> kick in!
> I can see two solutions:
> 1. Make DTCS split SSTables on time window borders. This will essentially 
> only be done on a newly dumped memtable once every base_time_seconds.
> 2. Make TTL SSTable expiry more aggressive. Relax the conditions under which 
> an SSTable can be dropped completely, of course without affecting any semantics.
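
The conservative rule described above can be sketched as follows. This is a
simplified model under stated assumptions, not the code from CASSANDRA-5228:
the SSTable class, its expires_at field, and the droppable() function are
hypothetical names. An expired SSTable is dropped only if its newest
timestamp is older than every timestamp in the SSTables that must be kept,
and an expired SSTable that fails the check is itself kept, which can block
older ones in turn.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical summary of an sstable's metadata."""
    name: str
    min_ts: int      # oldest write timestamp in the sstable
    max_ts: int      # newest write timestamp in the sstable
    expires_at: int  # time at which the last TTL'd cell in it expires


def droppable(sstables: List[SSTable], now: int) -> List[str]:
    """Names of fully expired sstables that overlap no sstable being kept."""
    live = [s for s in sstables if s.expires_at > now]
    expired = [s for s in sstables if s.expires_at <= now]
    # Oldest timestamp among sstables that must be kept.
    threshold = min((s.min_ts for s in live), default=float("inf"))
    dropped = []
    # Walk from newest to oldest so a kept sstable can block older ones.
    for s in sorted(expired, key=lambda t: t.max_ts, reverse=True):
        if s.max_ts >= threshold:
            threshold = min(threshold, s.min_ts)  # kept: it now blocks others
        else:
            dropped.append(s.name)
    return sorted(dropped)


# Neighbors overlap by a single timestamp unit: nothing can be dropped.
overlapping = [
    SSTable("a", 0, 100, 500), SSTable("b", 99, 200, 600),
    SSTable("c", 199, 300, 700), SSTable("d", 299, 400, 2000),
]
# Same data split cleanly on window borders: all expired sstables go.
disjoint = [
    SSTable("a", 0, 100, 500), SSTable("b", 101, 200, 600),
    SSTable("c", 201, 300, 700), SSTable("d", 301, 400, 2000),
]
print(droppable(overlapping, now=1000))  # []
print(droppable(disjoint, now=1000))     # ['a', 'b', 'c']
```

In this model the tiny overlaps chain backwards from the one live SSTable and
pin every older one, which matches the behavior reported in the test above.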



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
