[ 
https://issues.apache.org/jira/browse/CASSANDRA-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Björn Hegerfors updated CASSANDRA-8359:
---------------------------------------
    Attachment: cassandra-2.0-CASSANDRA-8359.txt

> Make DTCS consider removing SSTables much more frequently
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8359
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8359
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Björn Hegerfors
>            Priority: Minor
>         Attachments: cassandra-2.0-CASSANDRA-8359.txt
>
>
> When I run DTCS on a table where every value has a TTL (always the same TTL), 
> SSTables that are completely expired still stay on disk for much longer than 
> they need to. I've applied CASSANDRA-8243, but it doesn't make an apparent 
> difference (probably because the affected SSTables are purged via compaction 
> anyway, if not by directly dropping them).
> Disk size graphs show clearly that tombstones are only removed when the 
> oldest SSTable participates in compaction. In the long run, size on disk 
> keeps growing. This should not have to happen. Disk usage should easily be 
> able to stay constant, thanks to DTCS separating the expired data from the 
> rest.
> I think checks for whether SSTables can be dropped should happen much more 
> frequently. This is something that probably only needs to be tweaked for 
> DTCS, but perhaps there's a more general place to put this. Anyway, my 
> thinking is that DTCS should, on every call to getNextBackgroundTask, check 
> which SSTables can be dropped. It would be something like a call to 
> CompactionController.getFullyExpiredSSTables with all non-compacting SSTables 
> sent in as "compacting" and all other SSTables sent in as "overlapping". The 
> returned SSTables, if any, are then added to whichever set of SSTables 
> DTCS decides to compact. Then before the compaction happens, Cassandra is 
> going to make another call to CompactionController.getFullyExpiredSSTables, 
> where it will see that it can just drop them.
> This approach has a bit of redundancy in that it needs to call 
> CompactionController.getFullyExpiredSSTables twice. To avoid that, the code 
> path for deciding SSTables to drop would have to be changed.
> (Sidetracking a little here: I'm also thinking that tombstone compactions 
> could be considered more often in DTCS. Maybe even some kind of multi-SSTable 
> tombstone compaction involving the oldest couple of SSTables...)

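The check proposed in the description could be sketched roughly as follows. This is a simplified, self-contained model and not Cassandra code: the SSTable class, fullyExpired, and withDroppable are hypothetical stand-ins, and fullyExpired only approximates what CompactionController.getFullyExpiredSSTables is understood to do (it ignores memtables, for instance). It is meant to show the idea of passing all non-compacting SSTables as "compacting" and the remaining ones as "overlapping", then adding whatever comes back to the set DTCS has already chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the proposal; none of these types are Cassandra's.
final class DtcsExpiryCheckSketch {

    /** Stand-in for the SSTable metadata the expiry check needs. */
    static final class SSTable {
        final String name;
        final long minTimestamp;        // oldest write timestamp in the SSTable
        final long maxTimestamp;        // newest write timestamp in the SSTable
        final int maxLocalDeletionTime; // latest time (seconds) at which any cell expires

        SSTable(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /**
     * Assumed approximation of the expiry check: a candidate is droppable if all of
     * its data expired before gcBefore and it is older than every SSTable that still
     * holds live data, so dropping it cannot un-shadow anything.
     */
    static Set<SSTable> fullyExpired(Set<SSTable> compacting, Set<SSTable> overlapping, int gcBefore) {
        long minLiveTimestamp = Long.MAX_VALUE;
        for (SSTable s : overlapping)
            minLiveTimestamp = Math.min(minLiveTimestamp, s.minTimestamp);

        List<SSTable> candidates = new ArrayList<>();
        for (SSTable s : compacting) {
            if (s.maxLocalDeletionTime < gcBefore)
                candidates.add(s);
            else
                minLiveTimestamp = Math.min(minLiveTimestamp, s.minTimestamp);
        }

        Set<SSTable> expired = new HashSet<>();
        for (SSTable c : candidates)
            if (c.maxTimestamp < minLiveTimestamp)
                expired.add(c);
        return expired;
    }

    /**
     * The step proposed for getNextBackgroundTask: check every non-compacting SSTable
     * for expiry and add the droppable ones to whatever DTCS already chose to compact.
     */
    static Set<SSTable> withDroppable(Set<SSTable> all, Set<SSTable> currentlyCompacting,
                                      Set<SSTable> chosenByDtcs, int gcBefore) {
        Set<SSTable> nonCompacting = new HashSet<>(all);
        nonCompacting.removeAll(currentlyCompacting);

        // Non-compacting SSTables go in as "compacting", all others as "overlapping".
        Set<SSTable> expired = fullyExpired(nonCompacting, currentlyCompacting, gcBefore);

        Set<SSTable> result = new HashSet<>(chosenByDtcs);
        result.addAll(expired);
        return result;
    }

    public static void main(String[] args) {
        SSTable old = new SSTable("old", 0, 10, 100);
        SSTable mid = new SSTable("mid", 20, 30, 200);
        SSTable busy = new SSTable("busy", 40, 50, 500);

        Set<SSTable> all = new HashSet<>(Arrays.asList(old, mid, busy));
        Set<SSTable> compactingNow = new HashSet<>(Arrays.asList(busy));
        Set<SSTable> chosen = new HashSet<>(Arrays.asList(mid));

        for (SSTable s : withDroppable(all, compactingNow, chosen, 150))
            System.out.println(s.name);
    }
}
```

In this model the second expiry check that Cassandra makes before the compaction runs would find the same SSTables again, which is the redundancy mentioned in the description.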


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
