[ 
https://issues.apache.org/jira/browse/CASSANDRA-11060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114977#comment-15114977
 ] 

Marcus Eriksson commented on CASSANDRA-11060:
---------------------------------------------

Also see CASSANDRA-11056.

> Allow DTCS old SSTable filtering to use min timestamp instead of max
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-11060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11060
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sam Bisbee
>              Labels: dtcs
>
> We have observed a DTCS behavior when using TTLs where SSTables never, or 
> only very rarely, become fully expired because compaction keeps merging 
> them, leaving expired data "stuck" in large, partially expired SSTables.
> This is because compaction filtering is performed on the max timestamp, which 
> continues to grow as SSTables are compacted together. The merged SSTables 
> therefore never age past max_sstable_age_days. With a sufficiently large TTL, 
> such as 30 days, old but not yet expired SSTables keep combining and never 
> become fully expired, even with a max_sstable_age_days of 1.
> As a result, we have seen expired data linger in large SSTables for over 
> six months longer than it should have, wasting disk capacity.
> To work around this, we have been running an extended version of DTCS called 
> MTCS in some deployments. The only change is that it uses the min timestamp 
> instead of the max for compaction filtering (filterOldSSTables()). This 
> allows SSTables to age beyond max_sstable_age_days and stop compacting, so 
> the entire SSTable can become fully expired and be dropped from disk as 
> intended.
> You can see and test MTCS here: https://github.com/threatstack/mtcs
> I am not advocating that MTCS be its own standalone compaction strategy. 
> However, I would like to see a configuration option for DTCS that lets you 
> specify whether old SSTables should be filtered on min or max timestamp.
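
The difference between the two filters described above can be illustrated with a
small model. This is a sketch only, not the Cassandra implementation: the SSTable
class, the use_min_timestamp flag, and the use of days as the time unit are
assumptions made for illustration. The only name taken from the description is
filterOldSSTables(), rendered here as filter_old_sstables.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an SSTable; timestamps are in days."""
    name: str
    min_timestamp: float  # oldest write contained in the SSTable
    max_timestamp: float  # newest write contained in the SSTable


def filter_old_sstables(sstables: List[SSTable],
                        max_sstable_age: float,
                        now: float,
                        use_min_timestamp: bool = False) -> List[SSTable]:
    """Return the SSTables that remain compaction candidates.

    An SSTable is dropped from consideration once the chosen timestamp
    falls before the cutoff (now - max_sstable_age). Treating an age of 0
    as "filtering disabled" is an assumption of this sketch.
    """
    if max_sstable_age == 0:
        return list(sstables)
    cutoff = now - max_sstable_age

    def timestamp(s: SSTable) -> float:
        return s.min_timestamp if use_min_timestamp else s.max_timestamp

    return [s for s in sstables if timestamp(s) >= cutoff]


# "merged" holds 40-day-old data that was compacted together with recent writes.
tables = [SSTable("merged", 60.0, 99.5), SSTable("fresh", 99.2, 99.9)]

by_max = filter_old_sstables(tables, max_sstable_age=1.0, now=100.0)
by_min = filter_old_sstables(tables, max_sstable_age=1.0, now=100.0,
                             use_min_timestamp=True)

print([s.name for s in by_max])  # ['merged', 'fresh']
print([s.name for s in by_min])  # ['fresh']
```

With the max-timestamp filter, "merged" stays a compaction candidate because its
newest write is recent, so it can keep absorbing new data. With the min-timestamp
filter it is left alone and can eventually expire as a whole.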



