[ 
https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218593#comment-14218593
 ] 

Björn Hegerfors commented on CASSANDRA-8340:
--------------------------------------------

OK, let's see. This is a big SSTable with a timestamp span of [t0, t1]. Since 
it came out of a major compaction, t1 is close to the current time. DTCS would 
never generate an SSTable that large with t1 that close to the current time. 
But as time passes, [t0, t1] eventually becomes a timestamp span that even DTCS 
could have generated. Only beyond that point in time would DTCS actually 
consider compacting it, because it's t0 that governs when it compacts next, not 
t1: t0 is so old and so far away from the min timestamp of any other SSTable. 
I'm certain of this. I don't have a formula for it yet (I'd like to work one 
out), but I think that the major-compacted SSTable may even have to double its 
age before the next compaction happens. So if the min timestamp was older than 
max_sstable_age_days when switching strategies, the max timestamp will be older 
than that too before any compaction is ever considered.
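
The arithmetic behind that last step can be sketched roughly as follows. This is 
only an illustration under the assumption that the "double its age" guess above 
holds; the function name and the example dates are hypothetical and not taken 
from Cassandra.

```python
from datetime import datetime, timedelta

def max_timestamp_age_when_reconsidered(t0, t1, switch_time):
    """Age of t1 at the moment the SSTable's age (measured from t0) has doubled.

    Assumption: DTCS first reconsiders the SSTable once the age of t0 is
    twice what it was when the strategy was switched.
    """
    age_at_switch = switch_time - t0
    reconsidered_at = switch_time + age_at_switch  # age of t0 has doubled here
    return reconsidered_at - t1

# Hypothetical numbers: oldest data is 400 days old, newest data is 1 hour old.
switch_time = datetime(2014, 11, 19)
t0 = switch_time - timedelta(days=400)
t1 = switch_time - timedelta(hours=1)
max_sstable_age = timedelta(days=365)

age_of_t1 = max_timestamp_age_when_reconsidered(t0, t1, switch_time)

# t0 was already older than max_sstable_age at the switch, so by the time the
# SSTable would be reconsidered, t1 is older than max_sstable_age as well.
assert age_of_t1 > max_sstable_age
```

Because t1 is close to the switch time, its age at the reconsideration point is 
roughly the age t0 had at the switch, which is why the max timestamp check ends 
up excluding the SSTable anyway.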

In other words, your scenario is not in any way a particular reason to change 
the max_sstable_age_days behavior. There may still be other reasons.

Did you get that? I had a hard time figuring out a sensible way to formulate my 
reasoning here. Rewrote this 3 times :P

> Use sstable min timestamp when deciding if an sstable should be included in 
> DTCS compactions
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8340
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Priority: Minor
>
> Currently we check how old the newest data (max timestamp) in an sstable is 
> when we decide whether it should be compacted.
> If we instead switch to using the min timestamp for this, we have a pretty 
> clean migration path from STCS/LCS to DTCS. 
> My thinking is that before migrating, the user does a major compaction, which 
> creates a huge sstable containing all data, with a min timestamp very far back 
> in time. After switching to DTCS, we will have a big sstable that we never 
> compact (i.e., the min timestamp of this big sstable is older than 
> max_sstable_age_days), all newer data will come after it, and that new 
> data will be properly compacted.
> WDYT [~Bj0rn] ?
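
The difference between the two age checks described in the issue can be 
sketched as below. This models only the age filter, not how DTCS picks 
compaction windows; the SSTable class, the is_candidate function and the use of 
days as the timestamp unit are simplifying assumptions for illustration, not 
Cassandra's actual code.

```python
from dataclasses import dataclass

@dataclass
class SSTable:
    # Timestamps in days since an arbitrary epoch (hypothetical unit).
    min_timestamp: float
    max_timestamp: float

def is_candidate(sstable, now, max_sstable_age_days, use_min_timestamp):
    """Return True if the sstable passes the age filter.

    use_min_timestamp=False mimics the current check (newest data),
    use_min_timestamp=True mimics the proposed check (oldest data).
    """
    reference = sstable.min_timestamp if use_min_timestamp else sstable.max_timestamp
    return now - reference <= max_sstable_age_days

# A freshly major-compacted sstable: data spans almost 1000 days up to "now".
big = SSTable(min_timestamp=0.0, max_timestamp=999.9)
now = 1000.0

passes_with_max = is_candidate(big, now, 365, use_min_timestamp=False)
passes_with_min = is_candidate(big, now, 365, use_min_timestamp=True)
```

With these made-up numbers the max timestamp check lets the big sstable through 
the filter, while the min timestamp check excludes it.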



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
