[ 
https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219354#comment-14219354
 ] 

Björn Hegerfors commented on CASSANDRA-8340:
--------------------------------------------

No drawback, really. It doesn't make a big difference. Whatever is easiest to 
reason about would be best. It's true that in your repair example, it would 
have some effect, but only when the repair SSTables are not older than 
max_sstable_age_days while the big one is. I would imagine that repair would be 
likely to bring in a bunch of files that are older than max_sstable_age_days, 
which will stay scattered anyway.

I suppose using min timestamp would align more with what the rest of the 
strategy uses to determine age. In fact, something that would work even more 
consistently with the strategy would be to specify a maximum window size, perhaps 
in terms of the initial window size. We have
* up to min_threshold windows of size 1, followed by
* up to min_threshold windows of size min_threshold, followed by
* up to min_threshold windows of size min_threshold^2, followed by
* up to min_threshold windows of size min_threshold^3, followed by
* etc.

And then we can simply stop generating more windows after some point. The 
simplest, yet perhaps least intuitive, option would be "max_window_exponent". 
If we set max_window_exponent=n, then we would stop after windows of size 
min_threshold^n. Example: max_window_exponent=3, min_threshold=4. The last few 
windows would be 64*base_time_seconds in size; no window of size 256 is ever 
created. Alternative options would be "max_window" or "max_window_seconds".
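
To make the proposal concrete, here is a rough Python sketch of the window sizes that a max_window_exponent cap would produce. It is only an illustration, not the DTCS code: the function name window_sizes is made up, sizes are in units of base_time_seconds, and it assumes every size gets the full min_threshold windows, whereas the strategy only guarantees "up to" that many.

```python
def window_sizes(min_threshold, max_window_exponent):
    """Return window sizes, newest first, in units of base_time_seconds.

    Hypothetical helper for illustration only; not part of Cassandra.
    """
    sizes = []
    for exponent in range(max_window_exponent + 1):
        # Simplification: always emit the full min_threshold windows of
        # each size; the real strategy creates "up to" that many.
        sizes.extend([min_threshold ** exponent] * min_threshold)
    # No windows are generated past min_threshold ** max_window_exponent.
    return sizes


sizes = window_sizes(min_threshold=4, max_window_exponent=3)
print(sizes)       # four windows each of size 1, 4, 16 and 64
print(sum(sizes))  # total span covered by windows, in base_time_seconds units
```

Under these assumptions, anything older than the total span falls outside every window and would simply never be compacted again.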

WDYT [~krummas]?

> Use sstable min timestamp when deciding if an sstable should be included in 
> DTCS compactions
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8340
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Priority: Minor
>
> Currently we check how old the newest data (max timestamp) in an sstable is 
> when we check if it should be compacted.
> If we instead switch to using min timestamp for this we have a pretty clean 
> migration path from STCS/LCS to DTCS. 
> My thinking is that before migrating, the user does a major compaction, which 
> creates a huge sstable containing all data, with min timestamp very far back 
> in time. After switching to DTCS, we will have a big sstable that we never 
> compact (i.e., min timestamp of this big sstable is before 
> max_sstable_age_days), and all newer data will come after it and will be 
> properly compacted.
> WDYT [~Bj0rn] ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
