[ https://issues.apache.org/jira/browse/CASSANDRA-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226463#comment-14226463 ]
Björn Hegerfors commented on CASSANDRA-8360:
--------------------------------------------

OK, sounds fair. That essentially means that we want to treat the "incoming window" specially. A question worth asking is what we want the incoming window for. Currently its purpose is "keep the last unit of base_time_seconds compacted at all times". While it respects min_threshold, a value written early in the window will essentially be recompacted once every (min_threshold - 1) subsequent SSTable flushes. I'm fully aware that this might be a bad idea, or rather, I wasn't sure if it was the right thing to do. Really, it's completely inspired by STCS's min_sstable_size, which seems to do the same thing, i.e. not respect the logarithmic-complexity, tree-like merging on small enough SSTables. (Reminds me a bit of insertion sort being fastest on small enough arrays.) So base_time_seconds has the same purpose. A problem is that it might be harder to set a good default on time than on size.

Setting min_sstable_size in STCS to 0 has a near-equivalent in DTCS: setting base_time_seconds to 1. The window sizes will then be powers of min_threshold (with up to min_threshold windows of each size), starting at 1 second. Even with this setting, data that is an hour old will be in windows roughly an hour in size. The only meaningful difference is that SSTables 2 seconds and 10 seconds old will not be in the same window. What I mean by this is that setting base_time_seconds to 1 is perfectly reasonable; it's just the same as setting min_sstable_size to 0 or 1 in STCS.

I just want to make it clear that base_time_seconds is not really something that you should set to 1 hour (3600) just because you want SSTables older than 1 hour to be in nice 1-hour chunks. If you set it to 900 with min_threshold=4, SSTables older than 1 hour will still be in perfect 1-hour chunks (because after up to 4 chunks of 900 seconds comes a chunk of 4*900=3600 seconds).
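
To make the 900-second example concrete, here is a minimal Python sketch of how the window sizes line up. It is an illustration under stated assumptions, not the DTCS source: the function name `dtcs_windows` is hypothetical, and it assumes that windows are aligned to multiples of their own size and that a full run of min_threshold windows rolls up into one window min_threshold times larger.

```python
def dtcs_windows(now, base_time_seconds, min_threshold, count):
    """Return the `count` newest (start, size) windows, newest first.

    Hypothetical helper: assumes windows are aligned to multiples of
    their size and grow by a factor of min_threshold at each level.
    """
    size = base_time_seconds
    position = now // size  # index of the window that contains `now`
    windows = []
    for _ in range(count):
        windows.append((position * size, size))
        if position % min_threshold > 0:
            # Still inside a group of same-size windows: step back by one.
            position -= 1
        else:
            # Reached a group boundary: the next older window is
            # min_threshold times larger.
            size *= min_threshold
            position = position // min_threshold - 1
    return windows


if __name__ == "__main__":
    for start, size in dtcs_windows(10000, 900, 4, 6):
        print(f"window [{start}, {start + size}) size={size}s")
```

With now=10000, base_time_seconds=900 and min_threshold=4, the sketch yields four 900-second windows (starting at 9900, 9000, 8100 and 7200) followed by 3600-second windows starting at 3600 and 0, so data older than about an hour already sits in 1-hour chunks.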

So I guess respecting min_threshold in the 'incoming window' is just as right as respecting min_threshold when compacting SSTables smaller than min_sstable_size in STCS. Which I believe it does. So there's my roundabout way of coming to the same conclusion as you, [~jbellis] :). I just have this feeling that the meaning of base_time_seconds isn't well understood.

> In DTCS, always compact SSTables in the same time window, even if they are
> fewer than min_threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8360
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8360
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Björn Hegerfors
>            Priority: Minor
>
> DTCS uses min_threshold to decide how many time windows of the same size that
> need to accumulate before merging into a larger window. The age of an SSTable
> is determined as its min timestamp, and it always falls into exactly one of
> the time windows. If multiple SSTables fall into the same window, DTCS
> considers compacting them, but if they are fewer than min_threshold, it
> decides not to do it.
> When do more than 1 but fewer than min_threshold SSTables end up in the same
> time window (except for the current window), you might ask? In the current
> state, DTCS can spill some extra SSTables into bigger windows when the
> previous window wasn't fully compacted, which happens all the time when the
> latest window stops being the current one. Also, repairs and hints can put
> new SSTables in old windows.
> I think, and [~jjordan] agreed in a comment on CASSANDRA-6602, that DTCS
> should ignore min_threshold and compact tables in the same windows regardless
> of how few they are. I guess max_threshold should still be respected.
> [~jjordan] suggested that this should apply to all windows but the current
> window, where all the new SSTables end up. That could make sense.
> I'm not clear on whether compacting many SSTables at once is more cost
> efficient or not, when it comes to the very newest and smallest SSTables.
> Maybe compacting as soon as 2 SSTables are seen is fine if the initial window
> size is small enough? I guess the opposite could be the case too; that the
> very newest SSTables should be compacted very many at a time?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)