[
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230438#comment-14230438
]
Jonathan Shook commented on CASSANDRA-8371:
-------------------------------------------
To be clear, I'm not providing the explanation below as an argument for a
particular parameter format. It's more of an explanation of my testing so far,
in response to [~krummas]'s question. It may be overly verbose, but it's
probably good to get some of this thinking into the discussion, for better or worse.
I'm still working on the results of testing, but there is a trade-off between
sstable counts and near-time compaction load. Since the IntervalTree can
efficiently track many files, the need to coalesce sstables based on file count
is somewhat arbitrary, even if it is bounded at high counts by other performance
factors. The only immediate concern that I'm aware of there is eliminating
sstable boundaries for multi-record reads, and that is a tuning concern driven
by data model and access patterns.
Another way of saying that: once you have satisfactorily addressed
sstables-per-read concerns, any further IO spent on coalescing sstables is
probably wasted (a bold statement, explained with more nuance below).
Furthermore, this wasted IO bandwidth could in most cases be better spent
serving read or write traffic. Given that single-record reads are likely to
fall within a single sstable for most time-series ingest patterns, and that
paged reads will be dealing with sstable boundaries anyway, it really doesn't
make sense to compact for compaction's sake. As long as you consider the size
of your read patterns (number of rows, amount of data, etc.) as a factor of
your effective sstable size, and hence of your compaction settings, the number
of sstables is almost irrelevant. I think this is somewhat obvious, but it is
worth asserting as a principle for what follows.
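As a rough illustration of that principle (the numbers below are made-up
assumptions, not measurements from my tests): once near-time compaction has
coalesced each base window down to roughly one sstable, the sstables touched by
a slice read are mostly a function of how much time the read spans versus the
window span.
{code}
# Back-of-the-envelope sketch only; all numbers are assumed for illustration.
import math

ingest_per_node_gb_per_day = 170      # e.g. ~1 TB/day spread across 6 nodes
base_window_minutes = 60              # i.e. base_time_seconds = 3600
read_span_minutes = 180               # a paged read covering 3 hours of data

# Data landing in one base window, assuming it gets coalesced to ~1 sstable
gb_per_window = ingest_per_node_gb_per_day * base_window_minutes / (24 * 60)

# Sstables a time-slice read has to touch once windows are coalesced
sstables_per_read = math.ceil(read_span_minutes / base_window_minutes)

print("~%.1f GB per base window, ~%d sstables per %d-minute read"
      % (gb_per_window, sstables_per_read, read_span_minutes))
{code}
If that sstables-per-read number is already acceptable for your access pattern,
any compaction beyond what produced it is the "wasted IO" described above.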
For most scenarios, where you are ingesting at a moderate or low rate, the risk
is more in having sstables be too small, pushing your sstables per read high
enough to negate some of the promise of DTCS. On the flip side, when you are
ingesting at a high rate, as in testing, the risk is more in doing excessive IO
as you accumulate new sstables in near-time base intervals. If you apply the
windowing logic at higher rates, you can see that the base IO load for
compaction projects into larger and older intervals as well, and under enough
ingest load it stacks up. This is simply to assert that there are performance
boundary concerns around both too little and too much ingestion.
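To make the "projects into larger and older intervals" point concrete, here is
a schematic sketch. It simplifies the actual window calculation down to "each
tier spans roughly min_threshold times the previous one", and the ingest rate
is an assumed number, not one from my tests.
{code}
# Schematic only: approximates DTCS-style tiers as windows that each span
# min_threshold times the previous tier. Real window boundaries differ, but
# the growth in data-per-window is the point being illustrated.

base_seconds = 3600          # base_time_seconds
min_threshold = 4
ingest_mb_per_second = 2.0   # assumed steady per-node ingest rate

span = base_seconds
for tier in range(5):
    gb_in_window = span * ingest_mb_per_second / 1024
    print("tier %d: spans %7ds, ~%7.1f GB rewritten when this window compacts"
          % (tier, span, gb_in_window))
    span *= min_threshold
{code}
Capping the max sstable age limits how far down that list data keeps getting
re-compacted, which is exactly the dial discussed below.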
When you are completing near-time compactions at a high enough rate to keep up
with ingest (not a problem in my tests so far), nearly all of the useful work
is done in near-time compactions as opposed to larger windows. I believe that
all of the meaningful work in my testing has been done in the first or second
base interval.
Regarding the trade-off between sstable counts and compaction IO, the max age
is one of the primary dials you can turn to affect this. Given high ingestion,
and the rationale above, it can make sense to set it lower than a day. I've
been able to get some stable and useful improvements at my testing rate of
around 1TB a day by using a 1 minute base interval and a max age of 15 minutes.
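For reference, a sketch of those values expressed in terms of the DTCS options
base_time_seconds and max_sstable_age_days. The fractional max age assumes the
option can take a non-integer value, which is exactly the parameter-format
question being discussed here.
{code}
# Values behind "1 minute base interval, 15 minute max age".
# The fractional max age assumes the option accepts a float; expressed as a
# whole number of days it would truncate to 0.

base_time_seconds = 60                   # 1 minute base windows
max_sstable_age_days = 15 / (24 * 60.0)  # 15 minutes expressed in days

print("base_time_seconds=%d, max_sstable_age_days=%.4f"
      % (base_time_seconds, max_sstable_age_days))  # ~0.0104
print(int(max_sstable_age_days))                    # 0 if forced to whole days
{code}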
This explanation is essentially an extension of the idea that optimal DTCS
settings must take (higher) ingest rates into account. Even if the higher
ingestion rates are "test rig" rates, we need to accommodate them. We can't
expect users to run a [realistic 1-2 years of ingest] test before they have
results that are valid to size and tune systems with. Yes, we can achieve these
results with a float. Having settings that behave in a sane way at higher
ingest rates is a good start toward test results that users can make sense of
and get good results from in testing. On the flip side, this does not address
the problems created if the settings aren't configured for the actual ingest
rate, high or low. The difficulty of a test that is both fast (one that can
complete in a reasonable amount of time) and production-like (years of ingest)
is actually one of the key problems for us to solve, in my mind. The discussion
around the parameter value is all about that exact problem.
Granted, this does not take tombstones into consideration. I'm specifically
testing an IoT/data-logging type scenario, and would presumably use a default
TTL for that. I'm not discarding that concern; it's just not a focus of my
testing yet. One thing at a time.
Comments, Ideas, Suggestions?
> DateTieredCompactionStrategy is always compacting
> --------------------------------------------------
>
> Key: CASSANDRA-8371
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: mck
> Assignee: Björn Hegerfors
> Labels: compaction, performance
> Attachments: java_gc_counts_rate-month.png,
> read-latency-recommenders-adview.png, read-latency.png,
> sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png
>
>
> Running 2.0.11 and having switched a table to
> [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that
> disk IO and gc count increase, along with the number of reads happening in
> the "compaction" hump of cfhistograms.
> Data, and generally performance, looks good, but compactions are always
> happening, and pending compactions are building up.
> The schema for this is
> {code}CREATE TABLE search (
>     loginid text,
>     searchid timeuuid,
>     description text,
>     searchkey text,
>     searchurl text,
>     PRIMARY KEY ((loginid), searchid)
> );{code}
> We're sitting on about 82G (per replica) across 6 nodes in 4 DCs.
> CQL executed against this keyspace, and traffic patterns, can be seen in
> slides 7+8 of https://prezi.com/b9-aj6p2esft/
> Attached are sstables-per-read and read-latency graphs from cfhistograms, and
> screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44),
> to DTCS (week ~46).
> These screenshots are also found in the prezi on slides 9-11.
> [~pmcfadin], [~Bj0rn],
> Can this be a consequence of occasional deleted rows, as is described under
> (3) in the description of CASSANDRA-6602 ?