[
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230438#comment-14230438
]
Jonathan Shook commented on CASSANDRA-8371:
-------------------------------------------
To be clear, I'm not providing the explanation below as an argument for a
particular parameter format. It's more of an explanation of my testing so far,
in response to [~krummas]'s question. It may be overly verbose, but it's
probably good to get some of this thinking into the discussion, for better or worse.
I'm still working on the results of testing, but there is a trade-off between
sstable counts and near-time compaction load. Since the IntervalTree can
efficiently track many files, the need to coalesce sstables based on file count
is somewhat arbitrary, even if it is bounded at high counts by other performance
factors. The only immediate concern that I'm aware of there is eliminating
sstable boundaries for multi-record reads, and that is a tuning concern driven
by data model and access patterns.
Another way of saying that: once you have satisfactorily addressed
sstables-per-read concerns, any further IO spent on coalescing sstables is
probably wasted (a bold statement, explained with more nuance below).
Furthermore, this wasted IO bandwidth could in most cases be better spent
serving read or write traffic. Given that single-record reads are likely to
fall within a single sstable for most time-series ingest patterns, and that
paged reads will be dealing with sstable boundaries anyway, it really doesn't
make sense to compact for compaction's sake. As long as you consider the size
of your read patterns (number of rows, amount of data, etc.) as a factor of
your effective sstable size, and hence of your compaction settings, the number
of sstables is almost irrelevant. I think this is somewhat obvious, but it is
worth asserting as a principle for what follows.
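As a rough illustration of that principle (the numbers below are made-up
assumptions, not measurements from my tests): once near-time compaction has
coalesced each base window down to roughly one sstable, the sstables touched by
a slice read are mostly a function of how much time the read spans versus the
window span.
{code}
# Back-of-the-envelope sketch only; all numbers are assumed for illustration.
import math

ingest_per_node_gb_per_day = 170      # e.g. ~1 TB/day spread across 6 nodes
base_window_minutes = 60              # i.e. base_time_seconds = 3600
read_span_minutes = 180               # a paged read covering 3 hours of data

# Data landing in one base window, assuming it gets coalesced to ~1 sstable
gb_per_window = ingest_per_node_gb_per_day * base_window_minutes / (24 * 60)

# Sstables a time-slice read has to touch once windows are coalesced
sstables_per_read = math.ceil(read_span_minutes / base_window_minutes)

print("~%.1f GB per base window, ~%d sstables per %d-minute read"
      % (gb_per_window, sstables_per_read, read_span_minutes))
{code}
If that sstables-per-read number is already acceptable for your access pattern,
any compaction beyond what produced it is the "wasted IO" described above.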
For most scenarios, where you are ingesting at a moderate or low rate, the risk
is more in having sstables be too small, pushing your sstables per read high
enough to negate some of the promise of DTCS. On the flip side, when you are
ingesting at a high rate, as in testing, the risk is more in doing excessive IO
as you accumulate new sstables in near-time base intervals. If you apply the
windowing logic at higher rates, you can see that the base IO load for
compaction projects into larger and older intervals as well, and under enough
ingest load it stacks up. This is simply to assert that there are performance
boundary concerns around both too little and too much ingestion.
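To make the "projects into larger and older intervals" point concrete, here is
a schematic sketch. It simplifies the actual window calculation down to "each
tier spans roughly min_threshold times the previous one", and the ingest rate
is an assumed number, not one from my tests.
{code}
# Schematic only: approximates DTCS-style tiers as windows that each span
# min_threshold times the previous tier. Real window boundaries differ, but
# the growth in data-per-window is the point being illustrated.

base_seconds = 3600          # base_time_seconds
min_threshold = 4
ingest_mb_per_second = 2.0   # assumed steady per-node ingest rate

span = base_seconds
for tier in range(5):
    gb_in_window = span * ingest_mb_per_second / 1024
    print("tier %d: spans %7ds, ~%7.1f GB rewritten when this window compacts"
          % (tier, span, gb_in_window))
    span *= min_threshold
{code}
Capping the max sstable age limits how far down that list data keeps getting
re-compacted, which is exactly the dial discussed below.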
When you are completing near-time compactions at a high enough rate to keep up
with ingest (not a problem in my tests so far), nearly all of the useful work
is done in near-time compactions as opposed to larger windows. I believe that
all of the meaningful work in my testing has been done in the first or second
base interval.
Regarding the trade-off between sstable counts and compaction IO, the max age
is one of the primary dials you can turn to affect this. Given high ingestion,
and the rationale above, it can make sense to set it lower than a day. I've
been able to get some stable and useful improvements at my testing rate of
around 1TB a day by using a 1 minute base interval and a max age of 15 minutes.
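For reference, a sketch of those values expressed in terms of the DTCS options
base_time_seconds and max_sstable_age_days. The fractional max age assumes the
option can take a non-integer value, which is exactly the parameter-format
question being discussed here.
{code}
# Values behind "1 minute base interval, 15 minute max age".
# The fractional max age assumes the option accepts a float; expressed as a
# whole number of days it would truncate to 0.

base_time_seconds = 60                   # 1 minute base windows
max_sstable_age_days = 15 / (24 * 60.0)  # 15 minutes expressed in days

print("base_time_seconds=%d, max_sstable_age_days=%.4f"
      % (base_time_seconds, max_sstable_age_days))  # ~0.0104
print(int(max_sstable_age_days))                    # 0 if forced to whole days
{code}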
This explanation is essentially an extension of the idea that optimal DTCS
settings must take (higher) ingest rates into account. Even if the higher
ingestion rates are "test rig" rates, we need to accommodate them. We can't
expect users to run a [realistic 1-2 years of ingest] test before they have
results that are valid to size and tune systems with. Yes, we can achieve these
results with a float. Having settings that behave in a sane way at higher
ingest rates is a good start toward test results that users can make sense of
and get good results from in testing. On the flip side, this does not address
the problems created if the settings aren't configured for the actual ingest
rate, high or low. The difficulty of a test that is both fast (one that can
complete in a reasonable amount of time) and production-like (years of ingest)
is actually one of the key problems for us to solve, in my mind. The discussion
around the parameter value is all about that exact problem.
Granted, this does not take tombstones into consideration. I'm specifically
testing an IoT/data-logging type scenario, and would presumably use a default
TTL for that. I'm not discarding that concern; it's just not a focus of my
testing yet. One thing at a time.
Comments, Ideas, Suggestions?
> DateTieredCompactionStrategy is always compacting
> --------------------------------------------------
>
> Key: CASSANDRA-8371
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: mck
> Assignee: Björn Hegerfors
> Labels: compaction, performance
> Attachments: java_gc_counts_rate-month.png,
> read-latency-recommenders-adview.png, read-latency.png,
> sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png
>
>
> Running 2.0.11 and having switched a table to
> [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that
> disk IO and gc count increase, along with the number of reads happening in
> the "compaction" hump of cfhistograms.
> Data, and generally performance, looks good, but compactions are always
> happening, and pending compactions are building up.
> The schema for this is
> {code}CREATE TABLE search (
>     loginid text,
>     searchid timeuuid,
>     description text,
>     searchkey text,
>     searchurl text,
>     PRIMARY KEY ((loginid), searchid)
> );{code}
> We're sitting on about 82G (per replica) across 6 nodes in 4 DCs.
> CQL executed against this keyspace, and traffic patterns, can be seen in
> slides 7+8 of https://prezi.com/b9-aj6p2esft/
> Attached are sstables-per-read and read-latency graphs from cfhistograms, and
> screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44),
> to DTCS (week ~46).
> These screenshots are also found in the prezi on slides 9-11.
> [~pmcfadin], [~Bj0rn],
> Can this be a consequence of occasional deleted rows, as is described under
> (3) in the description of CASSANDRA-6602 ?