Jeff Jirsa created CASSANDRA-9666:
-------------------------------------
Summary: Provide an alternative to DTCS
Key: CASSANDRA-9666
URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
Project: Cassandra
Issue Type: Improvement
Reporter: Jeff Jirsa
Fix For: 2.1.x, 2.2.x
DTCS is great for time series data, but it comes with caveats that make it
difficult to use in production (typical operator behaviors such as bootstrap,
removenode, and repair have MAJOR caveats as they relate to
max_sstable_age_days, and hints/read repair break the selection algorithm).
I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the
tiered nature of DTCS in order to address some of DTCS' operational
shortcomings. I believe it is necessary to propose an alternative rather than
simply adjusting DTCS, because the proposal fundamentally removes the tiered
nature in order to remove the max_sstable_age_days parameter. The result is
very different, even though it is heavily inspired by DTCS.
Specifically, rather than creating a number of windows of ever-increasing
sizes, this strategy allows an operator to choose the window size, compact with
STCS within the first window of that size, and aggressively compact down to a
single sstable once that window is no longer current. The window size is a
combination of unit (minutes, hours, days) and size (1, etc), such that an
operator can expect all data within a block of that size to be compacted
together (that is, if your unit is hours, and size is 6, you will create
roughly 4 sstables per day, each one containing roughly 6 hours of data).
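To make the unit/size arithmetic concrete, here is a minimal Python sketch of
how a timestamp could be mapped to its window. It is an illustration only, not
code from the patch: the names UNIT_SECONDS and window_bounds are hypothetical,
and aligning windows to the Unix epoch is an assumption.

```python
from datetime import datetime, timezone

# Seconds per unit; the unit names mirror the "minutes, hours, days" above.
# (Hypothetical table, not taken from the patch.)
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}

def window_bounds(timestamp_s, unit, size):
    """Return (start, end) in epoch seconds of the window containing timestamp_s.

    Assumes windows are aligned to the Unix epoch.
    """
    width = UNIT_SECONDS[unit] * size
    start = (timestamp_s // width) * width
    return start, start + width - 1

# With unit=HOURS and size=6, the 24 hours of one UTC day fall into 4 windows.
day_start = int(datetime(2015, 6, 1, tzinfo=timezone.utc).timestamp())
windows = {window_bounds(day_start + h * 3600, "HOURS", 6)[0] for h in range(24)}
print(len(windows))  # 4
```

Each of those four windows would end up as roughly one sstable once it is no
longer current, which matches the "roughly 4 sstables per day" estimate.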
The result addresses a number of the problems with DateTieredCompactionStrategy:
- At the present time, DTCS’s first window is compacted using unusual
selection criteria, which prefer files with earlier timestamps but ignore
sizes. In TimeWindowCompactionStrategy, the first window data will be compacted
with the well-tested, fast, reliable STCS. All STCS options can be passed to
TimeWindowCompactionStrategy to configure the first window’s compaction
behavior.
- HintedHandoff may put old data in new sstables, but it will have little
impact other than slightly reduced efficiency (sstables will cover a wider
range, but the old timestamps will not impact sstable selection criteria during
compaction).
- ReadRepair may put old data in new sstables, but it will have little impact
other than slightly reduced efficiency (sstables will cover a wider range, but
the old timestamps will not impact sstable selection criteria during compaction).
- Small, old sstables resulting from streams of any kind will be swiftly and
aggressively compacted with the other sstables that have a similar
maxTimestamp, without causing sstables in neighboring windows to grow in size.
- The configuration options are explicit and straightforward - the tuning
parameters leave little room for error. The window is set in common, easily
understandable terms such as “12 hours”, “1 day”, “30 days”. The
minute/hour/day options are granular enough for users keeping data for hours,
and users keeping data for years.
- There is no explicitly configurable max sstable age, though sstables will
naturally stop compacting once new data is no longer written in that window.
- It remains true that if old data and new data are written into the memtable
at the same time, the resulting sstables will be treated as if they were new
sstables. However, that no longer negatively impacts the compaction strategy’s
selection criteria for older windows.
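The selection behavior described in the points above can be sketched roughly
as follows. This is a simplified Python illustration under stated assumptions,
not the patch itself: pick_candidates and stcs_bucket are hypothetical names,
the size-bucketing is only a crude stand-in for real STCS, and checking the
newest window first is an assumption about ordering.

```python
from collections import defaultdict

def stcs_bucket(tables, min_threshold=4, low=0.5, high=1.5):
    """Crude stand-in for STCS: return one group of similarly sized sstables."""
    groups = []  # each group is a list of (name, size) pairs
    for name, size in sorted(tables, key=lambda t: t[1]):
        for group in groups:
            avg = sum(s for _, s in group) / len(group)
            if low * avg <= size <= high * avg:
                group.append((name, size))
                break
        else:
            groups.append([(name, size)])
    for group in groups:
        if len(group) >= min_threshold:
            return [n for n, _ in group]
    return []

def pick_candidates(sstables, now_s, width_s, min_threshold=4):
    """sstables: list of (name, max_timestamp_s, size_bytes) tuples.

    Groups sstables into windows by max timestamp only, so old cells carried
    in by hints or read repair never pull a new sstable into an old window.
    """
    buckets = defaultdict(list)
    for name, max_ts, size in sstables:
        buckets[(max_ts // width_s) * width_s].append((name, size))
    current = (now_s // width_s) * width_s
    for window in sorted(buckets, reverse=True):  # newest first (assumption)
        tables = buckets[window]
        if window >= current:
            picked = stcs_bucket(tables, min_threshold)  # current window: STCS
        else:
            # Older window: compact everything in it down to one sstable.
            picked = [n for n, _ in tables] if len(tables) > 1 else []
        if picked:
            return picked
    return []

six_hours = 6 * 3600
sstables = [("a", 44000, 100), ("b", 45000, 110), ("c", 46000, 5000),
            ("d", 1000, 100), ("e", 2000, 9000)]
print(pick_candidates(sstables, now_s=50000, width_s=six_hours))  # ['d', 'e']
```

In this example the current window holds too few similarly sized sstables for
the STCS stand-in to act, so the two sstables left in the older window are
selected together regardless of their sizes.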
Patch provided for both 2.1 (
https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 ) and 2.2 (
https://github.com/jeffjirsa/cassandra/commits/twcs )