Jeff Jirsa created CASSANDRA-9597:
-------------------------------------
Summary: DTCS should consider file SIZE in addition to time
windowing
Key: CASSANDRA-9597
URL: https://issues.apache.org/jira/browse/CASSANDRA-9597
Project: Cassandra
Issue Type: Improvement
Reporter: Jeff Jirsa
Priority: Minor
DTCS seems to work well for the typical use case - writing data in perfect time
order, compacting recent files, and ignoring older files.
However, there are "normal" operational actions where DTCS will fall behind and
is unlikely to recover.
One example is streaming operations (for example, bootstrap or loading data
into a cluster using sstableloader), where tens of thousands of very small
sstables can be created spanning multiple time buckets. In these cases, even
if max_sstable_age_days is extended to allow the older incoming files to be
compacted, the selection logic is likely to re-compact large files with a few
small files over and over, rather than prioritizing selection of the
max_threshold smallest files to decrease the number of candidate sstables as
quickly as possible.
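
A rough sketch of the selection being proposed, for illustration only. This is
not the DTCS code: the SSTable tuple, the pick_smallest function, and its
defaults are hypothetical names chosen for the example. Only min_threshold and
max_threshold correspond to existing compaction options. It assumes the
candidates have already been grouped into a single time window.

```python
from typing import List, NamedTuple


class SSTable(NamedTuple):
    """Hypothetical stand-in for an sstable: a name and its on-disk size."""
    name: str
    size_bytes: int


def pick_smallest(candidates: List[SSTable],
                  min_threshold: int = 4,
                  max_threshold: int = 32) -> List[SSTable]:
    """Pick up to max_threshold of the smallest sstables in one time window.

    Returns an empty list when there are fewer than min_threshold candidates,
    so no compaction is started for that window.
    """
    if len(candidates) < min_threshold:
        return []
    # Sorting by size puts the tiny streamed files first, so the large,
    # already-compacted file is only rewritten once the small ones are gone.
    by_size = sorted(candidates, key=lambda s: s.size_bytes)
    return by_size[:max_threshold]


MB = 1024 * 1024
window = [SSTable("big", 10 * 1024 * MB)] + [
    SSTable(f"s{i}", i * MB) for i in range(1, 6)
]
chosen = pick_smallest(window, min_threshold=4, max_threshold=4)
print([s.name for s in chosen])
```

With one 10 GB file and five files of 1 to 5 MB in the same window, this picks
the four smallest files and leaves the large one out of the compaction, which
reduces the candidate count without rewriting the large file.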