Jeff Jirsa created CASSANDRA-9597:
-------------------------------------

             Summary: DTCS should consider file SIZE in addition to time windowing
                 Key: CASSANDRA-9597
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9597
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jeff Jirsa
            Priority: Minor


DTCS seems to work well for the typical use case - writing data in perfect time 
order, compacting recent files, and ignoring older files.

However, there are "normal" operational actions where DTCS will fall behind and 
is unlikely to recover.

An example of this is streaming operations (for example, bootstrap or loading 
data into a cluster using sstableloader), where lots (tens of thousands) of 
very small sstables can be created spanning multiple time buckets. In these
cases, even if max_sstable_age_days is extended to allow the older incoming
files to be compacted, the selection logic is likely to re-compact the same
large files together with only a few small files over and over, rather than
prioritizing the max_threshold smallest files to reduce the number of
candidate sstables as quickly as possible.
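
To illustrate the proposed prioritization, here is a minimal sketch of
size-aware candidate selection. It is not Cassandra's DTCS code. The SSTable
class, the pick_smallest_candidates function, and the use of timestamps in
seconds are hypothetical stand-ins. The age cutoff only loosely mirrors the
role of max_sstable_age_days. The sketch drops files whose newest data is
older than the cutoff, then takes the max_threshold smallest remaining files,
so each compaction removes as many files as possible.

```python
from dataclasses import dataclass
from typing import List

SECONDS_PER_DAY = 86400


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable; holds only what this sketch needs."""
    name: str
    size_bytes: int
    max_timestamp_s: int  # newest write in the file, in seconds (assumption)


def pick_smallest_candidates(sstables: List[SSTable], now_s: int,
                             max_sstable_age_days: int,
                             max_threshold: int) -> List[SSTable]:
    """Hypothetical size-aware selection: up to max_threshold smallest eligible files."""
    # Skip files whose newest data is older than the age cutoff.
    cutoff_s = now_s - max_sstable_age_days * SECONDS_PER_DAY
    eligible = [s for s in sstables if s.max_timestamp_s >= cutoff_s]
    # Prefer the smallest files so one compaction shrinks the sstable count the most.
    eligible.sort(key=lambda s: s.size_bytes)
    return eligible[:max_threshold]


if __name__ == "__main__":
    tables = [
        SSTable("big", 10_000_000, 1000),  # large, previously compacted file
        SSTable("a", 10, 1000),            # tiny streamed files
        SSTable("b", 20, 1000),
        SSTable("c", 30, 1000),
        SSTable("old", 5, 0),              # older than the cutoff
    ]
    picked = pick_smallest_candidates(tables, now_s=1000,
                                      max_sstable_age_days=0, max_threshold=3)
    print([s.name for s in picked])
```

In this example the large file is left alone and the three small streamed
files are compacted together. A purely time-windowed selection might instead
repeatedly rewrite "big" alongside a few small files.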

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
