Antti Nissinen created CASSANDRA-9572:
-----------------------------------------
Summary: DateTieredCompactionStrategy fails to combine SSTables
correctly when TTL is used.
Key: CASSANDRA-9572
URL: https://issues.apache.org/jira/browse/CASSANDRA-9572
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Antti Nissinen
Fix For: 2.1.5
Attachments: cassandra_sstable_metadata_reader.py,
cassandra_sstable_timespan_graph.py, compaction_stage_test01_jira.log,
compaction_stage_test02_jira.log, datagen.py, explanation_jira.txt,
motivation_jira.txt
DateTieredCompaction works correctly when data is dumped for a certain time
period in short SSTables in time manner and then compacted together. However,
if TTL is applied to the data columns the DTCS fails to compact files correctly
in timely manner. In our opinion the problem is caused by two issues:
A) During the DateTieredCompaction process the getFullyExpiredSStables is
called twice. First from the DateTieredCompactionStrategy class and second time
from the CompactionTask class. On the first time the target is to find out
fully expired SStables that are not overlapping with any non-fully expired
SSTables. That works correctly. When the getFullyExpiredSSTables is called
second time from CompactionTask class the selection of fully expired SSTables
is modified compared to the first selection.
B) The minimum timestamp of the new SSTables created by combining together
fully expired SSTable and files from the most interesting bucket is not correct.
These two issues together cause problems for the DTCS process when it combines
together SSTables having overlap in time and TTL for the column. This is
demonstrated by generating test data first without compactions and showing the
timely distribution of files. When the compaction is enabled the DCTS combines
files together, but the end result is not something to be expected. This is
demonstrated in the file motivation_jira.txt
Attachments contain following material:
- Motivation_jira.txt: Practical examples how the DTCS behaves with TTL
- Explanation_jira.txt: gives more details, explains test cases and
demonstrates the problems in the compaction process
- Logfile file for the compactions in the first test case
(compaction_stage_test01_jira.log)
- Logfile file for the compactions in the seconnd test case
(compaction_stage_test02_jira.log)
- source code zip file for version 2.1.5 with additional comment statements
(src_2.1.5_with_debug.zip)
- Python script to generate test data (datagen.py)
- Python script to read metadata from SStables
(cassandra_sstable_metadata_reader.py)
- Python script to generate timeline representation of SSTables
(cassandra_sstable_timespan_graph.py)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)