[
https://issues.apache.org/jira/browse/CASSANDRA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksey Yeschenko updated CASSANDRA-9572:
-----------------------------------------
Fix Version/s: 2.1.x (was: 2.1.5)
> DateTieredCompactionStrategy fails to combine SSTables correctly when TTL is
> used.
> ----------------------------------------------------------------------------------
>
> Key: CASSANDRA-9572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9572
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Antti Nissinen
> Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: cassandra_sstable_metadata_reader.py,
> cassandra_sstable_timespan_graph.py, compaction_stage_test01_jira.log,
> compaction_stage_test02_jira.log, datagen.py, explanation_jira.txt,
> motivation_jira.txt, src_2.1.5_with_debug.zip
>
>
> DateTieredCompactionStrategy (DTCS) works correctly when data for a certain
> time period is written into short SSTables in time order and then compacted
> together. However, if a TTL is applied to the data columns, DTCS fails to
> compact the files correctly and in a timely manner. In our opinion the
> problem is caused by two issues:
> A) During the DateTieredCompaction process, getFullyExpiredSSTables is
> called twice: first from the DateTieredCompactionStrategy class and then
> from the CompactionTask class. The first call aims to find fully expired
> SSTables that do not overlap with any non-fully expired SSTables. That works
> correctly. When getFullyExpiredSSTables is called the second time, from the
> CompactionTask class, the selection of fully expired SSTables differs from
> the first selection.
> B) The minimum timestamp of the new SSTables created by combining a fully
> expired SSTable with files from the most interesting bucket is not
> correct.
> Together, these two issues cause problems when DTCS combines SSTables that
> overlap in time and have a TTL on the column.
> This is demonstrated by first generating test data with compaction disabled
> and showing the distribution of files over time. When compaction is enabled,
> DTCS combines files, but the end result is not what would be
> expected. This is demonstrated in the file motivation_jira.txt.
> The attachments contain the following material:
> - motivation_jira.txt: practical examples of how DTCS behaves with TTL
> - explanation_jira.txt: gives more details, explains the test cases and
> demonstrates the problems in the compaction process
> - Log file for the compactions in the first test case
> (compaction_stage_test01_jira.log)
> - Log file for the compactions in the second test case
> (compaction_stage_test02_jira.log)
> - Source code zip file for version 2.1.5 with additional comment statements
> (src_2.1.5_with_debug.zip)
> - Python script to generate test data (datagen.py)
> - Python script to read metadata from SSTables
> (cassandra_sstable_metadata_reader.py)
> - Python script to generate a timeline representation of SSTables
> (cassandra_sstable_timespan_graph.py)
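
Issue A in the description above can be illustrated with a small toy model. The sketch below is not the Cassandra source and does not reproduce the reported bug; the SSTable class, the fully_expired function and all numeric values are hypothetical and chosen only for illustration. It assumes that a "fully expired" check keeps an expired SSTable whenever its timestamps overlap a table that still holds live data, and shows that such a check can return different selections when it is called with different sets of input tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    # Hypothetical stand-in for SSTable metadata; not the Cassandra class.
    name: str
    min_timestamp: int
    max_timestamp: int
    max_local_deletion_time: int  # time at which the last cell expires


def fully_expired(compacting, overlapping, gc_before):
    """Toy model: names of tables in `compacting` that could be dropped whole.

    Assumption: a table is a candidate when all of its data expired before
    `gc_before`, and it is kept anyway if its newest timestamp is not older
    than the oldest timestamp of any table that still holds live data.
    """
    min_live_ts = float("inf")
    for table in overlapping:
        if table.max_local_deletion_time >= gc_before:
            min_live_ts = min(min_live_ts, table.min_timestamp)

    candidates = []
    for table in compacting:
        if table.max_local_deletion_time < gc_before:
            candidates.append(table)
        else:
            min_live_ts = min(min_live_ts, table.min_timestamp)

    return {t.name for t in candidates if t.max_timestamp < min_live_ts}


# Made-up example values.
old = SSTable("old", min_timestamp=0, max_timestamp=10, max_local_deletion_time=50)
live = SSTable("live", min_timestamp=5, max_timestamp=20, max_local_deletion_time=200)

first_selection = fully_expired([old], [], gc_before=100)
second_selection = fully_expired([old, live], [], gc_before=100)
print(first_selection, second_selection)
```

In this model the first call selects "old", while the second call, which also sees the overlapping live table, selects nothing. Whether this is the mechanism behind the behaviour reported for 2.1.5 is for explanation_jira.txt and the attached logs to show.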