[
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117032#comment-13117032
]
Alan Liang commented on CASSANDRA-2735:
---------------------------------------
We've tested this patch internally and noticed that it actually resulted
in a lot more compactions than the SizeTieredCompactionStrategy. The increase
in IO was not acceptable for our use, so we stopped working on this
patch.
Internally, we ended up implementing expiration of sstables within
SizeTieredCompactionStrategy. We've called it
SizeTieredExpirableCompactionStrategy. Given a set of all sstables, the
compaction procedure becomes:
1. Expire sstables based on the max timestamp of each sstable. Remove expired
sstables from the set.
2. Remove from the set any sstables whose size is >= a max sstable size.
3. Run the SizeTieredCompactionStrategy on the remaining sstables.
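
The three steps above can be sketched roughly as follows. This is only an
illustration in Python, not the actual patch: the SSTable record, the ttl and
now parameters, the expiration rule (max_timestamp + ttl <= now), and the
simplified size-tiered bucketing are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SSTable:
    # Hypothetical stand-in for an sstable's metadata.
    name: str
    size_bytes: int
    max_timestamp: int  # newest write timestamp contained in the sstable


def size_tiered_buckets(sstables: List[SSTable], min_threshold: int = 4,
                        bucket_low: float = 0.5,
                        bucket_high: float = 1.5) -> List[List[SSTable]]:
    """Simplified size-tiered bucketing: group sstables of similar size.

    The parameter values are illustrative, not taken from the patch.
    """
    buckets: List[Tuple[float, List[SSTable]]] = []  # (average size, members)
    for table in sorted(sstables, key=lambda s: s.size_bytes):
        for i, (avg, members) in enumerate(buckets):
            if avg * bucket_low <= table.size_bytes <= avg * bucket_high:
                members.append(table)
                new_avg = sum(m.size_bytes for m in members) / len(members)
                buckets[i] = (new_avg, members)
                break
        else:
            # No existing bucket is close enough in size; start a new one.
            buckets.append((float(table.size_bytes), [table]))
    # Only buckets with enough members are worth compacting.
    return [members for _, members in buckets if len(members) >= min_threshold]


def plan_compaction(sstables: List[SSTable], now: int, ttl: int,
                    max_size_bytes: int, min_threshold: int = 4):
    # Step 1: expire sstables by max timestamp and drop them from the set.
    # (Assumed rule: expired once max_timestamp + ttl <= now.)
    expired = [t for t in sstables if t.max_timestamp + ttl <= now]
    live = [t for t in sstables if t.max_timestamp + ttl > now]
    # Step 2: drop sstables that have reached the max size.
    candidates = [t for t in live if t.size_bytes < max_size_bytes]
    # Step 3: run size-tiered bucketing on whatever remains.
    return expired, size_tiered_buckets(candidates, min_threshold)
```

With this sketch, a large sstable is never picked for compaction again; it
just sits there until its own max timestamp makes it eligible for expiration.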
The downside of this strategy is that during compaction, newer sstables can
be mixed with older sstables, and the resulting compacted sstable gets marked
with the max timestamp of the newest input sstable. This means you won't be
able to expire the older rows within that sstable until the entire sstable
expires. This problem of compacting really old sstables with newer sstables is
mitigated by the restriction that an sstable is taken out of consideration for
compaction once it reaches a certain max sstable size. This works because older
sstables tend to be larger files.
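
To make the downside concrete, here is a small worked example in Python. The
numbers, the ttl parameter, and the expiration rule (max_timestamp + ttl <=
now) are assumptions chosen for illustration; they are not taken from the
patch.

```python
def merged_max_timestamp(input_max_timestamps):
    # A compacted sstable carries the newest timestamp of all its inputs.
    return max(input_max_timestamps)


def is_expired(max_timestamp, now, ttl):
    # Assumed expiration rule for this example.
    return max_timestamp + ttl <= now


ttl = 50
now = 100
old_ts, new_ts = 0, 90  # max timestamps of an old and a new sstable

# On its own, the old sstable could already be dropped.
old_alone_expired = is_expired(old_ts, now, ttl)

# Once compacted with the newer sstable, the old rows inherit the newer
# max timestamp and cannot be expired yet.
merged_ts = merged_max_timestamp([old_ts, new_ts])
merged_expired_now = is_expired(merged_ts, now, ttl)

# The merged sstable only becomes expirable once the newest data ages out.
merged_expired_later = is_expired(merged_ts, new_ts + ttl, ttl)
```

Here old_alone_expired is True, merged_expired_now is False, and the merged
sstable only becomes expirable at time new_ts + ttl.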
We have found that this works for our specific use case of storing
timeseries data. I can post the patch for
SizeTieredExpirableCompactionStrategy if there is interest, though I'll have
to rebase it first.
> Timestamp Based Compaction Strategy
> -----------------------------------
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alan Liang
> Assignee: Alan Liang
> Priority: Minor
> Labels: compaction
> Attachments: 0001-timestamp-bucketed-compaction-strategy-V2.patch,
> 0001-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the
> sstables while satisfying max sstable size, min and max compaction
> thresholds. It also handles expiration of sstables based on a timestamp.