[
https://issues.apache.org/jira/browse/CASSANDRA-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161824#comment-14161824
]
Marcus Eriksson commented on CASSANDRA-8004:
--------------------------------------------
Pushed a branch for this here:
https://github.com/krummas/cassandra/commit/476b27dc503c3541ee31dacdd70191fee8a819a5
* Introduces a "WrappingCompactionStrategy" that contains the logic for
handling repaired/unrepaired sstables.
** Could be a bit confusing and should probably be refactored for 3.0 - it
would be nicer with a "CompactionStrategyManager" or similar that does not
extend AbstractCompactionStrategy, but we currently call
cfs.getCompactionStrategy() in many places so having the WCS makes it
transparent to any users.
* As mentioned in the description, this makes it possible, on the first run, to
move sstables from the unrepaired leveling straight over to the
repaired leveling. After the first run, we still try to move sstables over; if
that fails, they are sent to L0.
* Keeps 2 instances of the same compaction strategy; changing the compaction
strategy is now handled by WrappingCompactionStrategy.
* The compaction strategies now track which sstables they can run compaction on
(LCS always did this, now STCS does it as well). So the compaction strategy
will only ever see either repaired or unrepaired sstables.
* As mentioned in CASSANDRA-5351 (and the original reason we did STCS on the
unrepaired data) the write amplification gets a lot higher when having 2
parallel levelings, so maybe we should have an option to configure the
different compaction strategies separately - you could configure STCS for the
unrepaired and LCS for the repaired if the write amplification gets too high
for the use case.
* An added benefit of running LCS for the unrepaired data is that it makes each
sstable contain a smaller range - making it more likely that the sstable is
fully contained within the repaired range and the anticompaction step can
simply update the repairedAt timestamp and not have to rewrite the entire
sstable to split out the repaired ranges.
* Also handles the case where someone runs incremental repair once and then
forgets about it. In the current implementation all the data would then be
size tiered; with this there will be a small/old repaired leveling and a big
unrepaired leveling.
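A rough sketch of the wrapping idea from the bullets above, in Java. This is not the code in the branch: every name here (SSTable, Leveling, Wrapper, markRepaired) is made up for illustration, and it assumes that a move "fails" when the sstable overlaps something already present in the target level of the repaired leveling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RepairedUnrepairedSketch {

    /** Hypothetical stand-in for an sstable: a token range, a level and a repairedAt time. */
    static final class SSTable {
        final long first, last;
        int level;
        long repairedAt; // 0 means unrepaired (assumption of this sketch)

        SSTable(long first, long last, int level, long repairedAt) {
            this.first = first;
            this.last = last;
            this.level = level;
            this.repairedAt = repairedAt;
        }

        boolean isRepaired() { return repairedAt != 0; }

        boolean overlaps(SSTable other) { return first <= other.last && other.first <= last; }
    }

    /** Toy leveling: sstables within a level above L0 must not overlap each other. */
    static final class Leveling {
        final Map<Integer, List<SSTable>> levels = new HashMap<>();

        /** Keeps the sstable's current level if possible, otherwise sends it to L0. */
        void add(SSTable s) {
            if (s.level > 0 && overlapsExisting(s))
                s.level = 0;
            levels.computeIfAbsent(s.level, k -> new ArrayList<>()).add(s);
        }

        void remove(SSTable s) {
            List<SSTable> inLevel = levels.get(s.level);
            if (inLevel != null)
                inLevel.remove(s);
        }

        int size(int level) {
            List<SSTable> inLevel = levels.get(level);
            return inLevel == null ? 0 : inLevel.size();
        }

        private boolean overlapsExisting(SSTable s) {
            List<SSTable> inLevel = levels.get(s.level);
            if (inLevel == null)
                return false;
            for (SSTable other : inLevel)
                if (s.overlaps(other))
                    return true;
            return false;
        }
    }

    /** Owns one leveling per repair status, so each leveling only sees one kind of sstable. */
    static final class Wrapper {
        final Leveling repaired = new Leveling();
        final Leveling unrepaired = new Leveling();

        void add(SSTable s) {
            (s.isRepaired() ? repaired : unrepaired).add(s);
        }

        /** Moves an sstable from the unrepaired leveling to the repaired one. */
        void markRepaired(SSTable s, long repairedAt) {
            unrepaired.remove(s);
            s.repairedAt = repairedAt;
            repaired.add(s); // may drop the sstable to L0 on overlap
        }
    }

    public static void main(String[] args) {
        Wrapper w = new Wrapper();
        SSTable a = new SSTable(0, 10, 2, 0);
        SSTable b = new SSTable(20, 30, 2, 0);
        w.add(a);
        w.add(b);
        w.markRepaired(a, 1000L);   // first run: a keeps its level in the repaired leveling
        SSTable e = new SSTable(5, 12, 2, 0);
        w.add(e);
        w.markRepaired(e, 2000L);   // later run: e overlaps a in repaired L2
        System.out.println("a.level=" + a.level + " e.level=" + e.level);
    }
}
```

The point of the shape is that Leveling never looks at repairedAt; only the wrapper decides which instance an sstable belongs to.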
Thoughts, comments?
> Run LCS for both repaired and unrepaired data
> ---------------------------------------------
>
> Key: CASSANDRA-8004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8004
> Project: Cassandra
> Issue Type: Bug
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Labels: compaction
> Fix For: 2.1.1
>
>
> If a user has leveled compaction configured, we should run that for both the
> unrepaired and the repaired data. I think this would make things a lot easier
> for end users.
> It would simplify migration to incremental repairs as well: if a user runs
> incremental repair on their nicely leveled unrepaired data, we won't need to
> drop it all to L0; instead we can just start moving sstables from the
> unrepaired leveling straight into the repaired leveling.
> Idea could be to have two instances of LeveledCompactionStrategy and move
> sstables between the instances after an incremental repair run (and let LCS
> be totally oblivious to whether it handles repaired or unrepaired data). Same
> should probably apply to any compaction strategy, run two instances and
> remove all repaired/unrepaired logic from the strategy itself.