[ 
https://issues.apache.org/jira/browse/CASSANDRA-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161824#comment-14161824
 ] 

Marcus Eriksson commented on CASSANDRA-8004:
--------------------------------------------

pushed a branch here for this: 
https://github.com/krummas/cassandra/commit/476b27dc503c3541ee31dacdd70191fee8a819a5

* Introduces a "WrappingCompactionStrategy" that contains the logic for 
handling repaired/unrepaired sstables.
** Could be a bit confusing and should probably be refactored for 3.0 - it 
would be nicer with a "CompactionStrategyManager" or similar that does not 
extend AbstractCompactionStrategy, but we currently call 
cfs.getCompactionStrategy() in many places so having the WCS makes it 
transparent to any users.
* As mentioned in the description, this makes it possible, for the first run, to 
move sstables from the leveling in unrepaired straight over to the 
repaired-leveling. After the first run, we try to move sstables over; if that 
fails, they are sent to L0.
* Keeps 2 instances of the same compaction strategy; changing the compaction 
strategy is now handled by WrappingCompactionStrategy.
* The compaction strategies now track which sstables they can run compaction on 
(LCS always did this, now STCS does it as well). So the compaction strategy 
will only ever see either repaired or unrepaired sstables.
* As mentioned in CASSANDRA-5351 (and the original reason we did STCS on the 
unrepaired data) the write amplification gets a lot higher when having 2 
parallel levelings, so maybe we should have an option to configure the 
different compaction strategies separately - you could configure STCS for the 
unrepaired and LCS for the repaired if the write amplification gets too high 
for the use case.
* An added benefit of running LCS for the unrepaired data is that it makes each 
sstable contain a smaller range - making it more likely that the sstable is 
fully contained within the repaired range and the anticompaction step can 
simply update the repairedAt timestamp and not have to rewrite the entire 
sstable to split out the repaired ranges.
* Also handles the case where someone runs incremental repair once and then 
forgets about it. In the current implementation all the data would then be 
size tiered; with this there will be a small/old repaired leveling and a big 
unrepaired leveling.
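
To make the routing described in the bullets above concrete, here is a minimal Java sketch. It is not the code from the branch: SSTableSketch, StrategySketch, WrappingSketch and onRepaired are hypothetical names invented for illustration, token ranges are simplified to inclusive, non-wrapping long bounds, and repairedAt == 0 is assumed to mean "unrepaired". It only shows the idea of one wrapper owning two instances of the same strategy, and of moving an sstable across without a rewrite when it is fully contained in the repaired range.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an sstable: token bounds plus a repairedAt timestamp.
// Assumption for this sketch: repairedAt == 0 means "unrepaired".
final class SSTableSketch {
    final String name;
    final long firstToken;
    final long lastToken;
    long repairedAt = 0L;

    SSTableSketch(String name, long firstToken, long lastToken) {
        this.name = name;
        this.firstToken = firstToken;
        this.lastToken = lastToken;
    }

    boolean isRepaired() { return repairedAt != 0L; }
}

// Toy stand-in for one strategy instance; it only tracks the sstables it owns,
// so it never needs to know whether they are repaired or unrepaired.
final class StrategySketch {
    final List<SSTableSketch> sstables = new ArrayList<>();
}

// Hypothetical wrapper: owns two instances of the same strategy and routes
// each sstable to exactly one of them based on its repaired status.
final class WrappingSketch {
    final StrategySketch repaired = new StrategySketch();
    final StrategySketch unrepaired = new StrategySketch();

    void add(SSTableSketch s) {
        (s.isRepaired() ? repaired : unrepaired).sstables.add(s);
    }

    // Called after a repair of [rangeStart, rangeEnd] (inclusive, non-wrapping).
    // Returns true if the sstable was moved by only updating repairedAt;
    // returns false if it is not fully contained (or not tracked as unrepaired),
    // in which case a real implementation would have to rewrite it to split
    // out the repaired ranges.
    boolean onRepaired(SSTableSketch s, long rangeStart, long rangeEnd, long repairedAt) {
        boolean fullyContained = rangeStart <= s.firstToken && s.lastToken <= rangeEnd;
        if (!fullyContained || !unrepaired.sstables.remove(s)) {
            return false;
        }
        s.repairedAt = repairedAt;
        repaired.sstables.add(s);
        return true;
    }
}
```

In this sketch an sstable covering tokens 10..20 is moved straight over after a repair of 0..30, while one covering 15..40 stays in the unrepaired instance. That is why narrower sstables (as LCS tends to produce) make the cheap path more likely.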

Thoughts, comments?

> Run LCS for both repaired and unrepaired data
> ---------------------------------------------
>
>                 Key: CASSANDRA-8004
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8004
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>              Labels: compaction
>             Fix For: 2.1.1
>
>
> If a user has leveled compaction configured, we should run that for both the 
> unrepaired and the repaired data. I think this would make things a lot easier 
> for end users.
> It would simplify migration to incremental repairs as well: if a user runs 
> incremental repair on their nicely leveled unrepaired data, we won't need to 
> drop it all to L0; instead we can just start moving sstables from the 
> unrepaired leveling straight into the repaired leveling.
> Idea could be to have two instances of LeveledCompactionStrategy and move 
> sstables between the instances after an incremental repair run (and let LCS 
> be totally oblivious to whether it handles repaired or unrepaired data). Same 
> should probably apply to any compaction strategy, run two instances and 
> remove all repaired/unrepaired logic from the strategy itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
