[ https://issues.apache.org/jira/browse/CASSANDRA-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970189#action_12970189 ]
Jonathan Ellis commented on CASSANDRA-1083:
-------------------------------------------
I think this approach wastes a lot more effort than the current system, because
once it has been running for a while you see this:
{code}
1 1 1 1 121 125 125 125 125 125 125 125
Compacting (ages): 64 3 2 1 0
125 125 125 125 125 125 125 125
{code}
In other words, each time we do a compaction, the common case is for it to
compact the most recent small sstables together with a large one, meaning 95%
of the work done is just re-copying the large one.
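The re-copying cost can be estimated with a small simulation. This is only a rough sketch, not the attached compaction_simulation.rb: it assumes every flush produces an sstable of size 1, applies the rule from the proposal quoted below, and uses a hypothetical function name (`simulate`) and hypothetical parameter names. How much of the work goes into re-copying the largest input depends heavily on the thresholds chosen.

```python
def simulate(flushes, min_threshold, max_threshold, target_count):
    """Simulate the proposed rule and return (total_copied, largest_copied).

    total_copied   -- sum of the sizes of all sstables read by compactions
    largest_copied -- the part of that total contributed by the largest
                      input of each compaction (data that is re-copied)

    Assumption: every flush writes a new sstable of size 1.
    """
    sstables = []
    total_copied = 0
    largest_copied = 0
    for _ in range(flushes):
        sstables.append(1)  # a freshly flushed sstable
        # Trigger condition taken from the proposal's pseudocode.
        if len(sstables) + min_threshold - 1 > target_count:
            sstables.sort()
            chosen = sstables[:max_threshold]  # the smallest sstables
            rest = sstables[max_threshold:]
            total_copied += sum(chosen)
            largest_copied += max(chosen)
            # The compaction output replaces its inputs.
            sstables = rest + [sum(chosen)]
    return total_copied, largest_copied


if __name__ == "__main__":
    total, largest = simulate(flushes=1000, min_threshold=4,
                              max_threshold=4, target_count=4)
    print(f"re-copied share: {largest / total:.1%}")
```

With a tight `target_count`, nearly every compaction merges one new small sstable into the single large one, which is the pattern described above; looser settings merge the small sstables among themselves more often.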
> Improvement to CompactionManager's submitMinorIfNeeded
> ------------------------------------------------------
>
> Key: CASSANDRA-1083
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1083
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ryan King
> Assignee: Tyler Hobbs
> Priority: Minor
> Fix For: 0.7.1
>
> Attachments: 1083-configurable-compaction-thresholds.patch,
> compaction_simulation.rb, compaction_simulation.rb
>
>
> We've discovered that we are unable to tune compaction the way we want for
> our production cluster. I think the current algorithm doesn't do this as well
> as it could, since it doesn't sort the sstables by size before doing the
> bucketing, which means the tuning parameters have unpredictable results.
> I looked at CASSANDRA-792, but it seems like overkill. Here's an alternative
> proposal:
> config options:
> minimumCompactionThreshold
> maximumCompactionThreshold
> targetSSTableCount
> The first two would mean what they currently mean: the bounds on how many
> sstables to compact in one compaction operation. The 3rd is a target for how
> many SSTables you'd like to have.
> Pseudo code algorithm for determining whether or not to do a minor compaction:
> {noformat}
> if sstables.length + minimumCompactionThreshold - 1 > targetSSTableCount
>     sort sstables from smallest to largest
>     compact up to maximumCompactionThreshold of the smallest sstables
> {noformat}
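The selection rule in the pseudocode above could be written roughly as follows. This is a minimal sketch, not code from the attached patch: sstables are represented only by their sizes, and the function and parameter names are hypothetical stand-ins for the three proposed config options.

```python
def select_minor_compaction(sizes, min_threshold, max_threshold, target_count):
    """Return the sstable sizes to compact, or [] if no compaction is needed.

    sizes          -- sizes of the current sstables (any consistent unit)
    min_threshold  -- stands in for minimumCompactionThreshold
    max_threshold  -- stands in for maximumCompactionThreshold
    target_count   -- stands in for targetSSTableCount
    """
    # Same trigger condition as the pseudocode: compact only when the
    # sstable count is too high relative to the target.
    if len(sizes) + min_threshold - 1 <= target_count:
        return []
    # Sort smallest to largest and take at most max_threshold of them.
    return sorted(sizes)[:max_threshold]


if __name__ == "__main__":
    print(select_minor_compaction([125, 1, 1, 64, 3], 4, 3, 4))
```

Note that the rule picks candidates purely by size rank, so a large sstable is included whenever fewer than `max_threshold` smaller ones exist.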