[ https://issues.apache.org/jira/browse/CASSANDRA-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-2841:
----------------------------------------
Attachment: 2841.patch
Patch is against 0.7.
> Always use even distribution for merkle tree with RandomPartitionner
> --------------------------------------------------------------------
>
> Key: CASSANDRA-2841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2841
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.7.0
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Trivial
> Labels: repair
> Fix For: 0.7.7, 0.8.2
>
> Attachments: 2841.patch
>
>
> When creating the initial merkle tree, repair tries to be (too) smart and uses
> the key samples to "guide" the tree splitting. While this is a good idea for
> OPP (OrderPreservingPartitioner), where there is a good chance the data
> distribution is uneven, you can't beat an even distribution for the
> RandomPartitioner. A quick experiment even shows that the current method is
> significantly less effective than an even distribution for the ranges of the
> merkle tree; that is, an even distribution gives a much better distribution
> of the number of keys per range of the tree.
> Thus, let's switch to an even distribution for RandomPartitioner. That
> three-line change alone accounts for a significant improvement in repair's
> precision.
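Below is a minimal Python sketch of the even distribution argued for above. It is not Cassandra's code: the patch itself (Java, against 0.7) is not reproduced here. The sketch assumes that the RandomPartitioner token space is [0, 2**127] and that a key's token is the absolute value of its MD5 digest read as a signed integer. `even_split`, `keys_per_range` and `expected_keys_streamed` are hypothetical helpers. The last one uses a simplified model of repair precision: a single divergent key forces every key in the merkle tree leaf containing it to be re-streamed.

```python
import hashlib
from bisect import bisect_right

# Assumption: tokens are modeled as abs(signed MD5 digest), i.e. in [0, 2**127].
MAX_TOKEN = 2 ** 127


def token(key: bytes) -> int:
    """Approximate RandomPartitioner token for a key (sketch, not Cassandra's API)."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def even_split(lo: int, hi: int, depth: int) -> list[tuple[int, int]]:
    """Split [lo, hi] into 2**depth contiguous ranges by recursive midpoint bisection."""
    if depth == 0:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return even_split(lo, mid, depth - 1) + even_split(mid, hi, depth - 1)


def keys_per_range(keys: list[bytes], ranges: list[tuple[int, int]]) -> list[int]:
    """Count how many keys' tokens fall into each range (ranges sorted, contiguous)."""
    starts = [lo for lo, _ in ranges]
    counts = [0] * len(ranges)
    for key in keys:
        # bisect_right finds the last range whose start is <= the token;
        # the final range also absorbs the (vanishingly rare) token MAX_TOKEN.
        counts[bisect_right(starts, token(key)) - 1] += 1
    return counts


def expected_keys_streamed(counts: list[int]) -> float:
    """Expected number of keys re-streamed for one uniformly chosen divergent key.

    Simplified model (assumption): the divergent key's whole leaf is re-streamed,
    so the expectation is sum(c_i**2) / N. By Cauchy-Schwarz this is at least
    N / len(counts), with equality only when every leaf holds the same count.
    """
    n = sum(counts)
    return sum(c * c for c in counts) / n


if __name__ == "__main__":
    keys = [f"key{i}".encode() for i in range(20_000)]
    ranges = even_split(0, MAX_TOKEN, depth=8)
    counts = keys_per_range(keys, ranges)
    print("min/max keys per leaf:", min(counts), max(counts))
    print("expected keys streamed:", expected_keys_streamed(counts))
    print("lower bound N / leaves:", len(keys) / len(ranges))
```

Hashed tokens are close to uniform, so equal-width leaves each hold roughly N / 2**depth keys. That brings `expected_keys_streamed` near its lower bound. Any split that overloads some ranges raises the expectation, which is one way to read the precision gain described in the issue.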
--
This message is automatically generated by JIRA.