[
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392912#comment-14392912
]
Benedict commented on CASSANDRA-7032:
-------------------------------------
I've pushed a version of this
[here|https://github.com/belliottsmith/cassandra/tree/7032-nits] with a lot of
comments and some small refactors that I needed to make in order to understand
the code.
On the whole it seems like a good approach, and carefully optimised.
I think some further changes are necessary for readability, as the functions
are currently very dense, with a focus on performance even in areas where it
won't make a tremendous difference. Algorithmically it's good, but IMO we
should separate the temporary state maintenance and cleanup from the main
steps, and avoid inflating every nested property access into its own variable,
since that pollutes the method body and makes it hard to track which actions
are important. Inside {{evaluateImprovement}} the ownership adjustment
calculations should all preferably be split out into separate methods, and if
possible the {{unitsChain}} there and the {{prevPopulate}} chain in
{{populateTokenInfo}} should use a class like the {{ReplicationVisitor}} I
introduced, to extract some of the dense state maintenance and make the logic
more declarative.
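To illustrate the shape I have in mind - a hypothetical sketch only, not the
actual {{ReplicationVisitor}} from my branch; the class name, fields and
{{visit}} contract here are all invented for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a visitor that walks the chain of units covering a token's
// replication range, keeping the group bookkeeping in one place instead
// of inline in the caller's loop.
final class ReplicationVisitorSketch {
    private final Set<String> seenGroups = new HashSet<>();
    private final int rf; // replication factor

    ReplicationVisitorSketch(int rf) { this.rf = rf; }

    /** Visit one unit's group; returns true while more distinct groups are needed. */
    boolean visit(String group) {
        seenGroups.add(group);
        return seenGroups.size() < rf;
    }

    int distinctGroups() { return seenGroups.size(); }
}
```

The caller then reduces to "walk the prev chain while {{visit}} returns
true", which keeps the traversal declarative and the state hidden.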
I'm still a little confused by the {{expandable}} property, both because I
don't understand its name and because its definition doesn't seem to match how
it's used. It's quite likely I'm misunderstanding, though, so could you
consider my code comment and explain - preferably with clearer comments around
its stated aim?
I think there is also a potential small modification to be made, which may or
may not be helpful. It bothers me that we can in theory create very suboptimal
chains for establishing the owners of a token, by bunching units of the same
group within a replication range. Since such bunching should _generally_ lead
to worse distribution outcomes, the algorithm will generally fight against it,
but it is still possible to hit problematic local minima. It would be nice to
pre-exclude any candidates that would increase the clumping of groups -
perhaps even to exclude any token that would shadow the ownership range of a
partner (a unit in the same group) when we have more than RF groups, and
otherwise to permit a traversal of at most one partner when defining the
replication range.
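Roughly what I mean by the pre-exclusion, as a sketch - the method name,
parameters, and exact walk semantics are assumptions on my part, not code from
the patch:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CandidateFilter {
    /**
     * Walk the groups of the units preceding the candidate position in ring
     * order (nearest first), stopping once RF distinct groups (including the
     * candidate's own) have been seen. If the candidate's group reappears
     * before the walk completes, the candidate would shadow a partner's
     * range and is excluded - but only when the cluster has more than RF
     * groups, so the exclusion cannot starve the search.
     */
    static boolean shadowsPartner(String candidateGroup,
                                  List<String> predecessorGroups,
                                  int rf, int totalGroups) {
        if (totalGroups <= rf)
            return false; // too few groups to be choosy
        Set<String> seen = new HashSet<>();
        seen.add(candidateGroup);
        for (String g : predecessorGroups) {
            if (g.equals(candidateGroup))
                return true; // a partner falls inside the replication walk
            seen.add(g);
            if (seen.size() >= rf)
                return false; // walk complete without meeting a partner
        }
        return false;
    }
}
```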
> Improve vnode allocation
> ------------------------
>
> Key: CASSANDRA-7032
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Branimir Lambov
> Labels: performance, vnodes
> Fix For: 3.0
>
> Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java,
> TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java,
> TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes
> hotspots of ownership. It should be possible to improve dramatically on this
> with deterministic allocation. I have quickly thrown together a simple greedy
> algorithm that allocates vnodes efficiently, and will repair hotspots in a
> randomly allocated cluster gradually as more nodes are added, and also
> ensures that token ranges are fairly evenly spread between nodes (somewhat
> tunably so). The allocation still permits slight discrepancies in ownership,
> but it is bound by the inverse of the size of the cluster (as opposed to
> random allocation, which strangely gets worse as the cluster size increases).
> I'm sure there is a decent dynamic programming solution to this that would be
> even better.
> If on joining the ring a new node were to CAS a shared table where a
> canonical allocation of token ranges lives after running this (or a similar)
> algorithm, we could then get guaranteed bounds on the ownership distribution
> in a cluster. This will also help for CASSANDRA-6696.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)