[
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392912#comment-14392912
]
Benedict commented on CASSANDRA-7032:
-------------------------------------
I've pushed a version of this
[here|https://github.com/belliottsmith/cassandra/tree/7032-nits] with a lot of
comments and some small refactors that I needed to make in order to understand
the code.
On the whole it seems like a good approach, and carefully optimised.
I think some further changes are necessary for readability, as the functions
are currently very dense, with a focus on performance even in areas where it
won't make a tremendous difference. Algorithmically it's good, but IMO we
should separate the temporary state maintenance and cleanup from the main
steps, and avoid inflating every nested property access into its own variable,
since that pollutes the method body and makes it hard to track which actions
are important. Inside {{evaluateImprovement}} the ownership adjustment
calculations should all preferably be split out into separate methods, and if
possible the {{unitsChain}} there and the {{prevPopulate}} chain in
{{populateTokenInfo}} should use a class like the {{ReplicationVisitor}} I
introduced, to extract some of the dense state maintenance and make the logic
more declarative.
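To illustrate the shape I have in mind - a hypothetical sketch only, not the
actual {{ReplicationVisitor}} from my branch; the class name, fields and
{{visit}} contract here are all invented for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a visitor that walks the chain of units covering a token's
// replication range, keeping the group bookkeeping in one place instead
// of inline in the caller's loop.
final class ReplicationVisitorSketch {
    private final Set<String> seenGroups = new HashSet<>();
    private final int rf; // replication factor

    ReplicationVisitorSketch(int rf) { this.rf = rf; }

    /** Visit one unit's group; returns true while more distinct groups are needed. */
    boolean visit(String group) {
        seenGroups.add(group);
        return seenGroups.size() < rf;
    }

    int distinctGroups() { return seenGroups.size(); }
}
```

The caller then reduces to "walk the prev chain while {{visit}} returns
true", which keeps the traversal declarative and the state hidden.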
I'm still a little confused by the {{expandable}} property, both because I
don't understand its name and because its definition doesn't seem to match how
it's used. It's quite likely I'm misunderstanding, though, so could you
consider my code comment and explain - preferably with clearer comments around
its stated aim?
I think there is also a potential small modification to be made, which may or
may not be helpful. It bothers me that we can in theory create very suboptimal
chains for establishing the owners of a token, by bunching units of the same
group within a replication range. Since such bunching should _generally_ lead
to worse distribution outcomes, the algorithm will generally fight against it,
but it is still possible to hit problematic local minima. It would be nice to
pre-exclude any candidates that would increase the clumping of groups -
perhaps even to exclude any token that would shadow the ownership range of a
partner (a unit in the same group) when we have more than RF groups, and
otherwise to permit a traversal of at most one partner when defining the
replication range.
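Roughly what I mean by the pre-exclusion, as a sketch - the method name,
parameters, and exact walk semantics are assumptions on my part, not code from
the patch:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CandidateFilter {
    /**
     * Walk the groups of the units preceding the candidate position in ring
     * order (nearest first), stopping once RF distinct groups (including the
     * candidate's own) have been seen. If the candidate's group reappears
     * before the walk completes, the candidate would shadow a partner's
     * range and is excluded - but only when the cluster has more than RF
     * groups, so the exclusion cannot starve the search.
     */
    static boolean shadowsPartner(String candidateGroup,
                                  List<String> predecessorGroups,
                                  int rf, int totalGroups) {
        if (totalGroups <= rf)
            return false; // too few groups to be choosy
        Set<String> seen = new HashSet<>();
        seen.add(candidateGroup);
        for (String g : predecessorGroups) {
            if (g.equals(candidateGroup))
                return true; // a partner falls inside the replication walk
            seen.add(g);
            if (seen.size() >= rf)
                return false; // walk complete without meeting a partner
        }
        return false;
    }
}
```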
> Improve vnode allocation
> ------------------------
>
> Key: CASSANDRA-7032
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Branimir Lambov
> Labels: performance, vnodes
> Fix For: 3.0
>
> Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java,
> TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java,
> TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes
> hotspots of ownership. It should be possible to improve dramatically on this
> with deterministic allocation. I have quickly thrown together a simple greedy
> algorithm that allocates vnodes efficiently, and will repair hotspots in a
> randomly allocated cluster gradually as more nodes are added, and also
> ensures that token ranges are fairly evenly spread between nodes (somewhat
> tunably so). The allocation still permits slight discrepancies in ownership,
> but it is bound by the inverse of the size of the cluster (as opposed to
> random allocation, which strangely gets worse as the cluster size increases).
> I'm sure there is a decent dynamic programming solution to this that would be
> even better.
> If on joining the ring a new node were to CAS a shared table where a
> canonical allocation of token ranges lives after running this (or a similar)
> algorithm, we could then get guaranteed bounds on the ownership distribution
> in a cluster. This will also help for CASSANDRA-6696.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)