[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231496#comment-14231496 ]

Branimir Lambov commented on CASSANDRA-7032:
--------------------------------------------

Per-disk balance is just another level in the hierarchy. Ideally we would like 
per-disk, per-node, per-rack and per-datacentre balance (configurable by number 
of vnodes), wouldn't we? Presumably with highest emphasis on the lower levels.
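
For illustration only (this is hypothetical and not from the attached
TestVNodeAllocation.java): the kind of per-level imbalance measure this implies,
assuming we already know each disk/node/rack/DC's ownership fraction, could be as
simple as:

// Hypothetical names throughout; not part of Cassandra or the attached patch.
import java.util.*;

public class LevelImbalance
{
    enum Level { DISK, NODE, RACK, DATACENTRE }

    // Max ownership divided by mean ownership at one level; 1.0 means perfect balance.
    static double imbalance(Map<String, Double> ownershipByUnit)
    {
        double max = 0, sum = 0;
        for (double owned : ownershipByUnit.values())
        {
            max = Math.max(max, owned);
            sum += owned;
        }
        return max / (sum / ownershipByUnit.size());
    }

    public static void main(String[] args)
    {
        // e.g. three nodes owning 30%, 35% and 35% of the ring
        Map<String, Double> nodes = Map.of("n1", 0.30, "n2", 0.35, "n3", 0.35);
        System.out.println(Level.NODE + " imbalance: " + imbalance(nodes)); // ~1.05
    }
}

"Highest emphasis on the lower levels" would then just mean weighting the DISK and
NODE numbers more heavily than RACK and DATACENTRE when scoring a candidate
assignment.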

Ignoring replica selection, this all comes for free if we can ensure equal 
vnode size (e.g. by reassigning all tokens on adding a node). With reassignment 
it should also be trivial to build the network topology into the token 
assignment.
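
As a rough sketch only, and assuming a full reassignment is acceptable: with n nodes
of v vnodes each, spacing the n*v tokens evenly around the 64-bit Murmur3 token space
gives equal vnode sizes; the hypothetical code below just interleaves node indices so
consecutive ranges land on different nodes.

// Hypothetical sketch; it only assumes tokens are 64-bit Murmur3 longs.
import java.math.BigInteger;
import java.util.*;

public class EvenReassignment
{
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(64); // size of the 64-bit token space

    // node index -> its tokens after a full reassignment; all n*v ranges have equal size.
    static Map<Integer, List<Long>> reassign(int nodes, int vnodesPerNode)
    {
        int total = nodes * vnodesPerNode;
        Map<Integer, List<Long>> result = new HashMap<>();
        for (int i = 0; i < total; i++)
        {
            long token = MIN.add(RANGE.multiply(BigInteger.valueOf(i))
                                      .divide(BigInteger.valueOf(total)))
                            .longValueExact();
            // interleave node indices so consecutive ranges sit on different nodes
            result.computeIfAbsent(i % nodes, k -> new ArrayList<>()).add(token);
        }
        return result;
    }

    public static void main(String[] args)
    {
        // 4 nodes, 8 vnodes each: 32 tokens, every range exactly 1/32 of the ring
        reassign(4, 8).forEach((node, tokens) -> System.out.println("node " + node + ": " + tokens));
    }
}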

As I see it there are two separate objectives:
- to build clusters incrementally by introducing and maintaining _some_ 
imbalance in the ring, which can be exploited to avoid reassignment.
- to improve the balance in existing, probably highly unbalanced clusters, 
built without the above algorithm in mind.

The former might be a solution to the latter, but it need not be. In any case I 
intend to look at it in isolation first and then think about how it would apply to 
existing clusters.

> Improve vnode allocation
> ------------------------
>
>                 Key: CASSANDRA-7032
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>              Labels: performance, vnodes
>             Fix For: 3.0
>
>         Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes 
> hotspots of ownership. It should be possible to improve dramatically on this 
> with deterministic allocation. I have quickly thrown together a simple greedy 
> algorithm that allocates vnodes efficiently, and will repair hotspots in a 
> randomly allocated cluster gradually as more nodes are added, and also 
> ensures that token ranges are fairly evenly spread between nodes (somewhat 
> tunably so). The allocation still permits slight discrepancies in ownership, 
> but it is bound by the inverse of the size of the cluster (as opposed to 
> random allocation, which strangely gets worse as the cluster size increases). 
> I'm sure there is a decent dynamic programming solution to this that would be 
> even better.
> If on joining the ring a new node were to CAS a shared table where a 
> canonical allocation of token ranges lives after running this (or a similar) 
> algorithm, we could then get guaranteed bounds on the ownership distribution 
> in a cluster. This will also help for CASSANDRA-6696.
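
For reference, a rough, hypothetical illustration of the greedy idea described in the
report above (this is not the attached TestVNodeAllocation.java, just one naive
reading of it): each new token is placed at the midpoint of the currently largest
range, so hotspots shrink as nodes join.

// Hypothetical sketch of the greedy idea only; not the algorithm in TestVNodeAllocation.java.
import java.util.*;

public class GreedyMidpoint
{
    static final long RING = 1L << 32; // toy ring size for readability; real tokens are 64-bit

    // Place each of the joining node's vnodes at the midpoint of the currently largest range.
    static void addNode(TreeMap<Long, Integer> ring, int node, int vnodes)
    {
        for (int v = 0; v < vnodes; v++)
        {
            if (ring.isEmpty())
            {
                ring.put(0L, node);
                continue;
            }
            long bestStart = 0, bestSize = -1;
            long prev = ring.lastKey() - RING; // wrap the last range around to the first token
            for (long token : ring.keySet())
            {
                if (token - prev > bestSize) { bestSize = token - prev; bestStart = prev; }
                prev = token;
            }
            ring.put(Math.floorMod(bestStart + bestSize / 2, RING), node);
        }
    }

    // Ownership per node: the owner of token T owns the range (previous token, T].
    static Map<Integer, Long> ownership(TreeMap<Long, Integer> ring)
    {
        Map<Integer, Long> owned = new HashMap<>();
        long prev = ring.lastKey() - RING;
        for (Map.Entry<Long, Integer> e : ring.entrySet())
        {
            owned.merge(e.getValue(), e.getKey() - prev, Long::sum);
            prev = e.getKey();
        }
        return owned;
    }

    public static void main(String[] args)
    {
        TreeMap<Long, Integer> ring = new TreeMap<>(); // token -> owning node
        for (int node = 0; node < 6; node++)
            addNode(ring, node, 8);
        ownership(ring).forEach((node, owned) ->
            System.out.printf("node %d owns %.3f of the ring%n", node, owned / (double) RING));
    }
}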



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
