[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380174#comment-14380174
 ] 

Branimir Lambov commented on CASSANDRA-7032:
--------------------------------------------

Patch is up for review 
[here|https://github.com/apache/cassandra/compare/trunk...blambov:7032-vnode-assignment].
 It adds the option to specify an "allocate_tokens_keyspace" when bringing up a 
node. The node's tokens are then allocated to optimize the load distribution 
for the replication strategy of that keyspace.

The allocation is currently restricted to Murmur3Partitioner and SimpleStrategy 
or NetworkTopologyStrategy (is there anything else we need to support?). With 
NetworkTopologyStrategy it cannot handle a dc in which the number of racks is 
more than one but less than the replication factor, which should not be a 
common case.
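
To make the restriction concrete, here is a minimal sketch of the check it implies. The class and method names are hypothetical and are not taken from the patch; only the partitioner and strategy names and the rack rule come from the description above.

```java
public class AllocationSupportCheck {
    /**
     * Hypothetical helper (not part of the patch): returns true when the
     * configuration falls inside the restrictions described above.
     */
    static boolean isSupported(String partitioner, String strategy,
                               int racksInDc, int replicationFactor) {
        // Only Murmur3Partitioner is handled.
        if (!"Murmur3Partitioner".equals(partitioner)) {
            return false;
        }
        if ("SimpleStrategy".equals(strategy)) {
            return true;
        }
        if ("NetworkTopologyStrategy".equals(strategy)) {
            // Unsupported case: more than one rack, but fewer racks than replicas.
            boolean awkwardRackCount = racksInDc > 1 && racksInDc < replicationFactor;
            return !awkwardRackCount;
        }
        // Any other replication strategy is outside the current scope.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSupported("Murmur3Partitioner", "SimpleStrategy", 1, 3));
        System.out.println(isSupported("Murmur3Partitioner", "NetworkTopologyStrategy", 2, 3));
        System.out.println(isSupported("Murmur3Partitioner", "NetworkTopologyStrategy", 3, 3));
    }
}
```

In this sketch a dc with 2 racks and RF 3 is rejected, while 1 rack or 3 racks with RF 3 is accepted.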

There are a couple of things still left to do or explore, possibly in separate 
patches:
- add a dtest starting several nodes with allocation
- run a cstar_perf to see if it could show improvement for RF 2 in a 3-node 
cluster
- optimization of the selection for the first RF nodes in the cluster to 
guarantee good distribution later (see 
ReplicationAwareTokenAllocator.testNewCluster)
- (if deemed worthwhile) multiple different replication factors in one 
datacentre; the current code works acceptably when asked to allocate for each 
strategy in turn, but this could be improved if we consider all relevant 
strategies in parallel

> Improve vnode allocation
> ------------------------
>
>                 Key: CASSANDRA-7032
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>              Labels: performance, vnodes
>             Fix For: 3.0
>
>         Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
> TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java, 
> TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes 
> hotspots of ownership. It should be possible to improve dramatically on this 
> with deterministic allocation. I have quickly thrown together a simple greedy 
> algorithm that allocates vnodes efficiently, and will repair hotspots in a 
> randomly allocated cluster gradually as more nodes are added, and also 
> ensures that token ranges are fairly evenly spread between nodes (somewhat 
> tunably so). The allocation still permits slight discrepancies in ownership, 
> but they are bounded by the inverse of the size of the cluster (as opposed to 
> random allocation, which strangely gets worse as the cluster size increases). 
> I'm sure there is a decent dynamic programming solution to this that would be 
> even better.
> If on joining the ring a new node were to CAS a shared table where a 
> canonical allocation of token ranges lives after running this (or a similar) 
> algorithm, we could then get guaranteed bounds on the ownership distribution 
> in a cluster. This will also help for CASSANDRA-6696.
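
For readers new to the idea, the following is a deliberately simplified sketch of deterministic greedy placement. It is not the algorithm in the attached TestVNodeAllocation.java or in the patch. It assumes a ring of positions in [0, 1) rather than Murmur3 tokens, ignores replication entirely, and uses hypothetical names throughout. Each new node places every one of its tokens at the midpoint of the widest range currently owned by another node.

```java
import java.util.Map;
import java.util.TreeMap;

public class GreedyTokenSketch {
    // Token position in [0, 1) -> id of the owning node (simplified ring).
    final TreeMap<Double, Integer> ring = new TreeMap<>();

    // Width of the range (predecessor, token], wrapping around the ring.
    double width(double token) {
        Double prev = ring.lowerKey(token);
        if (prev == null) {
            prev = ring.lastKey();
        }
        double w = token - prev;
        return w <= 0 ? w + 1.0 : w;
    }

    // Greedily place `vnodes` tokens for `node`, halving the widest range each time.
    void addNode(int node, int vnodes) {
        for (int i = 0; i < vnodes; i++) {
            if (ring.isEmpty()) {
                ring.put(0.0, node);
                continue;
            }
            Double best = null;     // widest range owned by some other node
            Double fallback = null; // widest range overall (used for the first node)
            for (Map.Entry<Double, Integer> e : ring.entrySet()) {
                double t = e.getKey();
                if (fallback == null || width(t) > width(fallback)) {
                    fallback = t;
                }
                if (e.getValue() != node && (best == null || width(t) > width(best))) {
                    best = t;
                }
            }
            double target = best != null ? best : fallback;
            double mid = target - width(target) / 2;
            if (mid < 0) {
                mid += 1.0;
            }
            ring.put(mid, node);
        }
    }

    // Fraction of the ring owned by `node`.
    double ownership(int node) {
        double sum = 0;
        for (Map.Entry<Double, Integer> e : ring.entrySet()) {
            if (e.getValue() == node) {
                sum += width(e.getKey());
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        GreedyTokenSketch s = new GreedyTokenSketch();
        for (int node = 0; node < 3; node++) {
            s.addNode(node, 4);
        }
        for (int node = 0; node < 3; node++) {
            System.out.println("node " + node + " owns " + s.ownership(node));
        }
    }
}
```

This crude halving rule does not achieve the bounds discussed above (ownership is only balanced when the node count is a power of two), but it shows how placement can be made a deterministic function of the current ring rather than a random draw.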



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
