[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232857#comment-14232857 ]
Benedict edited comment on CASSANDRA-7032 at 12/3/14 10:38 AM:
---------------------------------------------------------------

Well, NetworkTopologyStrategy already enforces some degree of balance across racks, and absolutely guarantees balance across DCs as far as replication ownership is concerned.

It _would_ be nice to migrate this behaviour to the token selection so that we could reason about ownership a bit more clearly. NTS might enforce our general ownership constraints, but having a predictably cheap generation strategy for end points would be great, as the amount of state necessary to route queries could shrink dramatically if we could rely on a sequence of adjacent tokens ensuring these properties, for instance. But the simpler goal of ensuring that, for any given arbitrary slice of the global token range, all nodes have a share of the range that is within epsilon of perfect should be more than sufficient.

TL;DR: our goal should probably be: "for any given arbitrary slice of the global token range, all nodes have a share of the range that is within epsilon* of perfect"

* with epsilon probably inversely proportional to the size of the slice


> Improve vnode allocation
> ------------------------
>
>                 Key: CASSANDRA-7032
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>              Labels: performance, vnodes
>             Fix For: 3.0
>
>         Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java
>
>
> It's been known for a little while that random vnode allocation causes
> hotspots of ownership. It should be possible to improve dramatically on this
> with deterministic allocation. I have quickly thrown together a simple greedy
> algorithm that allocates vnodes efficiently, will gradually repair hotspots
> in a randomly allocated cluster as more nodes are added, and also ensures
> that token ranges are fairly evenly spread between nodes (somewhat tunably
> so). The allocation still permits slight discrepancies in ownership, but
> these are bounded by the inverse of the size of the cluster (as opposed to
> random allocation, which strangely gets worse as the cluster size increases).
> I'm sure there is a decent dynamic programming solution to this that would be
> even better.
> If, on joining the ring, a new node were to CAS a shared table where a
> canonical allocation of token ranges lives after running this (or a similar)
> algorithm, we could then get guaranteed bounds on the ownership distribution
> in a cluster. This will also help for CASSANDRA-6696.

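
To make the goal in the comment concrete, here is a minimal Python sketch of how "share of an arbitrary slice" and "within epsilon of perfect" could be measured. It is not the code from the attached TestVNodeAllocation.java, and it ignores replication. The names `RING`, `slice_shares`, `max_deviation` and `random_ring` are hypothetical, the ring size of 2^64 is an assumption, and reading epsilon as a relative deviation from the perfect share 1/N is only one possible interpretation.

```python
import random

RING = 2 ** 64  # assumed ring size; tokens are integers in [0, RING)


def slice_shares(tokens, start, length):
    """Fraction of the slice [start, start + length) owned by each node.

    tokens maps token -> node. A token owns the range from the previous
    token up to itself, wrapping around the ring. The ring is treated as
    continuous, so single-token boundary effects are ignored.
    """
    # Clockwise distance of every token from the start of the slice, so a
    # slice that crosses the end of the ring needs no special case.
    offsets = sorted(((tok - start) % RING, node) for tok, node in tokens.items())
    owned = {node: 0 for node in tokens.values()}
    pos = 0
    for off, node in offsets:
        end = min(off, length)
        if end > pos:
            owned[node] += end - pos
            pos = end
        if pos >= length:
            break
    if pos < length:
        # The tail of the slice is owned by the first token clockwise of it,
        # which is the token with the smallest offset.
        owned[offsets[0][1]] += length - pos
    return {node: width / length for node, width in owned.items()}


def max_deviation(shares):
    """Largest relative deviation of any node from the perfect share 1/N."""
    perfect = 1.0 / len(shares)
    return max(abs(s - perfect) for s in shares.values()) / perfect


def random_ring(n_nodes, vnodes, rng):
    """Random vnode allocation: every node picks `vnodes` random tokens."""
    tokens = {}
    for node in range(n_nodes):
        added = 0
        while added < vnodes:
            tok = rng.randrange(RING)
            if tok not in tokens:
                tokens[tok] = node
                added += 1
    return tokens


if __name__ == "__main__":
    ring = random_ring(n_nodes=8, vnodes=16, rng=random.Random(7032))
    print("whole ring:", max_deviation(slice_shares(ring, 0, RING)))
    print("1/16 slice:", max_deviation(slice_shares(ring, 0, RING // 16)))
```

An allocator meeting the stated goal would keep `max_deviation` below some epsilon for every choice of `start` and `length`, with epsilon allowed to grow as `length` shrinks. Replacing `random_ring` with a deterministic allocator makes it possible to compare the two under the same measure.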
-- This message was sent by Atlassian JIRA (v6.3.4#6332)