[ 
https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-3810:
----------------------------------------
    Component/s: Core

> reconsider rack awareness
> -------------------------
>
>                 Key: CASSANDRA-3810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3810
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> We believed we wanted to be rack aware because we want to ensure that losing 
> a rack only affects a single replica of any given row key.
> When using rack awareness, the first problem you encounter immediately if you 
> aren't careful is that you induce hotspots as a result of rack aware replica 
> selection. Using the format {{rackname-nodename}}, consider a part of the 
> ring that looks like this:
> {code}
> ...
> r1-n1
> r1-n2
> r1-n3
> r2-n1
> r3-n1
> r4-n1
> ...
> {code}
> Due to the rack awareness, {{r2-n1}} will be the second replica for all data 
> whose primary replica is on {{r1-n1}}, {{r1-n2}} or {{r1-n3}}, since replica 
> selection for each of them is forced to skip over the remaining {{r1}} nodes.
> The way we end up allocating nodes in a cluster is to satisfy this criterion:
> * Any node in rack {{r}}, in a cluster with a replication factor of {{rf}}, 
> must not have another node in {{r}} within {{rf-1}} steps in the ring in 
> either direction.
> Any violation of this criterion induces hotspots due to rack awareness.
> The realization I had a few days ago, however, is that *rack awareness is 
> not actually changing replica placement* when using this ring topology. In 
> other words, *the way you have to use* rack awareness is to construct the 
> ring such that *the rack awareness is a NOOP*.
> So, questions:
> * Is there any non-hotspot inducing use-case where rack awareness can be used 
> ("used" in the sense that it actually changes the placement relative to 
> non-awareness) effectively without satisfying the criterion above?
> * Is it misleading and counter-productive to teach people (via documentation 
> for example) to rely on rack awareness in their rings instead of just giving 
> them the rule above for ring topology?
> * Would it be a better service to the user to provide an easy way to *ensure* 
> that the ring topology adheres to this criterion (such as refusing to 
> bootstrap a new node if rack awareness is requested, and taking it into 
> consideration on automatic token selection (does anyone use that?)), than to 
> "silently" generate hotspots by altering the replication strategy? (The 
> "silence" problem is magnified by the fact that {{nodetool ring}} doesn't 
> reflect this; so the user must take into account both the RF *and* the racks 
> when interpreting {{nodetool ring}} output.)
> FWIW, internally we just go with the criterion outlined above, and we have a 
> separate tool which prints the *actual* ownership percentage of a node in 
> the ring (based on the thrift {{describe_ring}} call). Any ring whose node 
> selection violates the criterion is effectively a buggy/misconfigured ring, 
> so only in the event of mistakes are we "using" the rack awareness (using 
> the definition of "use" above).
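
Editorial sketch (not part of the original issue): the hotspot and the NOOP observation described above can be reproduced with a small model. This is a simplified illustration, not Cassandra's actual replication strategy code. It assumes the ring segment from the description is the whole ring, that node names follow the {{rackname-nodename}} format, and that every node owns one equally sized range. All function names are hypothetical.

```python
from collections import Counter


def rack_of(node):
    """Return the rack part of a 'rackname-nodename' string."""
    return node.split("-", 1)[0]


def simple_replicas(ring, start, rf):
    """Rack-unaware placement: the next rf nodes clockwise from `start`."""
    n = len(ring)
    return [ring[(start + i) % n] for i in range(min(rf, n))]


def rack_aware_replicas(ring, start, rf):
    """Simplified rack-aware placement (an assumption, not Cassandra's code):
    walk clockwise and skip nodes whose rack already holds a replica."""
    n = len(ring)
    order = [ring[(start + i) % n] for i in range(n)]
    chosen, seen_racks = [], set()
    for node in order:
        if len(chosen) == rf:
            break
        if rack_of(node) not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack_of(node))
    # Fewer racks than rf: fill up with the skipped nodes in ring order.
    for node in order:
        if len(chosen) == rf:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


def replica_load(ring, rf, select):
    """Count how many ranges each node holds a replica of."""
    load = Counter()
    for start in range(len(ring)):
        load.update(select(ring, start, rf))
    return load


def violations(ring, rf):
    """Pairs of same-rack nodes that sit within rf-1 steps of each other.
    Looking clockwise only is enough because the relation is symmetric."""
    n = len(ring)
    bad = []
    for i, node in enumerate(ring):
        for d in range(1, min(rf, n)):
            other = ring[(i + d) % n]
            if rack_of(other) == rack_of(node):
                bad.append((node, other))
    return bad


def is_noop(ring, rf):
    """True if rack awareness never changes placement on this ring."""
    return all(
        rack_aware_replicas(ring, s, rf) == simple_replicas(ring, s, rf)
        for s in range(len(ring))
    )


# The ring from the issue description, treated as a complete ring.
hot_ring = ["r1-n1", "r1-n2", "r1-n3", "r2-n1", "r3-n1", "r4-n1"]
# A hypothetical ring laid out to satisfy the criterion for rf=3.
good_ring = ["r1-n1", "r2-n1", "r3-n1", "r1-n2", "r2-n2", "r3-n2"]

if __name__ == "__main__":
    print(replica_load(hot_ring, 2, rack_aware_replicas))
    print(violations(hot_ring, 2))
    print(violations(good_ring, 3), is_noop(good_ring, 3))
```

Under these assumptions, with rf=2 the node r2-n1 ends up holding replicas for four ranges while r1-n2 and r1-n3 hold one each. On the ring that satisfies the criterion, the rack-aware and rack-unaware selections coincide for every range.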



