[
https://issues.apache.org/jira/browse/CASSANDRA-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381991#comment-16381991
]
Kenneth Brotman edited comment on CASSANDRA-14265 at 3/1/18 1:37 PM:
---------------------------------------------------------------------
From the users group:
Hannu said: Hello, you can always run “nodetool ring” to see all tokens.
Jeff Jirsa said:
The scenario you describe is the typical point where people move away from
vnodes and towards single-token-per-node (or a much smaller number of vnodes).
The default setting puts you in a situation where virtually all hosts are
adjacent/neighbors to all others (at least until you're way into the hundreds
of hosts), which means you'll stream from nearly all hosts. If you drop the
number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see the number of streams
drop as well. Many people with "large" clusters statically allocate tokens to
make it predictable - if you have a single token per host, you can add multiple
hosts at a time, each streaming from a small number of neighbors, without
overlap. It takes a bit more tooling (or manual token calculation) outside of
cassandra, but works well in practice for "large" clusters.
. . .
At a past job, we set the limit at around 60 hosts per cluster - anything
bigger than that got single token. Anything smaller, and we'd just tolerate the
inconveniences of vnodes. But that was before the new vnode token allocation
went into 3.0, and really assumed things that may not be true for you (it was a
cluster that started at 60 hosts and grew up to 480 in steps, so we'd want to
grow quickly - having single token allowed us to grow from 60-120 in 2 days,
and then 120-180 in 2 days, and so on). Are you always going to be growing, or
is it a short/temporary thing? There are users of vnodes (at big, public
companies) that go up into the hundreds of nodes. Most people running cassandra
start sharding clusters rather than going past a thousand or so nodes - I know
there's at least one person I talked to in IRC with a 1700 host cluster, but
that'd be beyond what I'd ever do personally.
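Jeff's point about neighbor counts can be illustrated with a rough simulation (my sketch, not from the thread; it assumes purely random token placement, successor-based replication with RF=3, and no rack/DC awareness, so real numbers will differ):

```python
import random

def avg_neighbors(hosts, vnodes, rf=3, trials=5):
    """Rough estimate: how many distinct hosts hold replicas of
    ranges owned by host 0, under random token placement.
    Simplified model: single ring, no racks/DCs, replicas are the
    next rf-1 distinct hosts clockwise from each token."""
    totals = []
    for _ in range(trials):
        # Give every host `vnodes` random positions on the ring.
        ring = sorted((random.random(), h)
                      for h in range(hosts) for _ in range(vnodes))
        owners = [h for _, h in ring]
        n = len(owners)
        neighbors = set()
        for i, owner in enumerate(owners):
            if owner != 0:
                continue
            # The rf-1 next distinct hosts clockwise replicate this range.
            seen, j = set(), i
            while len(seen) < rf - 1:
                j = (j + 1) % n
                if owners[j] != 0:
                    seen.add(owners[j])
            neighbors |= seen
        totals.append(len(neighbors))
    return sum(totals) / len(totals)

for v in (1, 4, 16, 256):
    print(f"{v:>3} vnodes -> ~{avg_neighbors(100, v):.0f} neighbor hosts")
```

On a 100-host cluster this shows the qualitative effect Jeff describes: with 1 token per host a node has only a couple of replication neighbors, with 4–16 vnodes a few dozen, and with 256 vnodes essentially every other host is a neighbor — so bootstrap streams come from nearly everywhere.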
*From:* {color:#333333}Kyrylo Lebediev <[email protected]>{color}
*Subject:* {color:#333333}Re: Is it possible / makes it sense to limit
concurrent streaming during bootstrapping new nodes?{color}
*Date:* {color:#333333}2018/02/24 10:26:35{color}
*List:*
[[email protected]|https://lists.apache.org/[email protected]]
* Didn't see this question answered. I think the easiest way to do this is to
add new C* nodes with fewer vnodes (8 or 16 instead of the default 256), then
decommission the old nodes with vnodes=256.
Thanks, guys, for shedding some light on this Java multithreading-related
scalability issue. BTW, how can one tell from JVM / OS metrics that the number
of threads in a JVM has become a bottleneck?
Also, I'd like to add a comment: the higher the number of vnodes per node, the
lower the overall reliability of the cluster. Replicas for a token range are
placed on the nodes responsible for the next+1, next+2 ranges (not taking into
account NetworkTopologyStrategy / the snitch, which help, but seemingly not much
when expressed in terms of probabilities). The higher the number of vnodes per
node, the higher the probability that all nodes in the cluster become
'neighbors' in terms of token ranges. There's no trivial formula for the
'reliability' of a C* cluster [haven't had a chance to do the math...], but in
general, with a bigger number of nodes in a cluster (like 100 or 200), the
probability that 2 or more nodes are down at the same time increases
proportionally to the number of nodes. The most reliable C* setup uses
initial_token instead of vnodes, but it makes the cluster harder to manage
[less automated, plus there will be hotspots in the cluster in some cases].
Remark: for a C* cluster with RF=3, any number of nodes, and either
initial_token or vnodes, there is always a possibility that the simultaneous
unavailability of 2 (or 3, depending on which CL is used) nodes will lead to
unavailability of a token range ('HostUnavailable' exception). No miracles:
reliability is mostly determined by the RF. The task that must be solved for
large clusters: "Reliability of a cluster with NNN nodes and RF=3 shouldn't be
'tangibly' less than the reliability of a 3-node cluster with RF=3." Kind
Regards, Kyrill
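Kyrylo's probability argument can be sketched with a quick Monte-Carlo estimate (my illustration, not from the thread; it assumes random token placement, RF=3 with quorum reads/writes, and no rack awareness, so it overstates the risk relative to a rack-aware deployment):

```python
import random

def p_range_unavailable(hosts, vnodes, rf=3, down=2, trials=300):
    """Monte-Carlo estimate of the probability that `down`
    simultaneously failed hosts leave some token range with fewer
    than a quorum (rf//2 + 1) of live replicas.
    Simplified model: random tokens, successor-based replication."""
    quorum = rf // 2 + 1
    hits = 0
    for _ in range(trials):
        ring = sorted((random.random(), h)
                      for h in range(hosts) for _ in range(vnodes))
        owners = [h for _, h in ring]
        n = len(owners)
        failed = set(random.sample(range(hosts), down))
        for i in range(n):
            # Replicas of range i: its owner plus the next rf-1
            # distinct hosts clockwise on the ring.
            reps, j = [], i
            while len(reps) < rf:
                h = owners[j % n]
                if h not in reps:
                    reps.append(h)
                j += 1
            # Unavailable if more than rf - quorum replicas are down.
            if sum(1 for h in reps if h in failed) > rf - quorum:
                hits += 1
                break
    return hits / trials

for v in (1, 64):
    print(f"{v:>2} vnodes -> P(range down | 2 hosts down) ≈ "
          f"{p_range_unavailable(hosts=32, vnodes=v):.2f}")
```

With single tokens, two failed hosts only take out a range if they happen to be ring neighbors, so the probability stays small; with many vnodes per host, almost every pair of hosts shares some replica set, so two simultaneous failures almost always make at least one range unavailable at quorum — which is the intuition behind "more vnodes per node, lower overall reliability."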
* Subject: Re: Is it possible / makes it sense to limit concurrent streaming
during bootstrapping new nodes?
You can’t migrate down that way. The last
several nodes you have up will get completely overwhelmed, and you’ll be
completely screwed. Please do not give advice like this unless you’ve actually
gone through the process or at least have an understanding of how the data will
be shifted. Adding nodes with 16 tokens while decommissioning the ones with 256
will be absolute hell. You can only do this by adding a new DC and retiring the
old.
was (Author: kenbrotman):
>From the users group:
Hello, you can always run “nodetool ring” to see all tokens. Hannu
> Add explanation of vNodes to online documentation
> -------------------------------------------------
>
> Key: CASSANDRA-14265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14265
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation and Website
> Reporter: Kenneth Brotman
> Priority: Major
>
> There are a lot of inquiries on the mailing list about how vNodes work and how
> to configure them properly. We should add an explanation to the documentation.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]