[
https://issues.apache.org/jira/browse/CASSANDRA-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381991#comment-16381991
]
Kenneth Brotman edited comment on CASSANDRA-14265 at 3/1/18 1:37 PM:
---------------------------------------------------------------------
From the users group:
Hannu said: Hello, you can always run “nodetool ring” to see all tokens.
Jeff Jirsa said:
The scenario you describe is the typical point where people move away from
vnodes and towards single-token-per-node (or a much smaller number of vnodes).
The default setting puts you in a situation where virtually all hosts are
adjacent/neighbors to all others (at least until you're way into the hundreds
of hosts), which means you'll stream from nearly all hosts. If you drop the
number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see the number of streams
drop as well. Many people with "large" clusters statically allocate tokens to
make it predictable - if you have a single token per host, you can add multiple
hosts at a time, each streaming from a small number of neighbors, without
overlap. It takes a bit more tooling (or manual token calculation) outside of
cassandra, but works well in practice for "large" clusters.
. . .
At a past job, we set the limit at around 60 hosts per cluster - anything
bigger than that got single token. Anything smaller, and we'd just tolerate the
inconveniences of vnodes. But that was before the new vnode token allocation
went into 3.0, and really assumed things that may not be true for you (it was a
cluster that started at 60 hosts and grew up to 480 in steps, so we'd want to
grow quickly - having single token allowed us to grow from 60-120 in 2 days,
and then 120-180 in 2 days, and so on). Are you always going to be growing, or
is it a short/temporary thing? There are users of vnodes (at big, public
companies) that go up into the hundreds of nodes. Most people running cassandra
start sharding clusters rather than going past a thousand or so nodes - I know
there's at least one person I talked to in IRC with a 1700 host cluster, but
that'd be beyond what I'd ever do personally.
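Jeff's point about neighbor counts can be illustrated with a rough simulation (my sketch, not from the thread; it assumes purely random token placement, successor-based replication with RF=3, and no rack/DC awareness, so real numbers will differ):

```python
import random

def avg_neighbors(hosts, vnodes, rf=3, trials=5):
    """Rough estimate: how many distinct hosts hold replicas of
    ranges owned by host 0, under random token placement.
    Simplified model: single ring, no racks/DCs, replicas are the
    next rf-1 distinct hosts clockwise from each token."""
    totals = []
    for _ in range(trials):
        # Give every host `vnodes` random positions on the ring.
        ring = sorted((random.random(), h)
                      for h in range(hosts) for _ in range(vnodes))
        owners = [h for _, h in ring]
        n = len(owners)
        neighbors = set()
        for i, owner in enumerate(owners):
            if owner != 0:
                continue
            # The rf-1 next distinct hosts clockwise replicate this range.
            seen, j = set(), i
            while len(seen) < rf - 1:
                j = (j + 1) % n
                if owners[j] != 0:
                    seen.add(owners[j])
            neighbors |= seen
        totals.append(len(neighbors))
    return sum(totals) / len(totals)

for v in (1, 4, 16, 256):
    print(f"{v:>3} vnodes -> ~{avg_neighbors(100, v):.0f} neighbor hosts")
```

On a 100-host cluster this shows the qualitative effect Jeff describes: with 1 token per host a node has only a couple of replication neighbors, with 4–16 vnodes a few dozen, and with 256 vnodes essentially every other host is a neighbor — so bootstrap streams come from nearly everywhere.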
*From:* {color:#333333}Kyrylo Lebediev <[email protected]>{color}
*Subject:* {color:#333333}Re: Is it possible / makes it sense to limit
concurrent streaming during bootstrapping new nodes?{color}
*Date:* {color:#333333}2018/02/24 10:26:35{color}
*List:*
[[email protected]|https://lists.apache.org/[email protected]]
* Didn't see this question answered. I think the easiest way to do this is to
add new C* nodes with fewer vnodes (8 or 16 instead of the default 256), then
decommission the old nodes with vnodes=256.
Thanks, guys, for shedding some light on this Java multithreading-related
scalability issue. BTW, how can one tell from JVM / OS metrics that the number
of threads in a JVM has become a bottleneck?
Also, I'd like to add a comment: the higher the number of vnodes per node, the
lower the overall reliability of the cluster. Replicas for a token range are
placed on the nodes responsible for the next+1, next+2 ranges (not taking into
account NetworkTopologyStrategy / the snitch, which help, but seemingly not much
when expressed in terms of probabilities). The higher the number of vnodes per
node, the higher the probability that all nodes in the cluster become
'neighbors' in terms of token ranges. There's no trivial formula for the
'reliability' of a C* cluster [haven't had a chance to do the math...], but in
general, with a bigger number of nodes in a cluster (like 100 or 200), the
probability that 2 or more nodes are down at the same time increases
proportionally to the number of nodes. The most reliable C* setup uses
initial_token instead of vnodes, but it makes the cluster harder to manage
[less automated, plus there will be hotspots in the cluster in some cases].
Remark: for a C* cluster with RF=3, any number of nodes, and either
initial_token or vnodes, there is always a possibility that the simultaneous
unavailability of 2 (or 3, depending on which CL is used) nodes will lead to
unavailability of a token range ('HostUnavailable' exception). No miracles:
reliability is mostly determined by the RF. The task that must be solved for
large clusters: "Reliability of a cluster with NNN nodes and RF=3 shouldn't be
'tangibly' less than the reliability of a 3-node cluster with RF=3." Kind
Regards, Kyrill
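Kyrylo's probability argument can be sketched with a quick Monte-Carlo estimate (my illustration, not from the thread; it assumes random token placement, RF=3 with quorum reads/writes, and no rack awareness, so it overstates the risk relative to a rack-aware deployment):

```python
import random

def p_range_unavailable(hosts, vnodes, rf=3, down=2, trials=300):
    """Monte-Carlo estimate of the probability that `down`
    simultaneously failed hosts leave some token range with fewer
    than a quorum (rf//2 + 1) of live replicas.
    Simplified model: random tokens, successor-based replication."""
    quorum = rf // 2 + 1
    hits = 0
    for _ in range(trials):
        ring = sorted((random.random(), h)
                      for h in range(hosts) for _ in range(vnodes))
        owners = [h for _, h in ring]
        n = len(owners)
        failed = set(random.sample(range(hosts), down))
        for i in range(n):
            # Replicas of range i: its owner plus the next rf-1
            # distinct hosts clockwise on the ring.
            reps, j = [], i
            while len(reps) < rf:
                h = owners[j % n]
                if h not in reps:
                    reps.append(h)
                j += 1
            # Unavailable if more than rf - quorum replicas are down.
            if sum(1 for h in reps if h in failed) > rf - quorum:
                hits += 1
                break
    return hits / trials

for v in (1, 64):
    print(f"{v:>2} vnodes -> P(range down | 2 hosts down) ≈ "
          f"{p_range_unavailable(hosts=32, vnodes=v):.2f}")
```

With single tokens, two failed hosts only take out a range if they happen to be ring neighbors, so the probability stays small; with many vnodes per host, almost every pair of hosts shares some replica set, so two simultaneous failures almost always make at least one range unavailable at quorum — which is the intuition behind "more vnodes per node, lower overall reliability."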
* Subject: Re: Is it possible / makes it sense to limit concurrent streaming
during bootstrapping new nodes?
You can’t migrate down that way. The last
several nodes you have up will get completely overwhelmed, and you’ll be
completely screwed. Please do not give advice like this unless you’ve actually
gone through the process or at least have an understanding of how the data will
be shifted. Adding nodes with 16 tokens while decommissioning the ones with 256
will be absolute hell. You can only do this by adding a new DC and retiring the
old.
was (Author: kenbrotman):
>From the users group:
Hello, you can always run “nodetool ring” to see all tokens. Hannu
> Add explanation of vNodes to online documentation
> -------------------------------------------------
>
> Key: CASSANDRA-14265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14265
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation and Website
> Reporter: Kenneth Brotman
> Priority: Major
>
> There are a lot of inquiries on the mailing list about how vNodes work and how
> to configure them properly. We should add an explanation to the documentation.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]