[
https://issues.apache.org/jira/browse/CASSANDRA-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210491#comment-17210491
]
Paulo Motta edited comment on CASSANDRA-13701 at 10/8/20, 10:31 PM:
--------------------------------------------------------------------
I was able to improve runtime of vnode dtests by around 50% on my local machine
by [making CCM start nodes in
parallel|https://github.com/pauloricardomg/ccm/commit/3b21db1a46b596c2b4850c076e035b5251d7dc39]
with a new flag {{-Dcassandra.init.wait_for_live_members}}.
[This
flag|https://github.com/pauloricardomg/cassandra/commit/d03956b088e0f408ade607c55182619d593c8519]
makes the node wait until a specified number of nodes is live *and* part of
the ring before proceeding with bootstrap. This ensures the processes are
started in parallel but tokens are assigned sequentially. So the first node is
started with {{-Dcassandra.init.wait_for_live_members=0}}, the second node with
{{-Dcassandra.init.wait_for_live_members=1}}, the third node with
{{-Dcassandra.init.wait_for_live_members=2}} and so on.
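To illustrate the idea, here is a minimal Python sketch, not the actual CCM or Cassandra change linked above. The helper names ({{jvm_args_for_nodes}}, {{simulate_parallel_start}}) are hypothetical, and threads stand in for node processes. It only shows how a per-node wait count lets all nodes launch at once while they still join the ring one at a time.

```python
import threading

# Flag name taken from the comment above; everything else is illustrative.
FLAG = "-Dcassandra.init.wait_for_live_members"


def jvm_args_for_nodes(node_count):
    """Return one JVM flag per node: node i waits for i live ring members."""
    return ["{}={}".format(FLAG, i) for i in range(node_count)]


def simulate_parallel_start(node_count):
    """Launch all 'nodes' at once; each joins the ring only after enough peers have."""
    ring = []  # join order, standing in for the token ring
    cond = threading.Condition()

    def node(index, required_live):
        # Process startup would happen here, in parallel with the other nodes.
        with cond:
            # Block until `required_live` nodes are live and part of the ring.
            cond.wait_for(lambda: len(ring) >= required_live)
            ring.append(index)  # "bootstrap": pick tokens and join the ring
            cond.notify_all()

    threads = [threading.Thread(target=node, args=(i, i)) for i in range(node_count)]
    # Start in reverse order to show that launch order does not affect join order.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return ring
```

Calling {{simulate_parallel_start(n)}} returns the join order. Node i can only proceed once exactly i nodes have joined, so the order stays sequential even though every thread is started up front.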
It's a bit hacky, but it seems to improve runtimes significantly since we can
parallelize a big chunk of the startup time. I'm running this on a very slow
machine, so we might get even nicer improvements on better CI machines.
The good news is that in the non-vnode case the tokens are assigned manually
via CCM, so nodes don't need to start sequentially there and the runtimes for
the non-vnode case are unchanged.
[~e.dimitrova] would you (or someone with CI access) mind re-running the tests
above with the branches below to see how the runtimes look with this change?
* [cassandra|https://github.com/pauloricardomg/cassandra/tree/CASSANDRA-13701]
* [dtest|https://github.com/pauloricardomg/cassandra-dtest/tree/CASSANDRA-13701]
* [ccm|https://github.com/pauloricardomg/ccm/tree/CASSANDRA-13701]
(cc [~mck] since this is related to CASSANDRA-16079)
> Lower default num_tokens
> ------------------------
>
> Key: CASSANDRA-13701
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13701
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Chris Lohfink
> Assignee: Alexander Dejanovski
> Priority: Low
> Fix For: 4.0-alpha
>
>
> For reasons highlighted in CASSANDRA-7032, the high number of vnodes is not
> necessary. It is very expensive for operations processes and scanning. It's
> come up a lot, and it's now pretty standard and well known within the
> community to always reduce num_tokens. We should just lower the defaults.