Thom Valley created CASSANDRA-11977:
---------------------------------------
Summary: Replace stream source selection should be deterministic
Key: CASSANDRA-11977
URL: https://issues.apache.org/jira/browse/CASSANDRA-11977
Project: Cassandra
Issue Type: Improvement
Components: Streaming and Messaging
Environment: 2.1.14 / 42 Nodes / 5 DCs
Reporter: Thom Valley
The current method for dealing with the impact of inter-dc latency on bootstrap
and replace node operations is to turn the the ring_delay and "hope" that
gossip settles appropriately.
Even with a ring_delay of 5 minutes, we are seeing remote DCs being used as
sources for node replacements. We also are seeing a variable number of nodes
being used for different replace operations.
For example, in a multiple replace test run, we have seen a node take as little
as 3 hours (when local dc is utilized) to as much as 9 hours to complete it's
replacement process at ~500GB of data / node on LCS. That's a 3X variation.
Replacing a node (or bootstrapping a new node) should be done in as
deterministic and efficient a manner as possible. Repeatable results at a
given topology / data density are important for operational planning.
This has a significant impact on operational planning in large environments,
especially when globally distributed data centers are in play.
The promise of Cassandra is that it is operationally simple and highly fault
tolerant. How long it takes to recover a node is a critical aspect of
maintaining that fault tolerance and reducing risk.
Remote DC links are also not just slower, but generally more expensive from a
transport cost standpoint and frequently are bandwidth constrained. Using a
remote DC unnecessarily for a bootstrap / repair operation adds risk to other
users of that resource.
Replacing a node and/or bootstrapping a node should ideally:
-Always use the local_dc if it is available
-Stream from as many nodes as possible (to reduce total time to complete)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)