Thom Valley created CASSANDRA-11977:
---------------------------------------

             Summary: Replace stream source selection should be deterministic
                 Key: CASSANDRA-11977
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11977
             Project: Cassandra
          Issue Type: Improvement
          Components: Streaming and Messaging
         Environment: 2.1.14 / 42 Nodes / 5 DCs
            Reporter: Thom Valley


The current method for dealing with the impact of inter-dc latency on bootstrap 
and replace node operations is to turn the the ring_delay and "hope" that 
gossip settles appropriately.

Even with a ring_delay of 5 minutes, we are seeing remote DCs being used as 
sources for node replacements.  We also are seeing a variable number of nodes 
being used for different replace operations.

For example, in a multiple replace test run, we have seen a node take as little 
as 3 hours (when local dc is utilized) to as much as 9 hours to complete it's 
replacement process at ~500GB of data / node on LCS.  That's a 3X variation.

Replacing a node (or bootstrapping a new node) should be done in as 
deterministic and efficient a manner as possible.  Repeatable results at a 
given topology / data density are important for operational planning.

This has a significant impact on operational planning in large environments, 
especially when globally distributed data centers are in play.

The promise of Cassandra is that it is operationally simple and highly fault 
tolerant.  How long it takes to recover a node is a critical aspect of 
maintaining that fault tolerance and reducing risk.

Remote DC links are also not just slower, but generally more expensive from a 
transport cost standpoint and frequently are bandwidth constrained.  Using a 
remote DC unnecessarily for a bootstrap / repair operation adds risk to other 
users of that resource.

Replacing a node and/or bootstrapping a node should ideally:
-Always use the local_dc if it is available
-Stream from as many nodes as possible (to reduce total time to complete)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to