Jay Zhuang created CASSANDRA-15141:
--------------------------------------

             Summary: RemoveNode takes long time and blocks gossip stage
                 Key: CASSANDRA-15141
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
             Project: Cassandra
          Issue Type: Improvement
          Components: Cluster/Gossip, Cluster/Membership
            Reporter: Jay Zhuang
            Assignee: Jay Zhuang


The function
[{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002],
which runs during removenode and decommission, is slow for large vnode clusters
that use NetworkTopologyStrategy, because it has to build the whole replication
map for every token range.
In one of our clusters (> 1k nodes), it takes about 20 seconds for each
NetworkTopologyStrategy keyspace, so processing a single removenode message
takes at least 80 seconds in total (20 * 4: 3 system keyspaces, 1 user
keyspace). This blocks heartbeat propagation on the gossip stage and causes
nodes to be falsely marked down.
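
To make the cost pattern concrete, here is a minimal toy model. It is not
Cassandra code: the class RemoveNodeCostSketch and all of its methods are
hypothetical names invented for this illustration, and the replica placement
is a simplified stand-in for NetworkTopologyStrategy. It only assumes what is
described above, namely that the whole replication map is rebuilt once per
token range, and counts how many per-range replica calculations that implies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these names exist in Cassandra.
public class RemoveNodeCostSketch {

    // Simplified stand-in for a replication strategy: the replicas of a
    // range are the next `rf` positions on a ring of `ringSize` ranges.
    public static List<Integer> replicasFor(int range, int ringSize, int rf) {
        List<Integer> replicas = new ArrayList<>();
        for (int i = 0; i < rf; i++) {
            replicas.add((range + i) % ringSize);
        }
        return replicas;
    }

    // Builds the full range -> replicas map and records one calculation
    // per range in counter[0].
    public static Map<Integer, List<Integer>> buildFullMap(int ranges, int rf, long[] counter) {
        Map<Integer, List<Integer>> map = new HashMap<>();
        for (int r = 0; r < ranges; r++) {
            map.put(r, replicasFor(r, ranges, rf));
            counter[0]++;
        }
        return map;
    }

    // The pattern described in this ticket: the whole map is rebuilt for
    // every token range, so the work grows as ranges * ranges.
    public static long calculationsWhenRebuiltPerRange(int ranges, int rf) {
        long[] counter = {0};
        for (int r = 0; r < ranges; r++) {
            buildFullMap(ranges, rf, counter).get(r);
        }
        return counter[0];
    }

    // Arithmetic from the report: time per NetworkTopologyStrategy
    // keyspace multiplied by the number of such keyspaces.
    public static long estimateBlockingSeconds(long secondsPerKeyspace, int ntsKeyspaces) {
        return secondsPerKeyspace * ntsKeyspaces;
    }

    public static void main(String[] args) {
        System.out.println("calculations for 1000 ranges: "
                + calculationsWhenRebuiltPerRange(1_000, 3));
        System.out.println("calculations for 2000 ranges: "
                + calculationsWhenRebuiltPerRange(2_000, 3));
        System.out.println("estimated blocking seconds: "
                + estimateBlockingSeconds(20, 4));
    }
}
```

In this model, doubling the number of token ranges quadruples the number of
calculations, which is consistent with the slowdown being most visible on
large vnode clusters. The real per-range cost in Cassandra depends on the
topology and replication settings and is not captured here.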



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

