[ 
https://issues.apache.org/jira/browse/CASSANDRA-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13275892#comment-13275892
 ] 

Sam Overton commented on CASSANDRA-3881:
----------------------------------------

First the important bit: With these patches, 
StorageService.calculatePendingRanges is almost three orders of magnitude 
faster when calculating two nodes bootstrapping into a cluster with 2048 nodes 
(22ms vs 14.6sec). See the [graph 
here|https://github.com/acunu/cassandra/wiki/images/calculate_pending_ranges.png].
 This was tested with 1 DC and 1 rack with RF=2.

The problem lies in NetworkTopologyStrategy.calculateNaturalEndpoints. The main 
problems with the existing implementation are:

1. for each datacentre:
2. it iterates through all the tokens in the ring at least once
3. then does an NlogN sort of those tokens
4. then if number of racks < RF it will iterate through all tokens again 
because it can't tell if it has exhausted the racks in that DC
5. then if number of hosts in that DC < RF it will iterate through all tokens 
again, otherwise it will iterate through until it has RF hosts in that DC

so it's doing O(DC * (N + NlogN + N + N)) operations just to work out the 
endpoints for a single token. StorageService.calculatePendingRanges then puts 
this inside other loops (such as AbstractReplicationStrategy.getAddressRanges) 
which makes it at least O(N^2*logN).

These patches fix (1) by iterating through the tokens only once, and processing 
all DCs simultaneously.

(2,3&5) relate to knowing which endpoints exist in a given DC, (4) relates to 
knowing which racks appear in a DC, so the first patch adds this knowledge to 
the snitch. The second patch makes use of this knowledge to simplify 
calculateNaturalEndpoints.

||branch|| || ||description
|p/3881/00_snitch_topology|[compare|https://github.com/acunu/cassandra/compare/refs/top-bases/p/3881/00_snitch_topology...p/3881/00_snitch_topology]|[raw
 
diff|https://github.com/acunu/cassandra/compare/refs/top-bases/p/3881/00_snitch_topology...p/3881/00_snitch_topology.diff]|Adds
 some functionality to AbstractEndpointSnitch to track which endpoints and 
racks exist in a DC (allows for fixing of 2-5).
|p/3881/01_calc_natural_endpoints|[compare|https://github.com/acunu/cassandra/compare/refs/top-bases/p/3881/01_calc_natural_endpoints...p/3881/01_calc_natural_endpoints]|[raw
 
diff|https://github.com/acunu/cassandra/compare/refs/top-bases/p/3881/01_calc_natural_endpoints...p/3881/01_calc_natural_endpoints.diff]|Rewritten
 O(logN) implementation of calculateNaturalEndpoints using the topology 
information from the snitch.


Note: These branches are managed with [TopGit|http://repo.or.cz/w/topgit.git].  
If you are applying the patch output manually, you will either need to filter 
the TopGit metadata files ( {{wget -O - <url> | filterdiff -x*.topdeps 
-x*.topmsg | patch -p1}} ), or remove them afterward ( {{rm .topmsg .topdeps}} 
).
                
> reduce computational complexity of processing topology changes
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3881
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3881
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Sam Overton
>
> This constitutes follow-up work from CASSANDRA-3831 where a partial 
> improvement was committed, but the fundamental issue was not fixed. The 
> maximum "practical" cluster size was significantly improved, but further work 
> is expected to be necessary as cluster sizes grow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to