[
https://issues.apache.org/jira/browse/CASSANDRA-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13275892#comment-13275892
]
Sam Overton commented on CASSANDRA-3881:
----------------------------------------
First the important bit: With these patches,
StorageService.calculatePendingRanges is almost three orders of magnitude
faster when calculating two nodes bootstrapping into a cluster with 2048 nodes
(22ms vs 14.6sec). See the [graph
here|https://github.com/acunu/cassandra/wiki/images/calculate_pending_ranges.png].
This was tested with 1 DC and 1 rack with RF=2.
The problem lies in NetworkTopologyStrategy.calculateNaturalEndpoints. The main
problems with the existing implementation are:
1. for each datacentre:
2. it iterates through all the tokens in the ring at least once
3. then does an NlogN sort of those tokens
4. then if number of racks < RF it will iterate through all tokens again
because it can't tell if it has exhausted the racks in that DC
5. then if number of hosts in that DC < RF it will iterate through all tokens
again, otherwise it will iterate through until it has RF hosts in that DC
so it's doing O(DC * (N + NlogN + N + N)) operations just to work out the
endpoints for a single token. StorageService.calculatePendingRanges then puts
this inside other loops (such as AbstractReplicationStrategy.getAddressRanges)
which makes it at least O(N^2*logN).
These patches fix (1) by iterating through the tokens only once, and processing
all DCs simultaneously.
(2,3&5) relate to knowing which endpoints exist in a given DC, (4) relates to
knowing which racks appear in a DC, so the first patch adds this knowledge to
the snitch. The second patch makes use of this knowledge to simplify
calculateNaturalEndpoints.
||branch|| || ||description
|p/3881/00_snitch_topology|[compare|https://github.com/acunu/cassandra/compare/refs/top-bases/p/3881/00_snitch_topology...p/3881/00_snitch_topology]|[raw
diff|https://github.com/acunu/cassandra/compare/refs/top-bases/p/3881/00_snitch_topology...p/3881/00_snitch_topology.diff]|Adds
some functionality to AbstractEndpointSnitch to track which endpoints and
racks exist in a DC (allows for fixing of 2-5).
|p/3881/01_calc_natural_endpoints|[compare|https://github.com/acunu/cassandra/compare/refs/top-bases/p/3881/01_calc_natural_endpoints...p/3881/01_calc_natural_endpoints]|[raw
diff|https://github.com/acunu/cassandra/compare/refs/top-bases/p/3881/01_calc_natural_endpoints...p/3881/01_calc_natural_endpoints.diff]|Rewritten
O(logN) implementation of calculateNaturalEndpoints using the topology
information from the snitch.
Note: These branches are managed with [TopGit|http://repo.or.cz/w/topgit.git].
If you are applying the patch output manually, you will either need to filter
the TopGit metadata files ( {{wget -O - <url> | filterdiff -x*.topdeps
-x*.topmsg | patch -p1}} ), or remove them afterward ( {{rm .topmsg .topdeps}}
).
> reduce computational complexity of processing topology changes
> --------------------------------------------------------------
>
> Key: CASSANDRA-3881
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3881
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Peter Schuller
> Assignee: Sam Overton
>
> This constitutes follow-up work from CASSANDRA-3831 where a partial
> improvement was committed, but the fundamental issue was not fixed. The
> maximum "practical" cluster size was significantly improved, but further work
> is expected to be necessary as cluster sizes grow.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira