[ https://issues.apache.org/jira/browse/CASSANDRA-18093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643413#comment-17643413 ]
Branimir Lambov edited comment on CASSANDRA-18093 at 12/5/22 3:35 PM: ---------------------------------------------------------------------- This is an example of the allocation {{generatetokens}} is currently constructing for the second nodes in each rack: {code:java} Replicated node load in rack=0 after allocating node 3: max 1.33 min 0.56 stddev 0.0370. Replicated node load in rack=1 after allocating node 4: max 1.67 min 0.70 stddev 0.0288. Replicated node load in rack=2 after allocating node 5: max 1.16 min 0.84 stddev 0.0064. {code} and this is what it should be {code:java} Replicated node load in rack=0 after allocating node 3: max 1.06 min 0.94 stddev 0.0052. Replicated node load in rack=1 after allocating node 4: max 1.06 min 0.94 stddev 0.0052. Replicated node load in rack=2 after allocating node 5: max 1.06 min 0.94 stddev 0.0052. {code} The second result was generated after applying the following patch: {code:java} diff --git a/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java b/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java index 184332907b..84ccb95ff4 100644 --- a/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java +++ b/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java @@ -239,7 +239,8 @@ public class TokenAllocation private StrategyAdapter getOrCreateStrategy(String dc, String rack) { - return strategyByRackDc.computeIfAbsent(dc, k -> new HashMap<>()).computeIfAbsent(rack, k -> createStrategy(dc, rack)); + return createStrategy(dc, rack); +// return strategyByRackDc.computeIfAbsent(dc, k -> new HashMap<>()).computeIfAbsent(rack, k -> createStrategy(dc, rack)); } private StrategyAdapter createStrategy(String dc, String rack) @@ -281,8 +282,11 @@ public class TokenAllocation // group by rack return createStrategy(strategy.snitch, dc, null, replicas, true); } - else if (racks == 1) + else if (racks == 1 || topology.getDatacenterEndpoints().get(dc).size() < replicas) { + // One rack, each node treated as separate. + // This is also used as a fallback when number of nodes is lower than the replication factor + // (where allocation is done randomly). return createStrategy(strategy.snitch, dc, null, replicas, false); } {code} was (Author: blambov): This is an example of the allocation {{generatetokens}} is currently constructing for the second nodes in a rack: {code}Replicated node load in rack=0 after allocating node 3: max 1.33 min 0.56 stddev 0.0370. Replicated node load in rack=1 after allocating node 4: max 1.67 min 0.70 stddev 0.0288. Replicated node load in rack=2 after allocating node 5: max 1.16 min 0.84 stddev 0.0064. {code} and this is what it should be {code}Replicated node load in rack=0 after allocating node 3: max 1.06 min 0.94 stddev 0.0052. Replicated node load in rack=1 after allocating node 4: max 1.06 min 0.94 stddev 0.0052. Replicated node load in rack=2 after allocating node 5: max 1.06 min 0.94 stddev 0.0052. {code} The second result was generated after applying this patch:{code}diff --git a/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java b/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java index 184332907b..84ccb95ff4 100644 --- a/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java +++ b/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java @@ -239,7 +239,8 @@ public class TokenAllocation private StrategyAdapter getOrCreateStrategy(String dc, String rack) { - return strategyByRackDc.computeIfAbsent(dc, k -> new HashMap<>()).computeIfAbsent(rack, k -> createStrategy(dc, rack)); + return createStrategy(dc, rack); +// return strategyByRackDc.computeIfAbsent(dc, k -> new HashMap<>()).computeIfAbsent(rack, k -> createStrategy(dc, rack)); } private StrategyAdapter createStrategy(String dc, String rack) @@ -281,8 +282,11 @@ public class TokenAllocation // group by rack return createStrategy(strategy.snitch, dc, null, replicas, true); } - else if (racks == 1) + else if (racks == 1 || topology.getDatacenterEndpoints().get(dc).size() < replicas) { + // One rack, each node treated as separate. + // This is also used as a fallback when number of nodes is lower than the replication factor + // (where allocation is done randomly). return createStrategy(strategy.snitch, dc, null, replicas, false); } {code} > `generatetokens` uses the wrong allocation strategy > --------------------------------------------------- > > Key: CASSANDRA-18093 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18093 > Project: Cassandra > Issue Type: Bug > Reporter: Branimir Lambov > Priority: Normal > > When the number of racks is the same as the replication factor, token > allocation should use `NoReplicationTokenAllocator`. Because `generatetokens` > caches the allocation strategy > [here|https://github.com/apache/cassandra/commit/2346ed8241022882e77433e283ab8ce3004d12b0#diff-a93e5fa817f4b6d64484e0c28391b76e1760f51860de7f4f036470766ff5090cR225], > it uses one that was created before all the racks are defined and does not > switch to the correct one. > This has the effect of constructing very poor allocations for the racks = RF > case, especially for the first couple of nodes per rack. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org