[ 
https://issues.apache.org/jira/browse/CASSANDRA-19633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846275#comment-17846275
 ] 

Marcus Eriksson commented on CASSANDRA-19633:
---------------------------------------------

I've started looking at this. Locally it produces a similar flame graph if I 
have a 200-node cluster with 128 tokens/node and a keyspace with a large RF. I 
wrote up a temporary workaround patch that avoids the optimisation when on 
vnodes: 
https://github.com/krummas/cassandra/commit/70e0d9972a94fa63fa991315412e79ef033561b2

But having looked into this, I think the optimisation from CASSANDRA-4650 was 
broken by CASSANDRA-14405 (transient replication). Before that change we made 
the 4650 optimisation based on all sources available for a range [1]; now we 
only supply a single source per range we need to stream [2].
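
To illustrate why the number of candidate sources per range matters, here is a 
minimal sketch. It is not Cassandra code: the class `SourceChoiceSketch`, the 
method `assign`, and the range/source names are hypothetical, and it uses a 
simple greedy least-loaded choice rather than the max-flow calculation. With 
several candidates per range the load can be spread; with one candidate per 
range the result is already fixed and there is nothing left to optimise.

```java
import java.util.*;

class SourceChoiceSketch {
    // Assign each range to the least-loaded of its candidate sources.
    // Greedy stand-in for illustration only, not the max-flow calculation.
    static Map<String, Integer> assign(Map<String, List<String>> candidates) {
        Map<String, Integer> load = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : candidates.entrySet()) {
            String best = null;
            for (String source : e.getValue()) {
                if (best == null || load.getOrDefault(source, 0) < load.getOrDefault(best, 0))
                    best = source;
            }
            load.merge(best, 1, Integer::sum); // one more range streamed from 'best'
        }
        return load;
    }

    public static void main(String[] args) {
        // Every range lists all replicas that hold it.
        Map<String, List<String>> allSources = new LinkedHashMap<>();
        allSources.put("r1", List.of("A", "B", "C"));
        allSources.put("r2", List.of("A", "B", "C"));
        allSources.put("r3", List.of("A", "B", "C"));

        // Every range arrives with a single, already chosen source.
        Map<String, List<String>> singleSource = new LinkedHashMap<>();
        singleSource.put("r1", List.of("A"));
        singleSource.put("r2", List.of("A"));
        singleSource.put("r3", List.of("A"));

        System.out.println(assign(allSources));   // prints {A=1, B=1, C=1}
        System.out.println(assign(singleSource)); // prints {A=3}
    }
}
```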

I'll try to find some time to look into this soon.

[1] 
https://github.com/apache/cassandra/blob/bf911cc6a852f9ef068318a3545611d9daa5112c/src/java/org/apache/cassandra/dht/RangeStreamer.java#L189-L197
[2] 
https://github.com/apache/cassandra/blob/6bae4f76fb043b4c3a3886178b5650b280e9a50b/src/java/org/apache/cassandra/dht/RangeStreamer.java#L531

> Replaced node is stuck in a loop calculating ranges
> ---------------------------------------------------
>
>                 Key: CASSANDRA-19633
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19633
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Bootstrap and Decommission
>            Reporter: Jai Bheemsen Rao Dhanwada
>            Assignee: Marcus Eriksson
>            Priority: Normal
>              Labels: Bootstrap
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.1-alpha1
>
>         Attachments: result1.html
>
>
> Hello,
>  
> I am running into an issue where a node that is replacing a dead 
> (non-seed) node is stuck calculating ranges for a very long time. It 
> eventually succeeds, but the time taken to calculate the ranges is not 
> constant: I sometimes see it take 24 hours to calculate ranges for each 
> keyspace. Attached is the flame graph of the Cassandra process during this 
> time, which points to the code below. 
> {code:java}
> Multimap<InetAddressAndPort, Range<Token>> getRangeFetchMapForNonTrivialRanges()
> {
>     //Get the graph with edges between ranges and their source endpoints
>     MutableCapacityGraph<Vertex, Integer> graph = getGraph();
>     //Add source and destination vertex and edges
>     addSourceAndDestination(graph, getDestinationLinkCapacity(graph));
>     int flow = 0;
>     MaximumFlowAlgorithmResult<Integer, CapacityEdge<Vertex, Integer>> result = null;
>     //We might not be working on all ranges
>     while (flow < getTotalRangeVertices(graph))
>     {
>         if (flow > 0)
>         {
>             //We could not find a path with previous graph. Bump the capacity b/w
>             //endpoint vertices and destination by 1
>             incrementCapacity(graph, 1);
>         }
>         MaximumFlowAlgorithm fordFulkerson = FordFulkersonAlgorithm.getInstance(DFSPathFinder.getInstance());
>         result = fordFulkerson.calc(graph, sourceVertex, destinationVertex, IntegerNumberSystem.getInstance());
>         int newFlow = result.calcTotalFlow();
>         assert newFlow > flow; //We are not making progress which should not happen
>         flow = newFlow;
>     }
>     return getRangeFetchMapFromGraphResult(graph, result);
> }
> {code}
> Digging through the logs, I see the log line below for the keyspace 
> `system_auth`:
> {code:java}
> INFO [main] 2024-05-10 17:35:02,489 RangeStreamer.java:330 - Bootstrap: range 
> Full(/10.135.56.214:7000,(5080189126057290696,5081324396311791613]) exists on 
> Full(/10.135.56.157:7000,(5080189126057290696,5081324396311791613]) for 
> keyspace system_auth{code}
> Corresponding code:
> {code:java}
> for (Map.Entry<Replica, Replica> entry : fetchMap.flattenEntries())
>     logger.info("{}: range {} exists on {} for keyspace {}", description,
>                 entry.getKey(), entry.getValue(), keyspaceName);
> {code}
> But I do not see the line below for the corresponding keyspace:
> {code:java}
> RangeStreamer.java:606 - Output from RangeFetchMapCalculator for 
> keyspace{code}
> This means the code is stuck in `getRangeFetchMap()`:
> {code:java}
> Multimap<InetAddressAndPort, Range<Token>> rangeFetchMapMap = calculator.getRangeFetchMap();
> logger.info("Output from RangeFetchMapCalculator for keyspace {}", keyspace);
> {code}
> Here is the cluster topology:
>  * Cassandra version: 4.0.12
>  * # of nodes: 190
>  * Tokens (vnodes): 128
> My initial hypothesis was that the graph calculation was taking longer due 
> to the combination of nodes + tokens + tables, but in the same cluster I saw 
> one of the nodes join without any issues. 
> I am wondering whether I am hitting a bug that lets it work sometimes but 
> sends it into an infinite loop at other times.
> Please let me know if you need any other details. I would appreciate any 
> pointers to debug this further.
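
The capacity-bumping loop quoted in the description can be sketched in 
isolation as follows. This is a simplified illustration, not the 
RangeFetchMapCalculator implementation: the names `CapacityBumpSketch`, 
`fetchMap`, `maxFlow` and `augment` are hypothetical, a plain augmenting-path 
search stands in for the Ford-Fulkerson call, the starting capacity of 1 is an 
assumption (the quoted code takes it from `getDestinationLinkCapacity(graph)`), 
and an exception replaces the `assert`. In this sketch the flow is recomputed 
from scratch each time the per-endpoint capacity is raised.

```java
import java.util.*;

class CapacityBumpSketch {
    // candidates.get(r) lists the endpoint indices that can serve range r.
    // Returns owner[r] = endpoint chosen to stream range r.
    static int[] fetchMap(List<int[]> candidates, int endpoints) {
        int ranges = candidates.size();
        int[] owner = new int[ranges];
        int capacity = 1; // assumption: start with one range per endpoint
        int flow = 0;
        while (flow < ranges) {
            if (flow > 0)
                capacity++; // previous capacity was not enough, allow one more range per endpoint
            int newFlow = maxFlow(candidates, endpoints, capacity, owner);
            if (newFlow <= flow)
                throw new IllegalStateException("no progress: some range has no usable source");
            flow = newFlow;
        }
        return owner;
    }

    // Number of ranges that can be assigned when each endpoint serves at most 'capacity' ranges.
    static int maxFlow(List<int[]> candidates, int endpoints, int capacity, int[] owner) {
        Arrays.fill(owner, -1);
        List<List<Integer>> assigned = new ArrayList<>();
        for (int e = 0; e < endpoints; e++)
            assigned.add(new ArrayList<>());
        int flow = 0;
        for (int r = 0; r < candidates.size(); r++) {
            if (augment(r, candidates, capacity, owner, assigned, new boolean[endpoints]))
                flow++;
        }
        return flow;
    }

    // Depth-first search for an augmenting path that places range r on some endpoint.
    static boolean augment(int r, List<int[]> candidates, int capacity, int[] owner,
                           List<List<Integer>> assigned, boolean[] visited) {
        for (int e : candidates.get(r)) {
            if (visited[e])
                continue;
            visited[e] = true;
            List<Integer> held = assigned.get(e);
            if (held.size() < capacity) {
                held.add(r);
                owner[r] = e;
                return true;
            }
            // Endpoint is full: try to move one of its ranges to another endpoint.
            for (int i = 0; i < held.size(); i++) {
                if (augment(held.get(i), candidates, capacity, owner, assigned, visited)) {
                    held.set(i, r); // r takes the slot that was just freed
                    owner[r] = e;
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Ranges 0 and 1 can come from endpoint 0 or 1; ranges 2 and 3 only from endpoint 0.
        List<int[]> candidates = List.of(new int[]{0, 1}, new int[]{0, 1}, new int[]{0}, new int[]{0});
        System.out.println(Arrays.toString(fetchMap(candidates, 2))); // prints [1, 1, 0, 0]
    }
}
```

In the example, a capacity of 1 per endpoint only places two of the four 
ranges, so the loop raises the capacity to 2 and recomputes, at which point 
every range has a source.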



