[ https://issues.apache.org/jira/browse/CASSANDRA-20774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuqi Yan updated CASSANDRA-20774:
---------------------------------
    Description:
Running Cassandra 4.1.3.

After switching one of our instances to PaxosV2, we started to see CPU utilization spikes on multiple nodes in the cluster whenever a new node attempted to join the ring.

I collected a CPU profile on one of the affected nodes and saw:

!image-2025-07-17-16-58-17-456.png|width=1335,height=715!

After the switch to V2, a bootstrapping node triggers `repairPaxosForTopologyChange`, which schedules `PaxosCleanup` table by table.

In `isOutOfRange` we appear to do unnecessary work: `getAddressReplicas()` computes the token map for the whole cluster, and then only the local node's ranges are used.

{code:java}
localRanges = Range.normalize(keyspace.getReplicationStrategy()
                                      .getAddressReplicas() // <-- this builds the map for the entire cluster
                                      .get(FBUtilities.getBroadcastAddressAndPort())
                                      .ranges());{code}

One potential improvement is to use `getAddressReplicas(FBUtilities.getBroadcastAddressAndPort())` instead, so that we don't rebuild the whole map.

We're using 16 vnodes. The instance has ~1K tables.

Even so, significant load still comes from `calculateNaturalReplicas`. Is there any reason we always recalculate this map here instead of using the cached `EndpointsForRange`, similar to `getNaturalReplicas`?

The issue might not exist in trunk now that we have ClusterMetadata.

> PaxosCleanup.isOutOfRange caused CPU util spikes in cluster on new node joining
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20774
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20774
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: Yuqi Yan
>            Priority: Normal
>         Attachments: image-2025-07-17-16-58-17-456.png
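
To illustrate the difference described in the report, here is a minimal, self-contained sketch. It is a toy model and not Cassandra code: the ring, the replication factor, and every name in it (`LocalRangesSketch`, `sampleRing`, `rangesByEndpoint`, `rangesForEndpoint`) are hypothetical stand-ins. It contrasts building the endpoint-to-ranges map for every node and then picking one entry with collecting only the ranges of the endpoint of interest. Both variants still compute the natural replicas once per token, which is consistent with the remark that `calculateNaturalReplicas` remains a significant cost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalRangesSketch {
    static final int RF = 2; // replication factor of the toy strategy (assumption)

    // Toy ring (token -> endpoint); a hypothetical stand-in for real token metadata.
    public static TreeMap<Long, String> sampleRing() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(0L, "A");
        ring.put(10L, "B");
        ring.put(20L, "C");
        ring.put(30L, "A");
        ring.put(40L, "B");
        ring.put(50L, "C");
        return ring;
    }

    // First RF distinct endpoints found walking the ring clockwise from the token.
    public static List<String> naturalReplicas(TreeMap<Long, String> ring, long token) {
        List<Long> tokens = new ArrayList<>(ring.keySet());
        List<String> replicas = new ArrayList<>();
        int start = tokens.indexOf(token);
        for (int i = 0; i < tokens.size() && replicas.size() < RF; i++) {
            String endpoint = ring.get(tokens.get((start + i) % tokens.size()));
            if (!replicas.contains(endpoint)) {
                replicas.add(endpoint);
            }
        }
        return replicas;
    }

    // The range (previous token, token], wrapping around at the start of the ring.
    public static String rangeEndingAt(TreeMap<Long, String> ring, long token) {
        Long previous = ring.lowerKey(token);
        if (previous == null) {
            previous = ring.lastKey();
        }
        return "(" + previous + "," + token + "]";
    }

    // Whole-cluster variant: materializes the range list of every endpoint.
    public static Map<String, List<String>> rangesByEndpoint(TreeMap<Long, String> ring) {
        Map<String, List<String>> result = new HashMap<>();
        for (long token : ring.keySet()) {
            for (String endpoint : naturalReplicas(ring, token)) {
                result.computeIfAbsent(endpoint, k -> new ArrayList<>())
                      .add(rangeEndingAt(ring, token));
            }
        }
        return result;
    }

    // Single-endpoint variant: keeps only the ranges replicated by one endpoint.
    public static List<String> rangesForEndpoint(TreeMap<Long, String> ring, String endpoint) {
        List<String> result = new ArrayList<>();
        for (long token : ring.keySet()) {
            if (naturalReplicas(ring, token).contains(endpoint)) {
                result.add(rangeEndingAt(ring, token));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = sampleRing();
        System.out.println("via full map:    " + rangesByEndpoint(ring).get("A"));
        System.out.println("single endpoint: " + rangesForEndpoint(ring, "A"));
    }
}
```

In this toy model the two variants return the same ranges for a given endpoint; the single-endpoint variant simply avoids allocating and filling the per-endpoint lists for all other nodes. Whether the savings are meaningful in Cassandra itself would need to be confirmed by profiling.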