[ https://issues.apache.org/jira/browse/CASSANDRA-20774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuqi Yan updated the description of CASSANDRA-20774.

PaxosCleanup.isOutOfRange caused CPU util spikes in cluster on new node joining
-------------------------------------------------------------------------------

Key: CASSANDRA-20774
URL: https://issues.apache.org/jira/browse/CASSANDRA-20774
Project: Apache Cassandra
Issue Type: Bug
Reporter: Yuqi Yan
Priority: Normal
Attachments: image-2025-07-17-16-58-17-456.png

Running Cassandra 4.1.3.

After switching one of our instances to PaxosV2, we saw CPU utilization spike on multiple nodes in the cluster whenever a new node attempted to join the ring. I collected a CPU profile on one of the affected nodes:

!image-2025-07-17-16-58-17-456.png|width=1335,height=715!

With Paxos V2, bootstrapping a new node triggers `repairPaxosForTopologyChange`, which schedules `PaxosCleanup` table by table. `isOutOfRange` appears to do unnecessary work: it computes the replica map for the whole cluster with `getAddressReplicas()` and then uses only the local node's ranges.
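To make the pattern concrete, here is a toy model in Java. It is not Cassandra's code: the class `LocalRangesSketch`, its methods, and the SimpleStrategy-like placement (a range is replicated to the owner of its right-hand token plus the next RF-1 distinct endpoints clockwise) are illustrative assumptions. `addressReplicas()` builds the map for every endpoint and the caller keeps one entry, while `addressReplicas(endpoint)` collects only one endpoint's ranges; a counter records how often the per-token replica walk (a stand-in for `calculateNaturalReplicas`) runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model, NOT Cassandra's implementation: a vnode token ring with
// SimpleStrategy-like placement. A range (prev, t] is identified by its right token t.
class LocalRangesSketch {
    final TreeMap<Long, String> ring = new TreeMap<>(); // token -> owning endpoint, sorted
    final int rf;                                       // replication factor
    long replicaCalculations = 0;                       // how often the per-token walk runs

    LocalRangesSketch(int rf) { this.rf = rf; }

    // Hypothetical stand-in for calculateNaturalReplicas: walk clockwise from
    // `token` and collect the first rf distinct endpoints.
    List<String> naturalReplicas(long token) {
        replicaCalculations++;
        List<String> clockwise = new ArrayList<>(ring.tailMap(token, true).values());
        clockwise.addAll(ring.headMap(token, false).values()); // wrap around the ring
        List<String> replicas = new ArrayList<>();
        for (String endpoint : clockwise) {
            if (!replicas.contains(endpoint)) replicas.add(endpoint);
            if (replicas.size() == rf) break;
        }
        return replicas;
    }

    // Analogue of getAddressReplicas(): ranges for EVERY endpoint.
    Map<String, Set<Long>> addressReplicas() {
        Map<String, Set<Long>> byEndpoint = new HashMap<>();
        for (long token : ring.keySet())
            for (String endpoint : naturalReplicas(token))
                byEndpoint.computeIfAbsent(endpoint, e -> new TreeSet<>()).add(token);
        return byEndpoint;
    }

    // Analogue of getAddressReplicas(endpoint): ranges for ONE endpoint, no full map.
    Set<Long> addressReplicas(String endpoint) {
        Set<Long> ranges = new TreeSet<>();
        for (long token : ring.keySet())
            if (naturalReplicas(token).contains(endpoint)) ranges.add(token);
        return ranges;
    }

    public static void main(String[] args) {
        LocalRangesSketch sketch = new LocalRangesSketch(3);
        Random random = new Random(42);
        for (int node = 1; node <= 5; node++)
            for (int vnode = 0; vnode < 16; vnode++)            // 16 vnodes per node
                sketch.ring.put(random.nextLong(), "10.0.0." + node);

        String local = "10.0.0.1";
        Set<Long> viaFullMap = sketch.addressReplicas().get(local); // build all, keep one
        long fullMapWalks = sketch.replicaCalculations;
        sketch.replicaCalculations = 0;
        Set<Long> viaSingle = sketch.addressReplicas(local);        // compute only local
        System.out.println("same ranges: " + viaFullMap.equals(viaSingle));
        System.out.println("replica walks: " + fullMapWalks + " vs " + sketch.replicaCalculations);
    }
}
```

In this model both variants still run the replica walk once per token, so the per-endpoint lookup saves building the full map but not the replica calculations themselves.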
{code:java}
localRanges = Range.normalize(keyspace.getReplicationStrategy()
                                      .getAddressReplicas()
                                      .get(FBUtilities.getBroadcastAddressAndPort())
                                      .ranges());
{code}

One potential improvement is to use `getAddressReplicas(FBUtilities.getBroadcastAddressAndPort())` instead, so that we don't rebuild the whole map.

We're using 16 vnodes, and the instance has ~1K tables.

However, `calculateNaturalReplicas` still accounts for significant load. Is there a reason we always recalculate this map instead of using the cached `EndpointsForRange`, as `getNaturalReplicas` does?

The issue might not exist in trunk now that we have ClusterMetadata.
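One way to avoid the repeated work is to memoize the result and invalidate it only when the topology changes. The sketch below assumes a monotonically increasing ring version that changes on every topology change; `CachedLocalRanges` and its constructor parameters are hypothetical names, not Cassandra APIs. Because `isOutOfRange` derives the ranges from the keyspace's replication strategy, one such cache per keyspace would let every table in that keyspace share a single computation.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch, not a Cassandra API: memoize an expensive range
// computation and recompute only when the (assumed) ring version changes.
class CachedLocalRanges<R> {
    // Immutable pairing of a ring version with the ranges computed for it.
    private static final class Snapshot<T> {
        final long ringVersion;
        final Set<T> ranges;
        Snapshot(long ringVersion, Set<T> ranges) {
            this.ringVersion = ringVersion;
            this.ranges = ranges;
        }
    }

    private final LongSupplier ringVersion;   // assumed to change on every topology change
    private final Supplier<Set<R>> compute;   // e.g. a per-endpoint range lookup
    private final AtomicReference<Snapshot<R>> cached = new AtomicReference<>();
    int computations = 0;                      // illustration only; not thread-safe

    CachedLocalRanges(LongSupplier ringVersion, Supplier<Set<R>> compute) {
        this.ringVersion = ringVersion;
        this.compute = compute;
    }

    Set<R> get() {
        // Read the version BEFORE computing: if the ring changes mid-computation,
        // the snapshot carries the older version and the next call recomputes.
        long version = ringVersion.getAsLong();
        Snapshot<R> snapshot = cached.get();
        if (snapshot != null && snapshot.ringVersion == version)
            return snapshot.ranges;
        Set<R> ranges = compute.get();
        computations++;
        cached.set(new Snapshot<>(version, ranges));
        return ranges;
    }

    public static void main(String[] args) {
        long[] version = {1};
        CachedLocalRanges<String> localRanges = new CachedLocalRanges<>(
            () -> version[0], () -> Collections.singleton("(0,100]"));
        for (int table = 0; table < 1000; table++)  // e.g. many tables in one keyspace
            localRanges.get();
        System.out.println("computations: " + localRanges.computations);
        version[0]++;                                // simulate a topology change
        localRanges.get();
        System.out.println("computations: " + localRanges.computations);
    }
}
```

Concurrent callers that miss at the same time may each compute once, which is harmless for a side-effect-free computation; a stale snapshot is never returned, because a hit requires its version to equal the current one.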