[ 
https://issues.apache.org/jira/browse/CASSGO-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raj Ummadisetty updated CASSGO-104:
-----------------------------------
    Description: 
h3. Problem:



The TokenAwareHostPolicy has two critical bugs affecting multi-keyspace 
workloads:

{*}Issue 1{*}: Missing Replica Maps for Non-Default Keyspaces

When a session is created with a default keyspace (e.g., 
{color:#0747a6}ks1{color}) but queries are executed against a different 
keyspace (e.g., {color:#00875a}SELECT * FROM ks2.table{color}), the 
TokenAwareHostPolicy fails to perform token-aware routing for the non-default 
keyspace. This occurs because the replica map (meta.replicas) is only populated 
for the session's default keyspace.

This is a follow-up to [GitHub issue 
#1621|https://github.com/apache/cassandra-gocql-driver/issues/1621]. While [PR 
#1714|https://github.com/apache/cassandra-gocql-driver/pull/1714] fixed the 
keyspace extraction from prepared statements, it did not address the underlying 
issue of populating replica information for non-default keyspaces.

{*}Issue 2{*}: Stale Replica Maps After Topology Changes

When a non-default keyspace is added via schema change events 
(KeyspaceChanged), its replica map is populated. However, when topology changes 
occur 
([AddHost/RemoveHost|https://github.com/apache/cassandra-gocql-driver/blob/f1e31a58f7e0c25e58e2e2a0a0c6de358e643e8b/policies.go#L490-L547]),
 only the session keyspace's replica map is updated. Non-default keyspaces 
retain stale replica maps that reflect the outdated topology.
h3. Impact:
 * Ineffective replica shuffling: Even when ShuffleReplicas() is enabled, all 
queries to non-default keyspaces go to the primary replica
 * Creates uneven load distribution across the cluster
 * Queries to non-default keyspaces route to wrong or removed nodes
 * Can cause query failures (NoHostAvailableException)

h3. Steps to Reproduce:

 
{code:java}
// Create a session whose default keyspace is ks1
cluster := gocql.NewCluster("127.0.0.1")
cluster.Keyspace = "ks1"
// ShuffleReplicas is an option constructor, so it must be called
cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(
    gocql.RoundRobinHostPolicy(), gocql.ShuffleReplicas())
session, err := cluster.CreateSession()
if err != nil {
    log.Fatal(err)
}
defer session.Close()
// Query keyspace ks2, which is not the session's default keyspace
stmt := "SELECT * FROM ks2.table WHERE id = ?"
// This query will always be routed to the primary replica
query := session.Query(stmt, someID)
{code}
 
h3. Root Cause:

In policies.go, the Pick() method looks up replicas:
[ht := 
meta.replicas[keyspace].replicasFor(token)|https://github.com/apache/cassandra-gocql-driver/blob/f1e31a58f7e0c25e58e2e2a0a0c6de358e643e8b/policies.go#L615]

However, meta.replicas[keyspace] is nil for every keyspace except the session's 
default keyspace.
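
The lookup failure can be illustrated with a small standalone sketch. It does not use the driver's real types: replicaMap, pickReplicas and the host names are hypothetical stand-ins, and the fallback to the primary replica is modelled on the behaviour described under Impact rather than copied from policies.go.

```go
package main

import "fmt"

// replicaMap is a hypothetical stand-in for per-keyspace replica data:
// token -> hosts that hold a replica for that token.
type replicaMap map[int64][]string

// replicasFor returns the replicas for a token. Reading from a nil map is
// legal in Go and simply yields nil.
func (r replicaMap) replicasFor(token int64) []string {
	return r[token]
}

// pickReplicas mimics the lookup described above: use the keyspace's replica
// map when present, otherwise fall back to the primary replica only.
func pickReplicas(replicas map[string]replicaMap, primary, keyspace string, token int64) []string {
	if hosts := replicas[keyspace].replicasFor(token); hosts != nil {
		return hosts
	}
	// No replica map for this keyspace: only the primary replica is known.
	return []string{primary}
}

func main() {
	// Only the session keyspace ks1 has a replica map.
	replicas := map[string]replicaMap{
		"ks1": {42: {"node1", "node2", "node3"}},
	}
	fmt.Println(pickReplicas(replicas, "node1", "ks1", 42)) // [node1 node2 node3]
	fmt.Println(pickReplicas(replicas, "node1", "ks2", 42)) // [node1]
}
```

With only one candidate host for ks2, shuffling has nothing to shuffle, which matches the uneven load described under Impact.
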
h3. Proposed Solution:
 * When TokenAwareHostPolicy is initialized, hydrate the replica maps for all 
keyspaces. 
 * On topology changes (AddHost/RemoveHost), update the replica maps for all 
keyspaces, not only the session keyspace. 
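
A minimal sketch of the eager approach, under stated assumptions: policyMeta, rebuildAll, addHost and removeHost are hypothetical names, and replica placement is reduced to "the first RF hosts in sorted order", which is a toy model and not a real replication strategy. The point is only that every topology change rebuilds the map for every known keyspace.

```go
package main

import (
	"fmt"
	"sort"
)

// policyMeta is a hypothetical, simplified stand-in for the policy metadata.
type policyMeta struct {
	hosts    []string            // current ring members
	rf       map[string]int      // keyspace -> replication factor (assumed known from schema)
	replicas map[string][]string // keyspace -> replica hosts (toy model)
}

// rebuildAll recomputes the replica map for every known keyspace.
func (m *policyMeta) rebuildAll() {
	sort.Strings(m.hosts)
	replicas := make(map[string][]string, len(m.rf))
	for ks, rf := range m.rf {
		n := rf
		if n > len(m.hosts) {
			n = len(m.hosts)
		}
		// Copy so later host changes cannot alias an old replica list.
		replicas[ks] = append([]string(nil), m.hosts[:n]...)
	}
	m.replicas = replicas
}

// newPolicyMeta hydrates replica maps for all keyspaces at initialization.
func newPolicyMeta(hosts []string, rf map[string]int) *policyMeta {
	m := &policyMeta{hosts: append([]string(nil), hosts...), rf: rf}
	m.rebuildAll()
	return m
}

// addHost and removeHost refresh all keyspaces, not only the session keyspace.
func (m *policyMeta) addHost(h string) {
	m.hosts = append(m.hosts, h)
	m.rebuildAll()
}

func (m *policyMeta) removeHost(h string) {
	kept := m.hosts[:0]
	for _, x := range m.hosts {
		if x != h {
			kept = append(kept, x)
		}
	}
	m.hosts = kept
	m.rebuildAll()
}

func main() {
	m := newPolicyMeta([]string{"node1", "node2", "node3"}, map[string]int{"ks1": 3, "ks2": 2})
	fmt.Println(m.replicas["ks2"]) // [node1 node2]
	m.removeHost("node1")
	fmt.Println(m.replicas["ks2"]) // [node2 node3]
}
```

The trade-off of this design is that each topology event costs one rebuild per keyspace, in exchange for never serving a stale or missing replica map.
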

h3. Alternatives:

Implement lazy loading of replica maps in the Pick() method:

1. When replicas are not found for a keyspace, call a new 
ensureReplicasForKeyspace() method
2. This method uses double-checked locking to populate the replica map on-demand
3. Subsequent queries to the same keyspace use the cached replica information
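
The lazy-loading alternative could look roughly like the following sketch. Only the method name ensureReplicasForKeyspace comes from the steps above; its signature, the replicaCache type, the build callback and the builds counter are assumptions made for illustration, not the driver's API.

```go
package main

import (
	"fmt"
	"sync"
)

// replicaCache is a hypothetical holder for lazily built replica maps.
type replicaCache struct {
	mu       sync.RWMutex
	replicas map[string][]string          // keyspace -> replica hosts
	build    func(keyspace string) []string // hypothetical builder
	builds   int                            // number of builds, for demonstration
}

// ensureReplicasForKeyspace returns the cached replicas for a keyspace,
// building them on first use with double-checked locking.
func (c *replicaCache) ensureReplicasForKeyspace(ks string) []string {
	// First check: cheap read lock on the hot path.
	c.mu.RLock()
	r, ok := c.replicas[ks]
	c.mu.RUnlock()
	if ok {
		return r
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Second check: another goroutine may have built it while we waited.
	if r, ok := c.replicas[ks]; ok {
		return r
	}
	r = c.build(ks)
	c.builds++
	c.replicas[ks] = r
	return r
}

func main() {
	c := &replicaCache{
		replicas: map[string][]string{},
		build: func(ks string) []string {
			return []string{ks + "-replica1", ks + "-replica2"}
		},
	}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.ensureReplicasForKeyspace("ks2")
		}()
	}
	wg.Wait()
	fmt.Println(c.builds) // 1
}
```

Note that lazy loading alone does not address Issue 2: cached entries would still need to be invalidated or rebuilt on topology changes.
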

  was:
h3. Problem:

When a session is created with keyspace (e.g., {color:#0747a6}ks1{color}) but 
queries are executed against a different keyspace (e.g., {color:#00875a}SELECT 
* FROM ks2.table{color}), the TokenAwareHostPolicy fails to perform token-aware 
routing for the non-default keyspace. This occurs because the replica map 
(meta.replicas) is only populated for the session's default keyspace.

This is a follow-up to issue ([GitHub 
#1621|https://github.com/apache/cassandra-gocql-driver/issues/1621]). While [PR 
#1714|https://github.com/apache/cassandra-gocql-driver/pull/1714] fixed the 
keyspace extraction from prepared statements, it did not address the underlying 
issue of populating replica information for non-default keyspaces.
h3. Impact:
 * Ineffective replica shuffling: Even when ShuffleReplicas() is enabled, all 
queries to non-default keyspaces go to the primary replica
 * Creates uneven load distribution across the cluster

h3. Steps to Reproduce:

 
{code:java}
// Create session with keyspace ks1
cluster := gocql.NewCluster("127.0.0.1")
cluster.Keyspace = "ks1"
cluster.PoolConfig.HostSelectionPolicy = 
gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy(), gocql.ShuffleReplicas())
session, _ := cluster.CreateSession()
defer session.Close()
// Query keyspace ks2 
stmt := "SELECT * FROM ks2.table WHERE id = ?"
// This query will always be routed to primary replica
query := session.Query(stmt, someID)


{code}
 
h3. Root Cause:

In policies.go, the Pick() method looks up replicas:
[ht := 
meta.replicas[keyspace].replicasFor(token)|https://github.com/apache/cassandra-gocql-driver/blob/f1e31a58f7e0c25e58e2e2a0a0c6de358e643e8b/policies.go#L615]

However, meta.replicas[keyspace] is nil for any keyspace except the session's 
default keyspace
h3. Proposed Solution

Implement lazy loading of replica maps in the Pick() method:

1. When replicas are not found for a keyspace, call a new 
ensureReplicasForKeyspace() method
2. This method uses double-checked locking to populate the replica map on-demand
3. Subsequent queries to the same keyspace use the cached replica information


> TokenAwareHostPolicy should populate and maintain replica maps for all 
> keyspaces
> ---------------------------------------------------------------------------------
>
>                 Key: CASSGO-104
>                 URL: https://issues.apache.org/jira/browse/CASSGO-104
>             Project: Apache Cassandra Go driver
>          Issue Type: Bug
>            Reporter: Raj Ummadisetty
>            Assignee: Raj Ummadisetty
>            Priority: Normal
>          Time Spent: 20m
>  Remaining Estimate: 0h


