Qiaochu Liu created CASSANDRA-21049:
---------------------------------------

             Summary: Cassandra Cross Region Read/Write with Remote Quorum 
During Regional Failure
                 Key: CASSANDRA-21049
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21049
             Project: Apache Cassandra
          Issue Type: Improvement
          Components: Consistency/Coordination
            Reporter: Qiaochu Liu
         Attachments: detailed.png

h1. *Background*

NetworkTopologyStrategy is the most commonly used replication strategy at Uber, 
and we use LOCAL_QUORUM for reads and writes in many use cases. Our Cassandra 
deployment in each region currently relies on a majority of replicas being 
healthy to consistently achieve local quorum. 
h1. *Current behavior*

When a local region in a Cassandra deployment experiences outages, network 
isolation, or maintenance events, the EACH_QUORUM / LOCAL_QUORUM consistency 
levels will fail for both reads and writes if enough replicas in that local 
region are unavailable. In this configuration, the simultaneous unavailability 
of several hosts can temporarily prevent the cluster from reaching the required 
quorum for reads and writes. For applications that require high availability 
and a seamless user experience, this can lead to service downtime and a 
noticeable drop in overall availability.
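
To make the failure condition concrete, here is a minimal sketch of the quorum 
arithmetic, assuming the usual majority rule (floor(RF / 2) + 1) and an 
illustrative replication factor of 3 in the local region. The function names 
are hypothetical and are not part of Cassandra.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def local_quorum_available(replication_factor: int, healthy_replicas: int) -> bool:
    """True if enough local replicas are healthy to satisfy a local quorum."""
    return healthy_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    # Assumed example: RF = 3 in the local region, with 0 to 3 replicas down.
    local_rf = 3
    for down in range(local_rf + 1):
        up = local_rf - down
        print(f"down={down} up={up} ok={local_quorum_available(local_rf, up)}")
```

With RF = 3, losing a single replica still leaves a majority, while losing two 
at the same time does not, which is the case this proposal targets.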
h1. *Proposed Solution*

To prevent this issue and ensure a seamless user experience, we can use the 
*Remote Quorum* consistency level as a fallback mechanism in scenarios where 
local replicas are unavailable. Remote Quorum, as proposed here, refers to a 
read or write operation that achieves quorum (a majority of replicas) across 
remote regions, rather than relying solely on replicas within the local region. 

The selected approach for this design is to explicitly configure a backup 
region mapping, in which each region defines its preferred failover target. 
For example:
|backup_regional_cluster:
  cluster1: cluster2
  cluster2: cluster3
  cluster3: cluster3|
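
A lookup over such a mapping could be sketched as follows. This is only an 
illustration: the function name is hypothetical, the mapping is copied from 
the example above, and treating a region that maps to itself (or is missing 
from the mapping) as having no remote backup is an assumption, not something 
this ticket specifies.

```python
from typing import Dict, Optional

# Mapping copied from the example above. How it would actually be loaded
# from server configuration is not specified in this ticket.
BACKUP_REGIONAL_CLUSTER: Dict[str, str] = {
    "cluster1": "cluster2",
    "cluster2": "cluster3",
    "cluster3": "cluster3",
}


def backup_region(local_region: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the configured failover target for local_region, or None.

    Assumption: a region mapped to itself, or absent from the mapping,
    has no usable remote backup.
    """
    target = mapping.get(local_region)
    if target is None or target == local_region:
        return None
    return target


if __name__ == "__main__":
    for region in ("cluster1", "cluster2", "cluster3"):
        print(region, "->", backup_region(region, BACKUP_REGIONAL_CLUSTER))
```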

We will add a feature that overrides the read/write consistency level on the 
server side. When local replicas are not available, the server will override 
the server-side write consistency level from local quorum to remote quorum. 
*Note:* implementing this change on the client side would require protocol 
changes in CQL, so we add it only on the server side, where it can be used only 
internally by the server.
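
The override decision could look roughly like the sketch below. It is not the 
actual patch: the enum, the function, and the boolean switch are hypothetical 
names, REMOTE_QUORUM does not exist in released Cassandra, and the check 
assumes the usual majority rule (floor(RF / 2) + 1) for the local region.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    LOCAL_QUORUM = "LOCAL_QUORUM"
    REMOTE_QUORUM = "REMOTE_QUORUM"  # proposed level, hypothetical here
    ONE = "ONE"


def effective_consistency(
    requested: ConsistencyLevel,
    local_rf: int,
    local_alive: int,
    override_enabled: bool,
) -> ConsistencyLevel:
    """Pick the consistency level the server would actually use.

    The request is only rewritten when the (hypothetical) switch is on,
    the client asked for LOCAL_QUORUM, and the local region cannot reach
    a majority of its replicas.
    """
    if not override_enabled:
        return requested
    if requested is not ConsistencyLevel.LOCAL_QUORUM:
        return requested
    if local_alive >= local_rf // 2 + 1:
        return requested
    return ConsistencyLevel.REMOTE_QUORUM


if __name__ == "__main__":
    # Assumed example: local RF = 3 with only one healthy local replica.
    print(effective_consistency(ConsistencyLevel.LOCAL_QUORUM, 3, 1, True))
```

Keeping the decision on the server means clients continue to send 
LOCAL_QUORUM unchanged, which matches the constraint that no CQL protocol 
change is made.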
h1. *Implementations*

We propose the following features for Cassandra to address regional failure 
scenarios:
 * Introduce a new consistency level called remote quorum.
 * Add a feature that overrides the read/write consistency level on the server 
side. (This can be controlled by a feature flag.) Use a nodetool command to 
turn the server fallback on or off.

 


