[ 
https://issues.apache.org/jira/browse/CASSANDRA-21049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiaochu Liu updated CASSANDRA-21049:
------------------------------------
    Summary: Cassandra Cross Data Center Read/Write with Remote Quorum During 
Data Center Failure  (was: Cassandra Cross Region Read/Write with Remote Quorum 
During Regional Failure)

> Cassandra Cross Data Center Read/Write with Remote Quorum During Data Center 
> Failure
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21049
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21049
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Coordination
>            Reporter: Qiaochu Liu
>            Priority: Normal
>         Attachments: detailed.png
>
>
> h1. *Background*
> NetworkTopologyStrategy is the most commonly used replication strategy at 
> Uber, and we use LOCAL_QUORUM for reads and writes in many use cases. Our 
> Cassandra deployment in each region currently relies on a majority of 
> replicas being healthy to consistently achieve local quorum. 
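For reference, a quorum within one data center is a strict majority of that data center's replicas, floor(RF / 2) + 1. This small Java sketch uses hypothetical class and method names (not Cassandra APIs) to show how few unavailable replicas a data center can absorb before LOCAL_QUORUM stops being reachable.

```java
/**
 * Illustrative sketch of per-data-center quorum arithmetic.
 * LocalQuorumMath and its methods are hypothetical names, not Cassandra APIs.
 */
public final class LocalQuorumMath {
    private LocalQuorumMath() {}

    /** Strict majority of the replicas in one data center: floor(rf / 2) + 1. */
    public static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /** How many replicas in the data center may be down while a local quorum is still reachable. */
    public static int toleratedFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        for (int rf : new int[] {3, 5}) {
            // Prints "RF=3 quorum=2 tolerated=1" and then "RF=5 quorum=3 tolerated=2".
            System.out.printf("RF=%d quorum=%d tolerated=%d%n",
                    rf, quorum(rf), toleratedFailures(rf));
        }
    }
}
```

With the common RF=3 per data center, losing just two replicas in the local data center is enough to make LOCAL_QUORUM unavailable.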
> h1. *Current behavior*
> When a local region in a Cassandra deployment experiences an outage, network 
> isolation, or a maintenance event, reads and writes at the EACH_QUORUM and 
> LOCAL_QUORUM consistency levels will fail if too many replicas in that local 
> region are unavailable. In this configuration, the simultaneous 
> unavailability of several hosts can temporarily prevent the cluster from 
> reaching the required quorum for reads and writes. For applications that 
> require high availability and a seamless user experience, this can lead to 
> service downtime and a noticeable drop in overall availability.
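The failure mode described above can be modeled with a simplified availability check: LOCAL_QUORUM needs a quorum of live replicas only in the coordinator's data center, while EACH_QUORUM needs one in every data center that holds replicas. This is a hedged model, not Cassandra's coordinator code; the class name, the method names, and the three-DC, RF=3 layout are illustrative assumptions.

```java
import java.util.Map;

/** Simplified model of quorum availability per data center; hypothetical, not Cassandra code. */
public final class QuorumAvailability {
    private QuorumAvailability() {}

    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /** LOCAL_QUORUM: a quorum of live replicas in the coordinator's data center only. */
    public static boolean localQuorumAvailable(String localDc,
                                               Map<String, Integer> rfByDc,
                                               Map<String, Integer> liveByDc) {
        int rf = rfByDc.getOrDefault(localDc, 0);
        return rf > 0 && liveByDc.getOrDefault(localDc, 0) >= quorum(rf);
    }

    /** EACH_QUORUM: a quorum of live replicas in every data center that holds replicas. */
    public static boolean eachQuorumAvailable(Map<String, Integer> rfByDc,
                                              Map<String, Integer> liveByDc) {
        for (Map.Entry<String, Integer> e : rfByDc.entrySet()) {
            if (liveByDc.getOrDefault(e.getKey(), 0) < quorum(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = Map.of("dc1", 3, "dc2", 3, "dc3", 3);
        // dc1 has lost two of its three replicas; dc2 and dc3 are healthy.
        Map<String, Integer> live = Map.of("dc1", 1, "dc2", 3, "dc3", 3);
        System.out.println("LOCAL_QUORUM from dc1: " + localQuorumAvailable("dc1", rf, live)); // false
        System.out.println("EACH_QUORUM: " + eachQuorumAvailable(rf, live));                  // false
        System.out.println("LOCAL_QUORUM from dc2: " + localQuorumAvailable("dc2", rf, live)); // true
    }
}
```

The healthy replicas in dc2 and dc3 could still form a majority of their own data center's replicas, which is the gap the proposal targets.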
> h1. *Proposed Solution*
> To prevent this issue and ensure a seamless user experience, we can use the 
> *Remote Quorum* consistency level as a fallback when local replicas are 
> unavailable. Here, Remote Quorum refers to a read or write operation that 
> achieves quorum (a majority of replicas) in remote regions, rather than 
> relying solely on replicas within the local region. 
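One way to picture the Remote Quorum semantics is a response tracker that only counts acknowledgements from replicas in the chosen remote region. This is a minimal sketch under assumptions: the issue does not specify how responses are tracked, and RemoteQuorumTracker, its methods, the replica identifiers, and the single-remote-region target are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical Remote Quorum response tracker; not part of Cassandra. */
public final class RemoteQuorumTracker {
    private final String remoteDc;
    private final int required;
    private final Set<String> acked = new HashSet<>();

    public RemoteQuorumTracker(String remoteDc, int remoteReplicationFactor) {
        this.remoteDc = remoteDc;
        // Majority of the remote region's replicas.
        this.required = remoteReplicationFactor / 2 + 1;
    }

    /** Records one replica response; only remote-region replicas count, duplicates are ignored. */
    public synchronized boolean onResponse(String replicaId, String replicaDc) {
        if (remoteDc.equals(replicaDc)) {
            acked.add(replicaId);
        }
        return isSatisfied();
    }

    public synchronized boolean isSatisfied() {
        return acked.size() >= required;
    }

    public static void main(String[] args) {
        RemoteQuorumTracker tracker = new RemoteQuorumTracker("dc2", 3);
        System.out.println(tracker.onResponse("10.0.1.5", "dc1")); // false: local replica ignored
        System.out.println(tracker.onResponse("10.0.2.5", "dc2")); // false: 1 of 2
        System.out.println(tracker.onResponse("10.0.2.5", "dc2")); // false: duplicate ignored
        System.out.println(tracker.onResponse("10.0.2.6", "dc2")); // true: 2 of 2
    }
}
```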
> The selected approach for this design is to explicitly configure a backup 
> region mapping, in which each region defines its preferred failover target. 
> For example:
> |backup_regional_cluster:
>   cluster1: cluster2
>   cluster2: cluster3
>   cluster3: cluster1|
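A server could load the mapping above into an in-memory lookup and validate it at startup. This sketch assumes two validation rules the issue does not state (a region may not back itself up, and every backup target must itself be a configured region); the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory form of the backup_regional_cluster mapping. */
public final class BackupRegionMap {
    private final Map<String, String> backupOf;

    public BackupRegionMap(Map<String, String> backupOf) {
        for (Map.Entry<String, String> e : backupOf.entrySet()) {
            // Assumed rule: a region cannot fail over to itself.
            if (e.getKey().equals(e.getValue())) {
                throw new IllegalArgumentException(e.getKey() + " cannot back up itself");
            }
            // Assumed rule: every failover target must also be a configured region.
            if (!backupOf.containsKey(e.getValue())) {
                throw new IllegalArgumentException("unknown backup target: " + e.getValue());
            }
        }
        this.backupOf = Map.copyOf(backupOf);
    }

    /** Preferred failover target for the given local region, if one is configured. */
    public Optional<String> backupFor(String localRegion) {
        return Optional.ofNullable(backupOf.get(localRegion));
    }

    public static void main(String[] args) {
        BackupRegionMap map = new BackupRegionMap(Map.of(
                "cluster1", "cluster2",
                "cluster2", "cluster3",
                "cluster3", "cluster1"));
        System.out.println(map.backupFor("cluster1").orElse("none")); // cluster2
        System.out.println(map.backupFor("cluster9").orElse("none")); // none
    }
}
```

The ring-shaped mapping in the example means each region backs up exactly one other region, so a single region's outage shifts its fallback load onto one neighbor rather than onto every other region.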
> We will add a feature to override the read/write consistency level on the 
> server side. When local replicas are not available, the server will override 
> the write consistency level from EACH_QUORUM to Remote Quorum. *Note that* 
> implementing this on the client side would require changes to the CQL 
> protocol, so we add it only on the server side, where it is used internally 
> by the server.
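The override decision can be sketched as a pure function guarded by a runtime flag. Assumptions, all hypothetical: the Level enum stands in for Cassandra's ConsistencyLevel (which has no REMOTE_QUORUM today), the AtomicBoolean stands in for the proposed feature flag, and LOCAL_QUORUM is overridden alongside the EACH_QUORUM case the text names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical server-side consistency override; not Cassandra's implementation. */
public final class ConsistencyOverride {
    /** Stand-in for Cassandra's consistency levels plus the proposed REMOTE_QUORUM. */
    public enum Level { ONE, LOCAL_QUORUM, EACH_QUORUM, REMOTE_QUORUM }

    // Runtime toggle; the proposal turns this on or off via a nodetool command.
    private final AtomicBoolean fallbackEnabled = new AtomicBoolean(false);

    public void setFallbackEnabled(boolean enabled) {
        fallbackEnabled.set(enabled);
    }

    /** Level the coordinator should actually use, given live replicas in the local region. */
    public Level effectiveLevel(Level requested, int liveLocalReplicas, int localRf) {
        boolean localQuorumReachable = liveLocalReplicas >= localRf / 2 + 1;
        if (localQuorumReachable || !fallbackEnabled.get()) {
            return requested; // normal path: no override
        }
        if (requested == Level.EACH_QUORUM || requested == Level.LOCAL_QUORUM) {
            return Level.REMOTE_QUORUM; // fall back to a quorum in the backup region
        }
        return requested; // non-quorum levels are left untouched
    }

    public static void main(String[] args) {
        ConsistencyOverride override = new ConsistencyOverride();
        System.out.println(override.effectiveLevel(Level.EACH_QUORUM, 1, 3)); // EACH_QUORUM (flag off)
        override.setFallbackEnabled(true);
        System.out.println(override.effectiveLevel(Level.EACH_QUORUM, 1, 3)); // REMOTE_QUORUM
        System.out.println(override.effectiveLevel(Level.EACH_QUORUM, 2, 3)); // EACH_QUORUM
    }
}
```

Keeping the flag off by default means the override only takes effect after an operator explicitly enables it, which matches the proposal's feature-flag framing.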
> h1. *Implementations*
> We propose the following features for Cassandra to address regional failure 
> scenarios:
>  * Introduce a new consistency level called Remote Quorum.
>  * Add a server-side read/write consistency level override, controlled by a 
> feature flag. A nodetool command turns the server-side fallback on or off.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
