[
https://issues.apache.org/jira/browse/CASSANDRA-21049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Qiaochu Liu updated CASSANDRA-21049:
------------------------------------
Summary: Cassandra Cross Data Center Read/Write with Remote Quorum During
Data Center Failure (was: Cassandra Cross Region Read/Write with Remote Quorum
During Regional Failure)
> Cassandra Cross Data Center Read/Write with Remote Quorum During Data Center
> Failure
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21049
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21049
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Consistency/Coordination
> Reporter: Qiaochu Liu
> Priority: Normal
> Attachments: detailed.png
>
>
> h1. *Background*
> NetworkTopologyStrategy is the most commonly used replication strategy at Uber,
> and we use LOCAL_QUORUM for reads and writes in many use cases. Our Cassandra
> deployment in each region currently relies on a majority of replicas being
> healthy to consistently achieve local quorum.
> h1. *Current behavior*
> When a local region in a Cassandra deployment experiences outages, network
> isolation, or maintenance events, the EACH_QUORUM / LOCAL_QUORUM consistency
> levels fail for both reads and writes if enough replicas in that local
> region are unavailable. In this configuration, simultaneous host
> unavailability can temporarily prevent the cluster from reaching the required
> quorum for reads and writes. For applications that require high availability
> and a seamless user experience, this can lead to service downtime and a
> noticeable drop in overall availability.
> h1. *Proposed Solution*
> To prevent this issue and ensure a seamless user experience, we can use the
> *Remote Quorum* consistency level as a fallback mechanism in scenarios where
> local replicas are unavailable. Remote Quorum in Cassandra refers to a read
> or write operation that achieves quorum (a majority of replicas) across
> remote regions, rather than relying solely on replicas within the local
> region.
> The selected approach for this design is to explicitly configure a backup
> region mapping, where each region defines its preferred failover target. For
> example:
> {code}
> backup_regional_cluster:
>     cluster1: cluster2
>     cluster2: cluster3
>     cluster3: cluster1
> {code}
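> A minimal sketch of how a coordinator could resolve the backup data center and
> its quorum size from this mapping, assuming a hypothetical in-memory copy of the
> configuration (all names below are illustrative, not existing Cassandra APIs):
> {code:java}
> import java.util.Map;
>
> final class BackupDatacenterMapping
> {
>     // Hypothetical in-memory copy of the backup_regional_cluster configuration.
>     private final Map<String, String> backupByLocalDc =
>             Map.of("cluster1", "cluster2",
>                    "cluster2", "cluster3",
>                    "cluster3", "cluster1");
>
>     /** Returns the configured failover target for the local data center. */
>     String backupFor(String localDc)
>     {
>         return backupByLocalDc.get(localDc);
>     }
>
>     /** Quorum size for a data center with the given replication factor. */
>     static int quorumSize(int replicationFactor)
>     {
>         return replicationFactor / 2 + 1;
>     }
> }
> {code}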
> We will add a feature to override the read/write consistency level on the
> server side. When local replicas are not available, the server overrides the
> write consistency level from EACH_QUORUM to REMOTE_QUORUM. *Note
> that* implementing this change on the client side would require protocol
> changes in CQL, so we only add it on the server side, where it can only be
> used internally by the server.
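> The sketch below illustrates the intended server-side override, assuming a
> REMOTE_QUORUM level is added to the ConsistencyLevel enum; the names are
> illustrative rather than existing Cassandra code:
> {code:java}
> // Sketch only: REMOTE_QUORUM is the new level proposed in this ticket and does
> // not exist in Cassandra today; the enum is redeclared here to keep the example
> // self-contained.
> enum ConsistencyLevel { LOCAL_QUORUM, EACH_QUORUM, REMOTE_QUORUM }
>
> final class ConsistencyOverride
> {
>     /**
>      * Returns the consistency level the coordinator should actually use.
>      * The request is downgraded to REMOTE_QUORUM only when the override is
>      * enabled and the local data center cannot form a quorum.
>      */
>     static ConsistencyLevel resolve(ConsistencyLevel requested,
>                                     int liveLocalReplicas,
>                                     int localQuorumSize,
>                                     boolean overrideEnabled)
>     {
>         boolean quorumBased = requested == ConsistencyLevel.LOCAL_QUORUM
>                            || requested == ConsistencyLevel.EACH_QUORUM;
>         if (overrideEnabled && quorumBased && liveLocalReplicas < localQuorumSize)
>             return ConsistencyLevel.REMOTE_QUORUM;
>         return requested;
>     }
> }
> {code}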
> h1. *Implementations*
> We propose the following features in Cassandra to address regional failure
> scenarios:
> * Introduce a new consistency level called REMOTE_QUORUM.
> * A server-side read/write consistency level override, controlled by a feature
> flag. A nodetool command turns the server-side fallback on and off, as sketched
> after this list.
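> A hypothetical JMX-style interface that such a nodetool command could drive,
> sketched under the assumption that the toggle sits alongside the coordinator
> code (none of these names exist in current Cassandra):
> {code:java}
> // Hypothetical MBean behind the proposed nodetool toggle; names are
> // illustrative only.
> public interface RemoteQuorumFallbackMBean
> {
>     /** Called by the hypothetical "enable" nodetool subcommand. */
>     void enableConsistencyFallback();
>
>     /** Called by the hypothetical "disable" nodetool subcommand. */
>     void disableConsistencyFallback();
>
>     /** Reports whether the server-side fallback is currently active. */
>     boolean isConsistencyFallbackEnabled();
> }
> {code}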
>