To be specific, I was advocating a Layer 7/DNS failover out of the affected DC. Something along these lines: https://cloud.google.com/blog/products/networking/introducing-automated-failover-for-private-workloads-using-cloud-dns-routing-policies-with-health-checks
> On Dec 2, 2025, at 9:33 AM, Jon Haddad <[email protected]> wrote:
>
> > This proposal targets the use case in which only a few nodes in the region fail, but not all.
>
> If you're able to handle inconsistent results, why not use LOCAL_ONE? If you have a preference for consistency but can tolerate some inconsistency sometimes, DowngradingConsistencyRetryPolicy [1] might work for you. It's a bit contentious in the community (some folks absolutely hate its existence), but I've had a few use cases where it makes sense.
>
> Or is this the scenario where 3 nodes fail and they happen to contain the same token range? I'm curious what the math looks like. Are you using 256 vnodes by any chance?
>
> If you execute a query and all 3 replicas in your local DC are down, currently the driver throws an exception. I don't see how putting this logic on the server helps, since the driver would need to be aware of REMOTE_QUORUM to route to the remote replicas anyway. At that point you're just adding an extra hop between the app and the database.
>
> If you have a database outage so severe that all 3 replicas are unavailable, what's the state of the rest of the cluster? I'm genuinely curious how often you'd have a case where 3 nodes with intersecting ranges are offline and it's not a complete DC failure.
>
> I agree with Patrick and Stefan here. If you want to stay in the DC, doing it at the driver level gives you complete control. If your infra is failing that spectacularly, evacuate.
>
> I also wonder if witness replicas would help you, if you're seeing that much failure.
>
> [1] https://docs.datastax.com/en/drivers/java/4.17/com/datastax/oss/driver/api/core/retry/DowngradingConsistencyRetryPolicy.html
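For concreteness, below is a minimal sketch of the per-request downgrade Jon describes, assuming the 4.x Java driver; the retry policy he links automates the same idea across all requests. The keyspace/table in the query are placeholders.

import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import com.datastax.oss.driver.api.core.servererrors.UnavailableException;

public final class DowngradeOnUnavailable {

  // Try LOCAL_QUORUM first; if the coordinator reports that not enough local
  // replicas are alive, retry the same statement once at LOCAL_ONE.
  static ResultSet readWithFallback(CqlSession session, SimpleStatement stmt) {
    try {
      return session.execute(stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM));
    } catch (UnavailableException e) {
      // Explicitly accept weaker consistency for this request instead of failing it.
      return session.execute(stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE));
    }
  }

  public static void main(String[] args) {
    try (CqlSession session = CqlSession.builder().build()) {
      // "ks.tbl" is a placeholder table.
      SimpleStatement query = SimpleStatement.newInstance("SELECT * FROM ks.tbl WHERE id = 1");
      readWithFallback(session, query).forEach(row -> System.out.println(row.getFormattedContents()));
    }
  }
}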
> On Mon, Dec 1, 2025 at 7:37 PM Patrick McFadin <[email protected] <mailto:[email protected]>> wrote:
>> In my experience it usually makes more sense to evacuate out of a degraded DC at the application layer rather than having Cassandra silently fail over to a remote quorum. When a DC is in a bad state, it's rarely "just a couple of Cassandra replicas are down." The same underlying problem (network partitions, bad routing, DNS issues, load balancer problems, AZ failures, power incidents, I could go on...) is usually hitting a lot more than Cassandra.
>>
>> What I can see is that REMOTE_QUORUM will behave like a dynamic reconfiguration of the replica group for a given request. That's fundamentally different from today's NTS contract, where the replica set per DC is static. That would open up all sorts of new ops headaches.
>> - What happens to replicas in the degraded DC? Would hints get stored?
>> - I'm not sure this would work with Paxos, since dynamic reconfiguration isn't supported. Two leaders, possibly?
>> - Which now brings up Accord and TCM considerations...
>>
>> So my strong bias is, once a DC is degraded enough that Cassandra can't achieve LOCAL_QUORUM, that DC is probably not a place you want to keep serving user traffic from. At that point, I'd rather have the application tier evacuate and talk to a healthy DC, instead of the database internally changing its consistency behavior under the covers. It's just a ton of work for a problem with a simpler solution.
>>
>> Patrick
>>
>>
>> On Mon, Dec 1, 2025 at 7:07 PM Jaydeep Chovatia <[email protected] <mailto:[email protected]>> wrote:
>>> > For the record, there is a whole section about this here (sorry for pasting the same link twice in my original mail)
>>>
>>> Thanks, Stefan, for sharing the details about the client-side failover mechanism. This technique is definitely useful when the entire region is down.
>>>
>>> This proposal targets the use case in which only a few nodes in the region fail, but not all. Use cases that prioritize availability over latency can opt in to this option, and the server will automatically fulfill those requests from the remote region.
>>>
>>> Jaydeep
>>>
>>> On Mon, Dec 1, 2025 at 2:58 PM Štefan Miklošovič <[email protected] <mailto:[email protected]>> wrote:
>>>> For the record, there is a whole section about this here (sorry for pasting the same link twice in my original mail)
>>>>
>>>> https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/load_balancing/index.html#cross-datacenter-failover
>>>>
>>>> Interesting to see your perspective, I can see how doing something without application changes might seem appealing.
>>>>
>>>> I wonder what others think about this.
>>>>
>>>> Regards
>>>>
>>>> On Mon, Dec 1, 2025 at 11:39 PM Qc L <[email protected] <mailto:[email protected]>> wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> Thanks for raising the question about using a custom LoadBalancingPolicy! Remote quorum and LB-based failover aim to solve similar problems, maintaining availability during datacenter degradation, so I did a comparison study of the server-side remote quorum solution versus relying on driver-side logic.
>>>>>
>>>>> For our use cases, we found that client-side failover is expensive to operate and leads to fragmented behavior during incidents. We also prefer failover to be triggered only when needed, not automatically. A server-side mechanism gives us controlled, predictable behavior and makes the system easier to operate.
>>>>>
>>>>> Remote quorum is implemented on the server, where the coordinator can intentionally form a quorum using replicas in a backup region when the local region cannot satisfy LOCAL_QUORUM or EACH_QUORUM. The database determines the fallback region and how replicas are selected, and ensures consistency guarantees still hold.
>>>>>
>>>>> By keeping this logic internal to the server, we provide unified, consistent behavior across all clients without requiring any application changes.
>>>>>
>>>>> Happy to discuss further if helpful.
>>>>>
>>>>>
>>>>> On Mon, Dec 1, 2025 at 12:57 PM Štefan Miklošovič <[email protected] <mailto:[email protected]>> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> before going deeper into your proposal ... I just have to ask, have you tried e.g. a custom LoadBalancingPolicy in the driver, if you use one?
>>>>>>
>>>>>> I can imagine that if the driver detects a node to be down, then based on its "distance" / the DC it belongs to, you might start to create a different query plan which talks to the remote DC and similar.
>>>>>>
>>>>>> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html
>>>>>>
>>>>>> https://docs.datastax.com/en/drivers/java/4.2/com/datastax/oss/driver/api/core/loadbalancing/LoadBalancingPolicy.html
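For reference, a rough sketch of enabling the driver-side cross-datacenter failover that Stefan's link describes, configured programmatically rather than via a custom policy. This assumes a Java driver recent enough (roughly 4.10+) to expose the dc-failover options; the exact option names and defaults should be checked against the driver's reference configuration, and "cluster1" / 3 remote nodes are placeholder values.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

public final class CrossDcFailoverSession {
  public static void main(String[] args) {
    // Allow the default load balancing policy to add a few remote-DC nodes to
    // query plans when local nodes are down. Whether LOCAL_* consistency
    // levels are allowed to fail over is a separate, deliberate switch.
    DriverConfigLoader loader =
        DriverConfigLoader.programmaticBuilder()
            .withString(DefaultDriverOption.LOAD_BALANCING_LOCAL_DATACENTER, "cluster1")
            .withInt(DefaultDriverOption.LOAD_BALANCING_DC_FAILOVER_MAX_NODES_PER_REMOTE_DC, 3)
            .withBoolean(
                DefaultDriverOption.LOAD_BALANCING_DC_FAILOVER_ALLOW_FOR_LOCAL_CONSISTENCY_LEVELS,
                false)
            .build();

    // Contact points come from the rest of the configuration or the defaults.
    try (CqlSession session = CqlSession.builder().withConfigLoader(loader).build()) {
      System.out.println("Connected as " + session.getName());
    }
  }
}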
>>>>>> On Mon, Dec 1, 2025 at 8:54 PM Qc L <[email protected] <mailto:[email protected]>> wrote:
>>>>>>> Hello team,
>>>>>>>
>>>>>>> I'd like to propose adding a remote quorum consistency level to Cassandra for handling simultaneous host unavailability in the local data center that prevents the required quorum from being reached.
>>>>>>>
>>>>>>> Background
>>>>>>> NetworkTopologyStrategy is the most commonly used strategy at Uber, and we use LOCAL_QUORUM for reads and writes in many use cases. Our Cassandra deployment in each data center currently relies on a majority of replicas being healthy to consistently achieve local quorum.
>>>>>>>
>>>>>>> Current behavior
>>>>>>> When a local data center in a Cassandra deployment experiences outages, network isolation, or maintenance events, the EACH_QUORUM / LOCAL_QUORUM consistency levels will fail for both reads and writes if enough replicas in the local data center are unavailable. In this configuration, simultaneous host unavailability can temporarily prevent the cluster from reaching the required quorum for reads and writes. For applications that require high availability and a seamless user experience, this can lead to service downtime and a noticeable drop in overall availability. See attached Figure 1. <figure1.png>
>>>>>>>
>>>>>>> Proposed Solution
>>>>>>> To prevent this issue and ensure a seamless user experience, we can use the remote quorum consistency level as a fallback mechanism in scenarios where local replicas are unavailable. Remote quorum in Cassandra refers to a read or write operation that achieves quorum (a majority of replicas) in a remote data center, rather than relying solely on replicas within the local data center. See attached Figure 2. <figure2.png>
>>>>>>>
>>>>>>> We will add a feature to override the read/write consistency level on the server side. When local replicas are not available, the server will override the write consistency level from EACH_QUORUM to REMOTE_QUORUM. Note that implementing this change on the client side would require changes to the CQL native protocol, so we only add it on the server side, where it is used internally by the server.
>>>>>>>
>>>>>>> For example, given the following Cassandra setup:
>>>>>>> CREATE KEYSPACE ks WITH REPLICATION = {
>>>>>>>   'class': 'NetworkTopologyStrategy',
>>>>>>>   'cluster1': 3,
>>>>>>>   'cluster2': 3,
>>>>>>>   'cluster3': 3
>>>>>>> };
>>>>>>>
>>>>>>> The selected approach for this design is to explicitly configure a backup data center mapping for the local data center, where each data center defines its preferred failover target. For example:
>>>>>>> remote_quorum_target_data_center:
>>>>>>>   cluster1: cluster2
>>>>>>>   cluster2: cluster3
>>>>>>>   cluster3: cluster1
>>>>>>>
>>>>>>> Implementations
>>>>>>> We propose the following features for Cassandra to address data center failure scenarios:
>>>>>>> - Introduce a new consistency level called remote quorum.
>>>>>>> - A feature to override the read/write consistency level on the server side (this can be controlled by a feature flag).
>>>>>>> - A nodetool command to turn the server-side fallback on/off.
>>>>>>>
>>>>>>> Why remote quorum is useful
>>>>>>> As shown in Figure 2, when there is a data center failure in one of the data centers, local quorum and each quorum will fail, since two replicas are unavailable and the quorum requirement cannot be met in the local data center.
>>>>>>>
>>>>>>> During the incident, we can use nodetool to enable failover to remote quorum. With the failover enabled, reads and writes fail over to the remote data center, which avoids the availability drop.
>>>>>>>
>>>>>>> Any thoughts or concerns?
>>>>>>> Thanks in advance for your feedback,
>>>>>>>
>>>>>>> Qiaochu
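Purely as an illustration of the proposal above, here is a hypothetical sketch of the coordinator-side override decision using the backup data center mapping from the example. All class, method, and field names are invented for the sketch and do not correspond to actual Cassandra internals.

import java.util.Map;

// Hypothetical sketch of the proposed coordinator-side override; names are
// invented for illustration only.
final class RemoteQuorumFallback {

  enum Consistency { LOCAL_QUORUM, EACH_QUORUM, REMOTE_QUORUM }

  private final Map<String, String> backupDcByLocalDc; // e.g. cluster1 -> cluster2
  private volatile boolean fallbackEnabled;            // flipped by the operator, e.g. via nodetool

  RemoteQuorumFallback(Map<String, String> backupDcByLocalDc) {
    this.backupDcByLocalDc = backupDcByLocalDc;
  }

  void setFallbackEnabled(boolean enabled) {
    this.fallbackEnabled = enabled;
  }

  // Decide which consistency level the coordinator should actually use for a request.
  Consistency effectiveConsistency(Consistency requested,
                                   String localDc,
                                   int aliveLocalReplicas,
                                   int rfPerDc) {
    int localQuorum = rfPerDc / 2 + 1;                  // e.g. RF=3 per DC -> quorum of 2
    if (!fallbackEnabled || aliveLocalReplicas >= localQuorum) {
      return requested;                                 // normal path: no override
    }
    // The local DC cannot form a quorum: send the request to the configured backup DC.
    return backupDcByLocalDc.containsKey(localDc) ? Consistency.REMOTE_QUORUM : requested;
  }
}

With the example mapping and RF 3 per DC, a coordinator in cluster1 that sees only one live local replica and has the fallback enabled would serve the request at remote quorum against cluster2; with the fallback disabled it would fail the request exactly as today.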
