> The problem is that today there’s no way to reliably exclude the new DC from
> serving reads, that I know of? If you can, then yes you would only need to
> ensure repair were run prior to activating reads from this DC.
We think we have a way to do this using certain settings in the Java driver. Agree on your other points!

> On 20 Aug 2021, at 10:02 pm, bened...@apache.org wrote:
>
>> My initial testing suggested it was not required (when the new DC is not
>> serving reads).
>
> The problem is that today there’s no way to reliably exclude the new DC from
> serving reads, that I know of? If you can, then yes you would only need to
> ensure repair were run prior to activating reads from this DC.
>
>> Perhaps the CL mechanism could be pluggable
>
> I think this is unlikely, particularly as we start to consider things like
> consensus - at least any time soon. Quorums are quite intricately woven into
> any implementation, and it would be quite hard to fully generalise them. In
> practice we can probably accommodate any simple vote threshold quorums
> (those where some electorate each have a vote, and each vote has an equal
> weight that reaches consensus once a threshold is crossed) and support at
> least one level of nesting (so that DCs may logically vote as a block based
> on some quorum within a DC) in any topology without a plugin system, and I
> suspect this will be more than enough for any system in the foreseeable
> future.
>
>> I wonder if it should be a ‘default CL’ which can additionally be overridden
>> by queries?
>
> There are some practicalities that probably prohibit us from eliminating user
> provided CLs, but I would like to see them phased out as far as possible as
> they are very hard to verify. To support this flexibility more generally I’d
> prefer to see tables offer potentially multiple consensus schemes with
> potentially different qualities (that can perhaps even be named by the user)
> for these cases, such as (for instance) fast-and-inconsistent-reads. This
> still permits their properties to be vetted by the database while offering
> flexibility to the user, and for them to declare at the operator level what
> meeting this concept requires.
> It also means the database can maintain these
> properties through any topology change.
>
> But we’ll probably have people using legacy CLs for another decade, so we’re
> going to have to support people querying with those CLs, but we might want to
> encourage people to disable them on their clusters and migrate to safer
> setups.
>
> From: Miles Garnsey <miles.garn...@datastax.com>
> Date: Friday, 20 August 2021 at 12:51
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> Many thanks for this detailed response Benedict. I look forward to seeing the
> forthcoming proposals in relation to schema change safety when LWTs are in
> use.
>
> We have been following almost the scale-by-one workaround you described - I
> am grateful for the additional validation. The only divergence is that we
> have not been advising a repair in between each node addition. My initial
> testing suggested it was not required (when the new DC is not serving reads).
> But if you are aware of issues that arise at scale then I’d love to hear your
> experience, as we are still in the planning phase for that project.
>
> Regarding CLs (off topic)
>
>> To respond to Mick: we could introduce an EACH_SERIAL which would permit
>> this to be done in one go. This isn’t a super complicated piece of work, and
>> I’d be happy to help review a contribution here. However, in my view we
>> should be reconsidering how quorums are decided more comprehensively. This
>> is very off-topic, but there are other more sensible quorums for
>> multi-region setups (such as quorum-of-quorums), but also there’s a wide
>> range of useful quorums we don’t support, particularly heterogeneous ones
>> supporting lower write failure tolerance than read failure tolerance (for
>> instance). Today we support only the most extreme versions of this, and all
>> of our quorums must be mixed manually by clients which is error prone.
>> In my
>> opinion we should be moving towards specifying quorums on a per-table basis
>> for reads and writes, so that clients do not specify their consistency
>> levels. This way the database can configure arbitrary quorums, and also
>> guarantee that these quorums provide the desired consistency.
>
> I agree with your points here. I’d add that the geographical location of DCs
> can be relevant.
> Perhaps the CL mechanism could be pluggable (in the same way that authn/z
> both are) so that we can experiment in this area at higher velocity? (I
> appreciate this is an invasive change.)
> A colleague and I are considering whether we might be able to look at the
> EACH_QUORUM idea in the shorter term. We will share more if we have the
> bandwidth to undertake the work.
> I also agree that CLs defined for tables is a worthy enhancement, I wonder if
> it should be a ‘default CL’ which can additionally be overridden by queries?
>
> In any event I feel I’ve hijacked your thread enough, but thank you again for
> the warm welcome and the interesting discussion!
>
>> On 20 Aug 2021, at 7:04 pm, bened...@apache.org wrote:
>>
>> Hello and welcome!
>>
>> So this is a really complicated topic, unfortunately, but the simple answer
>> is that as currently formulated this work won’t address this particular
>> case. The slightly longer answer is that this problem will be a thing of the
>> past soon either way - there’s work incoming to address every possible
>> category of this kind of problem, but it might take a little longer.
>>
>> The full answer is that membership of a keyspace in Cassandra is a mess, and
>> is derived from the intersection of two things: schema and gossip. The
>> electorate verification addresses _gossip_ inconsistencies, that is,
>> inconsistencies about what nodes are perceived to be a member of the ring.
>> Schema generates the issue you are discussing here.
>> In particular the lack
>> of any state machine that transitions from one topology to another when a
>> new schema implies a new topology. This is its own distinct problem, that
>> others I work with plan to file a CEP for in the coming weeks or months.
>>
>> In the meantime, the correct way to do this (painful though it might be) is
>> to add one node at a time. So instead of adding DC2 at RF=3, add DC2 at RF=1
>> and wait for that to settle, *run repair* and then bump to RF=2, etc.
>>
>> To respond to Mick: we could introduce an EACH_SERIAL which would permit
>> this to be done in one go. This isn’t a super complicated piece of work, and
>> I’d be happy to help review a contribution here. However, in my view we
>> should be reconsidering how quorums are decided more comprehensively. This
>> is very off-topic, but there are other more sensible quorums for
>> multi-region setups (such as quorum-of-quorums), but also there’s a wide
>> range of useful quorums we don’t support, particularly heterogeneous ones
>> supporting lower write failure tolerance than read failure tolerance (for
>> instance). Today we support only the most extreme versions of this, and all
>> of our quorums must be mixed manually by clients which is error prone. In my
>> opinion we should be moving towards specifying quorums on a per-table basis
>> for reads and writes, so that clients do not specify their consistency
>> levels. This way the database can configure arbitrary quorums, and also
>> guarantee that these quorums provide the desired consistency.
>>
>>
>> From: Miles Garnsey <miles.garn...@datastax.com>
>> Date: Friday, 20 August 2021 at 00:47
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
>> Long time listener, first time caller here - hello!
>>
>> I am very interested in this part “Better safety among range movements:
>> Electorate verification during range movements provides a stronger assertion
>> of linearizability via assurance of the set of instances voting on a
>> transaction.”
>>
>> I have seen issues in the wild where people want to add/remove DCs. I think
>> that there may be a risk of consistency violations due to transactions
>> circumventing the locks held by in-progress transactions. Will electorate
>> verification help in the below scenario?
>> Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at RF=3.
>> DC2 is added, and once all nodes are in UN the schema is adjusted so that
>> DC2’s RF=3.
>> While the new schema propagates, there is a transitional state, in which
>> some potential coordinators have the new schema S2, and others are operating
>> on the old schema S1.
>> In this state, S2 coordinators form consensus from 4/6 nodes, while S1
>> coordinators form consensus from 2/3 nodes.
>> A query issued from an S1 coordinator can form a valid consensus which will
>> circumvent the lock held by an S2 coordinator.
>> I was thinking of proposing an EACH_QUORUM serial CL, but if electorate
>> verification solves the problem then that may be the better solution.
>>
>> Miles
>>
>>
>>> On 19 Aug 2021, at 9:18 am, Scott Andreas <sc...@paradoxica.net> wrote:
>>>
>>> Benedict, thank you for sharing this CEP!
>>>
>>> Adding some notes on why I support this proposal:
>>>
>>> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on
>>> reads is a huge improvement. This latency reduction may be sufficient to
>>> allow many users of Cassandra who operate in a single datacenter,
>>> availability zone, or region to migrate to a multi-region topology.
>>>
>>> - The Cluster Simulation work described in CEP-10 provides a toolchain for
>>> probabilistically-exhaustive validation and simulation of transactional
>>> correctness, allowing assertion of linearizability in the presence of
>>> adversarial thread scheduling and message ordering over an unbounded number
>>> of simulated clusters and transactions.
>>>
>>> - Some use cases may see a superlinear increase in LWT performance due to a
>>> reduction in contention afforded by fewer message round-trips. E.g.,
>>> halving latency shortens the interval during which competing transactions
>>> may conflict, reducing contention and improving throughput beyond a level
>>> that would be afforded by the latency reduction alone.
>>>
>>> - Better safety among range movements: Electorate verification during range
>>> movements provides a stronger assertion of linearizability via assurance of
>>> the set of instances voting on a transaction.
>>>
>>> – Scott
>>>
>>> ________________________________________
>>> From: bened...@apache.org <bened...@apache.org>
>>> Sent: Wednesday, August 18, 2021 2:31 PM
>>> To: dev@cassandra.apache.org
>>> Subject: [DISCUSS] CEP 14: Paxos Improvements
>>>
>>> RE:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
>>>
>>> I’m proposing this CEP for approval by the project. The goal is to both
>>> improve the performance of LWTs and to ensure their correctness across a
>>> range of scenarios like range movements. This work builds upon the Simulator
>>> CEP that has been recently adopted, and patches will follow in the coming
>>> weeks.
>>>
>>> If you have any concerns or questions please raise them here for discussion.
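
The transitional state in Miles's scenario (S1 coordinators forming consensus from 2 of 3 nodes while S2 coordinators form it from 4 of 6) can be checked by brute force. What follows is a minimal Python sketch, not Cassandra code. It assumes a simple-majority quorum over all replicas of a key, and the node names and the helper `find_disjoint_quorums` are hypothetical, introduced only for illustration. Under the same simplified model it also checks the add-one-node-at-a-time workaround Benedict describes.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Simple majority of n replicas (assumed model, not Cassandra's code)."""
    return n // 2 + 1


def find_disjoint_quorums(old_nodes, new_nodes):
    """Search for an old-topology quorum and a new-topology quorum that
    share no node. Returns the first such pair found, or None if every
    old quorum overlaps every new quorum."""
    q_old = quorum(len(old_nodes))
    q_new = quorum(len(new_nodes))
    for old_q in combinations(old_nodes, q_old):
        for new_q in combinations(new_nodes, q_new):
            if set(old_q).isdisjoint(new_q):
                return old_q, new_q
    return None


# Hypothetical node names: three replicas in each datacenter.
dc1 = ["dc1-a", "dc1-b", "dc1-c"]
dc2 = ["dc2-a", "dc2-b", "dc2-c"]

# Schema S1 (DC1 at RF=3) versus schema S2 (DC1 and DC2 both at RF=3).
all_at_once = find_disjoint_quorums(dc1, dc1 + dc2)

# Raising DC2's RF one replica at a time: 3 -> 4 -> 5 -> 6 replicas.
stepwise = [
    find_disjoint_quorums(dc1 + dc2[:i], dc1 + dc2[: i + 1])
    for i in range(len(dc2))
]

print("3 -> 6 in one step:", all_at_once)
print("one replica per step:", stepwise)
```

Under these assumptions the direct change from 3 to 6 replicas admits an old quorum and a new quorum with no node in common, whereas each single-replica step forces every old quorum to overlap every new one. The sketch says nothing about repair, which the thread treats as a separate requirement between steps.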