> The problem is that today there’s no way to reliably exclude the new DC from 
> serving reads, that I know of? If you can, then yes you would only need to 
> ensure repair were run prior to activating reads from this DC.

We think we have a way to do this using certain settings in the Java driver.

Agree on your other points!



> On 20 Aug 2021, at 10:02 pm, bened...@apache.org wrote:
> 
>> My initial testing suggested it was not required (when the new DC is not 
>> serving reads).
> 
> The problem is that today there’s no way to reliably exclude the new DC from 
> serving reads, that I know of? If you can, then yes you would only need to 
> ensure repair were run prior to activating reads from this DC.
> 
>> Perhaps the CL mechanism could be pluggable
> 
> I think this is unlikely, particularly as we start to consider things like 
> consensus - at least any time soon. Quorums are quite intricately woven into 
> any implementation, and it would be quite hard to fully generalise them. In 
> practice we can probably accommodate any simple vote threshold quorums  
> (those where some electorate each have a vote, and each vote has an equal 
> weight that reaches consensus once a threshold is crossed) and support at 
> least one level of nesting (so that DCs may logically vote as a block based 
> on some quorum within a DC) in any topology without a plugin system, and I 
> suspect this will be more than enough for any system in the foreseeable 
> future.
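
As a rough illustration of the "simple vote threshold quorums" with one level of nesting described above, here is a minimal Python sketch. It is not Cassandra code: the names DcElectorate, dc_votes_yes and quorum_of_quorums, and the three-DC topology, are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class DcElectorate:
    """Hypothetical model of one DC: its voters and the votes it needs."""
    nodes: frozenset
    threshold: int


def majority(n: int) -> int:
    # Smallest number of equal-weight votes that is more than half of n.
    return n // 2 + 1


def dc_votes_yes(dc: DcElectorate, acks: Set[str]) -> bool:
    # Inner level: each node has one equally weighted vote within its DC.
    return len(dc.nodes & acks) >= dc.threshold


def quorum_of_quorums(dcs: Dict[str, DcElectorate], dc_threshold: int,
                      acks: Set[str]) -> bool:
    # Outer level: a DC votes as a block once its own inner quorum is met.
    blocks = sum(1 for dc in dcs.values() if dc_votes_yes(dc, acks))
    return blocks >= dc_threshold


# Assumed topology for the example: three DCs with three replicas each.
topology = {
    name: DcElectorate(frozenset(f"{name}-n{i}" for i in range(1, 4)), majority(3))
    for name in ("DC1", "DC2", "DC3")
}
```

With this model, acknowledgements from two nodes in each of two DCs satisfy a 2-of-3 outer threshold, whereas four acknowledgements spread so that only one DC reaches its inner majority do not.
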
> 
>> I wonder if it should be a ‘default CL’ which can additionally be overridden 
>> by queries?
> 
> There are some practicalities that probably prohibit us from eliminating user 
> provided CLs, but I would like to see them phased out as far as possible as 
> they are very hard to verify. To support this flexibility more generally I’d 
> prefer to see tables offer potentially multiple consensus schemes with 
> potentially different qualities (that can perhaps even be named by the user) 
> for these cases, such as (for instance) fast-and-inconsistent-reads. This 
> still permits their properties to be vetted by the database while offering 
> flexibility to the user, and for them to declare at the operator level what 
> meeting this concept requires. It also means the database can maintain these 
> properties through any topology change.
> 
> But we’ll probably have people using legacy CLs for another decade, so we’re 
> going to have to support people querying with those CLs, but we might want to 
> encourage people to disable them on their clusters and migrate to safer 
> setups.
> 
> From: Miles Garnsey <miles.garn...@datastax.com>
> Date: Friday, 20 August 2021 at 12:51
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
> Many thanks for this detailed response Benedict. I look forward to seeing the 
> forthcoming proposals in relation to schema change safety when LWTs are in 
> use.
> 
> We have been following almost the scale-by-one workaround you described - I 
> am grateful for the additional validation. The only divergence is that we 
> have not been advising a repair in between each node addition. My initial 
> testing suggested it was not required (when the new DC is not serving reads). 
> But if you are aware of issues that arise at scale then I’d love to hear your 
> experience, as we are still in the planning phase for that project.
> 
> Regarding CLs (off topic)
> 
>> To respond to Mick: we could introduce an EACH_SERIAL which would permit 
>> this to be done in one go. This isn’t a super complicated piece of work, and 
>> I’d be happy to help review a contribution here. However, in my view we 
>> should be reconsidering how quorums are decided more comprehensively. This 
>> is very off-topic, but there are other more sensible quorums for 
>> multi-region setups (such as quorum-of-quorums), but also there’s a wide 
>> range of useful quorums we don’t support, particularly heterogeneous ones 
>> supporting lower write failure tolerance than read failure tolerance (for 
>> instance). Today we support only the most extreme versions of this, and all 
>> of our quorums must be mixed manually by clients which is error prone. In my 
>> opinion we should be moving towards specifying quorums on a per-table basis 
>> for reads and writes, so that clients do not specify their consistency 
>> levels. This way the database can configure arbitrary quorums, and also 
>> guarantee that these quorums provide the desired consistency.
> 
> I agree with your points here. I’d add that the geographical location of DCs 
> can be relevant.
> Perhaps the CL mechanism could be pluggable (in the same way that authn/z 
> both are) so that we can experiment in this area at higher velocity? (I 
> appreciate this is an invasive change.)
> A colleague and I are considering whether we might be able to look at the 
> EACH_QUORUM idea in the shorter term. We will share more if we have the 
> bandwidth to undertake the work.
> I also agree that CLs defined for tables are a worthy enhancement; I wonder if 
> it should be a ‘default CL’ which can additionally be overridden by queries?
> 
> In any event I feel I’ve hijacked your thread enough, but thank you again for 
> the warm welcome and the interesting discussion!
> 
>> On 20 Aug 2021, at 7:04 pm, bened...@apache.org wrote:
>> 
>> Hello and welcome!
>> 
>> So this is a really complicated topic, unfortunately, but the simple answer 
>> is that as currently formulated this work won’t address this particular 
>> case. The slightly longer answer is that this problem will be a thing of the 
>> past soon either way - there’s work incoming to address every possible 
>> category of this kind of problem, but it might take a little longer.
>> 
>> The full answer is that membership of a keyspace in Cassandra is a mess, and 
>> is derived from the intersection of two things: schema and gossip. The 
>> electorate verification addresses _gossip_ inconsistencies, that is, 
>> inconsistencies about what nodes are perceived to be a member of the ring. 
>> Schema generates the issue you are discussing here. In particular the lack 
>> of any state machine that transitions from one topology to another when a 
>> new schema implies a new topology. This is its own distinct problem, that 
>> others I work with plan to file a CEP for in the coming weeks or months.
>> 
>> In the meantime, the correct way to do this (painful though it might be) is 
>> to add one node at a time. So instead of adding DC2 at RF=3, add DC2 at RF=1 
>> and wait for that to settle, *run repair* and then bump to RF=2, etc.
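
The one-replica-at-a-time procedure above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function names are hypothetical, the keyspace is assumed to use NetworkTopologyStrategy, and the repair step is emitted as a comment rather than a specific command. The second function checks the arithmetic behind the advice, assuming majority quorums over all replicas: since the old replicas are a subset of the new ones, an old quorum and a new quorum must share a node whenever their sizes add up to more than the new replica count.

```python
def majority(rf: int) -> int:
    # Size of a simple majority quorum over rf replicas.
    return rf // 2 + 1


def quorums_always_overlap(old_rf: int, new_rf: int) -> bool:
    # Pigeonhole check: both quorums live inside the new_rf replicas, so they
    # must intersect if (and only if) their minimal sizes sum to more than new_rf.
    return majority(old_rf) + majority(new_rf) > new_rf


def scale_by_one_plan(keyspace, existing, new_dc, target_rf):
    """Return the ordered steps for raising new_dc's RF one replica at a time."""
    steps = []
    for rf in range(1, target_rf + 1):
        replication = dict(existing, **{new_dc: rf})
        opts = ", ".join(f"'{dc}': {n}" for dc, n in replication.items())
        steps.append(
            f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {opts}}};"
        )
        # Placeholder step: the thread says to run repair between bumps.
        steps.append(f"-- run repair on {keyspace} and wait before the next step")
    return steps
```

For DC1 at RF=3, going straight to six replicas gives 2 + 4 = 6, which is not greater than 6, so disjoint quorums are possible; each single-replica step (3 to 4, 4 to 5, 5 to 6) passes the check.
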
>> 
>> To respond to Mick: we could introduce an EACH_SERIAL which would permit 
>> this to be done in one go. This isn’t a super complicated piece of work, and 
>> I’d be happy to help review a contribution here. However, in my view we 
>> should be reconsidering how quorums are decided more comprehensively. This 
>> is very off-topic, but there are other more sensible quorums for 
>> multi-region setups (such as quorum-of-quorums), but also there’s a wide 
>> range of useful quorums we don’t support, particularly heterogeneous ones 
>> supporting lower write failure tolerance than read failure tolerance (for 
>> instance). Today we support only the most extreme versions of this, and all 
>> of our quorums must be mixed manually by clients which is error prone. In my 
>> opinion we should be moving towards specifying quorums on a per-table basis 
>> for reads and writes, so that clients do not specify their consistency 
>> levels. This way the database can configure arbitrary quorums, and also 
>> guarantee that these quorums provide the desired consistency.
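
To make the idea of heterogeneous quorums concrete, here is a minimal Python sketch that lists read and write quorum sizes whose sets are guaranteed to intersect, together with how many replica failures each side tolerates. The function names are hypothetical, and the sketch only covers read/write overlap (R + W > N); it says nothing about the additional requirements of Paxos or any other consensus round.

```python
def overlapping_quorums(n: int):
    """All (read_size, write_size) pairs over n replicas where every read set
    must share at least one replica with every write set."""
    return [(r, w) for r in range(1, n + 1) for w in range(1, n + 1) if r + w > n]


def failure_tolerance(n: int, r: int, w: int):
    # An operation needing k of n replicas still succeeds with n - k replicas down.
    return {"read_tolerates": n - r, "write_tolerates": n - w}
```

With five replicas, the pair (2, 4) is one such in-between configuration: reads survive three replicas being down while writes survive only one. The most extreme pairs, (1, 5) and (5, 1), correspond to mixing ONE with ALL.
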
>> 
>> 
>> From: Miles Garnsey <miles.garn...@datastax.com>
>> Date: Friday, 20 August 2021 at 00:47
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP 14: Paxos Improvements
>> Long time listener, first time caller here - hello!
>> 
>> I am very interested in this part: “Better safety among range movements: 
>> Electorate verification during range movements provides a stronger assertion 
>> of linearizability via assurance of the set of instances voting on a 
>> transaction.”
>> 
>> I have seen issues in the wild where people want to add/remove DCs. I think 
>> that there may be a risk of consistency violations due to transactions 
>> circumventing the locks held by in-progress transactions. Will electorate 
>> verification help in the below scenario?
>> 1. Queries are running at SERIAL, writing at EACH_QUORUM against DC1 at 
>>    RF=3.
>> 2. DC2 is added, and once all nodes are in UN (Up/Normal) the schema is 
>>    adjusted so that DC2’s RF=3.
>> 3. While the new schema propagates, there is a transitional state, in 
>>    which some potential coordinators have the new schema S2, and others 
>>    are operating on the old schema S1.
>> 4. In this state, S2 coordinators form consensus from 4/6 nodes, while S1 
>>    coordinators form consensus from 2/3 nodes.
>> 5. A query issued from an S1 coordinator can form a valid consensus which 
>>    will circumvent the lock held by an S2 coordinator.
>> I was thinking of proposing an EACH_QUORUM serial CL, but if electorate 
>> verification solves the problem then that may be the better solution.
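
The transitional state in this scenario can be checked by brute force. The Python sketch below, with hypothetical node names, enumerates every minimal majority quorum an S1 coordinator could use (2 of DC1's 3 replicas) and every one an S2 coordinator could use (4 of all 6 replicas), and collects the pairs that share no node. It assumes plain majority quorums over the full replica set as each schema sees it.

```python
from itertools import combinations

# Hypothetical node names: three replicas in each DC.
DC1 = ["dc1-a", "dc1-b", "dc1-c"]
DC2 = ["dc2-a", "dc2-b", "dc2-c"]


def quorums(nodes):
    # Every minimal majority quorum over the given replica set.
    size = len(nodes) // 2 + 1
    return [frozenset(c) for c in combinations(nodes, size)]


def disjoint_pairs(old_nodes, new_nodes):
    # Pairs of (old-schema quorum, new-schema quorum) with no replica in common.
    return [(a, b)
            for a in quorums(old_nodes)
            for b in quorums(new_nodes)
            if not (a & b)]


# S1 sees only DC1; S2 sees DC1 plus all three DC2 replicas.
one_shot = disjoint_pairs(DC1, DC1 + DC2)

# Adding a single DC2 replica instead leaves no disjoint pair.
one_replica = disjoint_pairs(DC1, DC1 + DC2[:1])
```

Any pair in one_shot is a case where an S1 coordinator and an S2 coordinator can each reach consensus without a single replica seeing both proposals, for example two DC1 nodes on one side and the remaining DC1 node plus all of DC2 on the other.
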
>> 
>> Miles
>> 
>> 
>>> On 19 Aug 2021, at 9:18 am, Scott Andreas <sc...@paradoxica.net> wrote:
>>> 
>>> Benedict, thank you for sharing this CEP!
>>> 
>>> Adding some notes on why I support this proposal:
>>> 
>>> - Reducing common-case round trips from 4x to 2x on writes and 2x to 1x on 
>>> reads is a huge improvement. This latency reduction may be sufficient to 
>>> allow many users of Cassandra who operate in a single datacenter, 
>>> availability zone, or region to migrate to a multi-region topology.
>>> 
>>> - The Cluster Simulation work described in CEP-10 provides a toolchain for 
>>> probabilistically-exhaustive validation and simulation of transactional 
>>> correctness, allowing assertion of linearizability in the presence of 
>>> adversarial thread scheduling and message ordering over an unbounded number 
>>> of simulated clusters and transactions.
>>> 
>>> - Some use cases may see a superlinear increase in LWT performance due to a 
>>> reduction in contention afforded by fewer message round-trips. E.g., 
>>> halving latency shortens the interval during which competing transactions 
>>> may conflict, reducing contention and improving throughput beyond a level 
>>> that would be afforded by the latency reduction alone.
>>> 
>>> - Better safety among range movements: Electorate verification during range 
>>> movements provides a stronger assertion of linearizability via assurance of 
>>> the set of instances voting on a transaction.
>>> 
>>> – Scott
>>> 
>>> ________________________________________
>>> From: bened...@apache.org <bened...@apache.org>
>>> Sent: Wednesday, August 18, 2021 2:31 PM
>>> To: dev@cassandra.apache.org
>>> Subject: [DISCUSS] CEP 14: Paxos Improvements
>>> 
>>> RE: 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-14%3A+Paxos+Improvements
>>> 
>>> I’m proposing this CEP for approval by the project. The goal is to both 
>>> improve the performance of LWTs and to ensure their correctness across a 
>>> range of scenarios such as range movements. This work builds upon the Simulator 
>>> CEP that has been recently adopted, and patches will follow in the coming 
>>> weeks.
>>> 
>>> If you have any concerns or questions please raise them here for discussion.
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
