Re: [DISCUSS] CEP-15: General Purpose Transactions

[email protected] Mon, 20 Sep 2021 11:05:06 -0700

Hi Joey,

Thanks for the feedback and suggestions.


> I was wondering what do you think about having some extended Q&A after your 
> ApacheCon talk Wednesday

I would love to do this. I’ll have to figure out how though – my understanding 
is that I have a hard 40m for my talk and any Q&A, and I expect the talk to 
occupy most of those 40m as I try to cover both CEP-14 and CEP-15. I’m not 
sure what facilities are made available by Hopin, but if necessary we can 
perhaps post some external video chat link?

The time of day is also a question, as I think the last talk ends at 9:20pm 
local time. But we can make that work if necessary.

> It might help to have a diagram (perhaps I can collaborate with you
> on this?)

I absolutely agree. This is something I had planned to produce but it’s been a 
question of time. In part I wanted to ensure we published long in advance of 
ApacheCon, but now also with CEP-10, CEP-14 and CEP-15 in flight it’s hard to 
get back to improving the draft. If you’d be interested in collaborating on 
this that would be super appreciated, as this would certainly help the reader.

> I think that WAN is always paid during the Consensus Protocol, and then in 
> most cases execution can remain LAN except in 3+ datacenters where I think 
> you'd have to include at least one replica in a neighboring datacenter…

As designed the only WAN cost is consensus as Accord ensures every replica 
receives a complete copy of every transaction, and is aware of any gaps. If 
there are gaps there may be WAN delays as those are filled in. This might occur 
because of network outages, but is most likely to occur when transactions are 
being actively executed by multiple DCs at once – in which case there’ll be one 
further unidirectional WAN latency during execution while the earlier 
transaction disseminates its result to the later transaction(s). There are 
other similar scenarios we can discuss, e.g. if a transaction takes the slow 
path and will execute after a transaction being executed in another DC, that 
remote transaction needs to receive this notification before executing.

There might potentially be some interesting optimisations to make in future, 
where with many queued transactions a single DC may nominate itself to execute 
all outstanding queries and respond to the remote DCs that issued them so as to 
eliminate the WAN latency for disseminating the result of each transaction. But 
we’re getting way ahead of ourselves there 😊

There’s also no LAN cost on write, at least for responding to the client. If 
there is a dependent transaction within the same DC then (as in the above case) 
there will be a LAN penalty for the second transaction to execute.

> Relatedly I'm curious if there is any way that the client can
> acquire the timestamp used by the transaction before sending the data
> so we can make the operations idempotent and unrelated to the
> coordinator that was executing them as the storage nodes are
> vulnerable to disk and heap failure modes which makes them much more
> likely to enter grey failure (slow). Alternatively, perhaps it would
> make sense to introduce a set of optional dedicated C* nodes for
> reaching consensus that do not act as storage nodes so we don't have
> to worry about hanging coordinators (join_ring=false?)?

So, in principle coordination can be performed by any node on the network, 
including a client. We’d need to issue the client a unique id, but this can 
be done cheaply on joining. This might be something to explore in future, 
though there are downsides to having more coordinators too: they are more 
likely to fail, stalling further transactions that depend on the transactions 
they are coordinating.

However, with respect to idempotency, I expect Accord not to perpetuate the 
problems of LWTs where the result of an earlier query is unknown. At least 
success/fail will be maintained in a distributed fashion for some reasonable 
time horizon, and there will also be protection against zombie transactions 
(those proposed to a node that went into a failure spiral before reaching 
healthy nodes, that somehow regurgitates it hours or days later), so we should 
be able to provide practical precisely-once semantics to clients.

Whether this is done with a client-provided timestamp, or simply some other 
arbitrary client-provided id that can be utilised to deduplicate requests or 
query the status of a transaction, is something we should explore in a 
dedicated discussion as development of Accord progresses.
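
To make the second option concrete, here is a minimal sketch of deduplicating 
by an arbitrary client-provided id and querying the outcome afterwards. All 
names here (`OutcomeRegistry`, `submit`, `status`, the retention horizon) are 
hypothetical and not part of Accord or Cassandra, and a real implementation 
would maintain this state in a distributed fashion rather than in one process.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class OutcomeRegistry:
    """Hypothetical single-process stand-in for distributed outcome tracking."""

    def __init__(self, horizon_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._horizon = horizon_seconds
        self._clock = clock
        # client-provided id -> (success flag, time the outcome was recorded)
        self._outcomes: Dict[str, Tuple[bool, float]] = {}

    def _expire(self) -> None:
        # Drop outcomes that are older than the retention horizon.
        cutoff = self._clock() - self._horizon
        self._outcomes = {k: v for k, v in self._outcomes.items() if v[1] >= cutoff}

    def submit(self, client_id: str, txn: Callable[[], bool]) -> bool:
        """Run txn at most once per client_id while its outcome is retained."""
        self._expire()
        if client_id in self._outcomes:
            # A retry of a known request: return the recorded outcome.
            return self._outcomes[client_id][0]
        ok = txn()
        self._outcomes[client_id] = (ok, self._clock())
        return ok

    def status(self, client_id: str) -> Optional[bool]:
        """Return the recorded success/fail, or None if unknown or expired."""
        self._expire()
        entry = self._outcomes.get(client_id)
        return None if entry is None else entry[0]
```

A client that times out can then retry `submit` with the same id, or call 
`status`, without risking a second application of the transaction inside the 
horizon.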

> Should Algorithm 1 line 12 be PreAcceptOK from Et (not Qt) or should
> line 2 read Qt instead of Et?

So, technically as it reads today I think it’s correct. For Line 2 there is 
always some Qt \subseteq Et. I think the problem here is that actually there’s 
a bunch of valid things to do, including picking some arbitrary subset of each 
rho in Pt so long as it contains some Qt. It’s hard to convey the range of 
options precisely. Line 12 of course really wants to execute only when some Ft 
has responded, but if no such response is forthcoming it wants to execute on 
some Qt, but of course Ft \supseteq Qt. Perhaps I should try to state the set 
inequalities here. I will think about what I can do to improve the clarity, 
thanks.

> It might make sense for participating members to wait for a minimum detected 
> clock skew before becoming eligible for electorate?

This is a great idea, thanks!
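
One way this could look, purely as a sketch: a node only becomes eligible 
once its most recent skew measurements have all been small. `SkewGate`, the 
1ms default (taken from the <1ms figure discussed in this thread) and the 
sample count are illustrative assumptions, not anything that exists in Accord 
today.

```python
from collections import deque
from typing import Deque


class SkewGate:
    """Hypothetical gate: eligible once all recent skew samples are small."""

    def __init__(self, max_skew_ms: float = 1.0, required_samples: int = 5):
        self._max_skew_ms = max_skew_ms
        # Only the most recent `required_samples` measurements are kept.
        self._samples: Deque[float] = deque(maxlen=required_samples)

    def observe(self, skew_ms: float) -> None:
        """Record one measured skew against a peer (sign is ignored)."""
        self._samples.append(abs(skew_ms))

    def eligible(self) -> bool:
        """True once the window is full and every sample is within bound."""
        full = len(self._samples) == self._samples.maxlen
        return full and max(self._samples) <= self._max_skew_ms
```

A freshly booted machine with a large initial skew would stay ineligible 
until that sample has aged out of the window.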

> I don't really understand how temporarily down replicas will learn
> of mutations they missed .. are we just leveraging some
> external repair?

Yes, precisely. Though any transaction they need to know about to answer a 
Read etc. can be queried from a peer. In practice I expect to deliver a 
real-time repair mechanism scoped (initially, at least) to Accord transactions 
to ensure this happens promptly.

> Relatedly since non-transactional reads wouldn't flow through
> consensus (I hope) would it make sense for a restarting node to learn
> the latest accepted time once and then be deprioritized for all reads
> until it has accepted what it missed? Or is the idea that you would
> _always_ read transactionally (and since it's a read only transaction
> you can skip the WAN consensus and just go straight to fast path
> reads)?

I expect that tables will be marked transactional, and that every operation 
that goes through them will be transactional. However I can imagine offering 
weaker read semantics, particularly if you’re looking to avoid paying the WAN 
price if you aren’t worried about consistency. I haven’t really considered how 
we might marry the two within a table, and I’m open to suggestions here. I 
expect that this dovetails with future improvements to transactional cluster 
metadata. I think also in part this kind of behaviour is limited today because 
repair is too unwieldy, and also because we don’t have an “on but catching up” 
state. If we improve repair for transactions the first part may be solved, and 
perhaps we can introduce a new node state as part of improving our approach to 
cluster management.

I could imagine having some bounded divergence in general, e.g. I haven’t 
corroborated my transaction history in Xms with a majority, or I haven’t 
received Xms of the transaction history I’ve witnessed, so I’m going to remove 
myself from the read set for non-transactional operations. But I don’t envisage 
this landing in V1.
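
As a rough illustration of the first variant (time since history was last 
corroborated with a majority), assuming a hypothetical `ReadSetMembership` 
helper and a bound of X ms supplied by the operator; none of this is a 
proposed API:

```python
import time
from typing import Callable


class ReadSetMembership:
    """Hypothetical tracker for whether a node should serve weaker reads."""

    def __init__(self, bound_ms: float,
                 clock_ms: Callable[[], float] = lambda: time.monotonic() * 1000.0):
        self._bound_ms = bound_ms
        self._clock_ms = clock_ms
        # Treat construction time as the first corroboration.
        self._last_corroborated_ms = clock_ms()

    def corroborated(self) -> None:
        """Record that transaction history was just corroborated with a majority."""
        self._last_corroborated_ms = self._clock_ms()

    def serves_non_transactional_reads(self) -> bool:
        """False once the node has gone longer than the bound without corroborating."""
        return self._clock_ms() - self._last_corroborated_ms <= self._bound_ms
```

A node would call `corroborated()` whenever it confirms its history with a 
majority, and the read path would skip it while 
`serves_non_transactional_reads()` is false.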

> I know the paper says that we elide details of how the shards (aka
> replica sets?) are chosen, but it seems that this system would have a
> hard dependency on a strongly consistent shard selection system (aka
> token metadata?) wouldn't it? In particular if the simple quorums
> (which I interpreted to be replica sets in current C*, not sure if
> that's correct) can change in non linearizable ways I don't think
> Property 3.3 can hold. I think you hint at a solution to this in
> section 5 but I'm not sure I grok it.

Yes, it does. That’s something that’s in hand, and colleagues will be reaching 
out to the list about it in the next couple of months. I anticipate this being a 
solved problem before Accord depends on it. There’s still a bunch of complexity 
within Accord for applying topology changes safely (which Section 5 nods to), 
but the membership decisions will be taken by Cassandra – safely.


From: Joseph Lynch <[email protected]>
Date: Monday, 20 September 2021 at 17:17
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
Benedict,

Thank you very much for advancing this proposal, I'm extremely excited
to see flexible quorums used in this way and am looking forward to the
integration of Accord into Cassandra! I read the whitepaper and have a
few questions, but I was wondering what do you think about having some
extended Q&A after your ApacheCon talk Wednesday (maybe at the end of
the C* track)? It might be higher bandwidth than going back and forth
on email/slack (also given you're presenting on it that might be a
good time to discuss it)?

Briefly
* It might help to have a diagram (perhaps I can collaborate with you
on this?) showing the happy path delay waiting in the reorder buffer
and the messages that are sent in a 2 and 3 datacenter deployment
during the PreAccept, Accept, Commit, Execute, Apply phases. In
particular it was hard for me to follow where exactly I was paying WAN
latency and where we could achieve progress with LAN only (I think
that WAN is always paid during the Consensus Protocol, and then in
most cases execution can remain LAN except in 3+ datacenters where I
think you'd have to include at least one replica in a neighboring
datacenter). In particular, it seems that Accord always pays clock
skew + WAN latency during the reorder buffer (as part of consensus) +
2x LAN latency during execution (to read and then write).
* Relatedly I'm curious if there is any way that the client can
acquire the timestamp used by the transaction before sending the data
so we can make the operations idempotent and unrelated to the
coordinator that was executing them as the storage nodes are
vulnerable to disk and heap failure modes which makes them much more
likely to enter grey failure (slow). Alternatively, perhaps it would
make sense to introduce a set of optional dedicated C* nodes for
reaching consensus that do not act as storage nodes so we don't have
to worry about hanging coordinators (join_ring=false?)?
* Should Algorithm 1 line 12 be PreAcceptOK from Et (not Qt) or should
line 2 read Qt instead of Et?
* I think your claim about clock skew being <1ms in general is
accurate at least for AWS except for when machines boot for the first
time (I can send you some data shortly). It might make sense for
participating members to wait for a minimum detected clock skew before
becoming eligible for electorate?
* I don't really understand how temporarily down replicas will learn
of mutations they missed, did I miss the part where a read replica
would recover all transactions between its last accepted time and
another replica's last accepted time? Or are we just leveraging some
external repair?
* Relatedly since non-transactional reads wouldn't flow through
consensus (I hope) would it make sense for a restarting node to learn
the latest accepted time once and then be deprioritized for all reads
until it has accepted what it missed? Or is the idea that you would
_always_ read transactionally (and since it's a read only transaction
you can skip the WAN consensus and just go straight to fast path
reads)?
* I know the paper says that we elide details of how the shards (aka
replica sets?) are chosen, but it seems that this system would have a
hard dependency on a strongly consistent shard selection system (aka
token metadata?) wouldn't it? In particular if the simple quorums
(which I interpreted to be replica sets in current C*, not sure if
that's correct) can change in non linearizable ways I don't think
Property 3.3 can hold. I think you hint at a solution to this in
section 5 but I'm not sure I grok it.

Super interesting proposal and I am looking forward to all the
improvements this will bring to the project!

Cheers,
-Joey

On Mon, Sep 20, 2021 at 1:34 AM Miles Garnsey
<[email protected]> wrote:
>
> If Accord can fulfil its aims it sounds like a huge improvement to the state 
> of the art in distributed transaction processing. Congrats to all involved in 
> pulling the proposal together.
>
> I was holding off on feedback since this is quite in depth and I don’t want 
> to bike shed; I still haven’t spent as much time understanding this as I’d 
> like.
>
> Regardless, I’ll make the following notes in case they’re helpful. My 
> feedback is more to satisfy my own curiosity and stimulate discussion than to 
> suggest that there are any flaws here. I applaud the proposed testing 
> approach and think it is the only way to be certain that the proposed 
> consistency guarantees will be upheld.
>
> General
>
> I’m curious if/how this proposal addresses issues we have seen when scaling; 
> I see reference to simple majorities of nodes - is there any plan to ensure 
> safety under scaling operations or DC (de)commissioning?
>
> What consistency levels will be supported under Accord? Will it simply be a 
> single CL representing a majority of nodes across the whole cluster? (This at 
> least would mitigate the issues I’ve seen when folks want to switch from 
> EACH_SERIAL to SERIAL).
>
> Accord
>
> > Accord instead assembles an inconsistent set of dependencies.
>
>
> Further explanation here would be good. Do we mean to say that the 
> dependencies may differ according to which transactions the coordinator has 
> witnessed at the time the incoming transaction is first seen? This would make 
> sense if some nodes had not fully committed a foregoing transaction.
>
> Is it correct to think of this step as assembling a dependency graph of 
> foregoing transactions which must be completed ahead of progressing the 
> incoming new transaction?
>
> Fast Path
>
> > A coordinator C proposes a timestamp t0 to at least a quorum of a fast path 
> > electorate. If t0 is larger than all timestamps witnessed for all prior 
> > conflicting transactions, t0 is accepted by a replica. If a fast path 
> > quorum of responses accept, the transaction is agreed to execute at t0. 
> > Replicas respond with the set of transactions they have witnessed that may 
> > execute with a lower timestamp, i.e. those with a lower t0.
>
> What is t0 here? I’m guessing it is the Lamport clock time of the most recent 
> mutation to the partition? May be worth clarifying because otherwise the 
> perception may be that it is the commencement time of the current transaction 
> which may not be the intention.
>
> Regarding the use of logical clocks in general -
>
> Do we have one clock-per-shard-per-node? Or is there a single clock for all 
> transactions on a node?
> What happens in network partitions?
> In a cross-shard transaction does maintaining simple majorities of replicas 
> protect you from potential inconsistencies arising when a transaction W10 
> addressing partitions p1, p2 comes from a different majority (potentially 
> isolated due to a network partition) from earlier writes W[1,9] to p1 only?
> It seems that this may cause a sudden change to the dependency graph for 
> partition p2 which may render it vulnerable to strange effects?
> Do we consider adversarial cases or any sort of byzantine faults? (That’s a 
> bit out of left field, feel free to kick me.)
> Why do we prefer Lamport clocks to vector clocks or other types of logical 
> clock?
>
> Slow Path
>
> > This value is proposed to at least a simple majority of nodes, along with 
> > the union of the dependencies received
>
>
> Related to the earlier point: when we say `union` here - what set are we 
> forming a union over? Is it a union of all dependencies t_n < t as seen by 
> all coordinators? I presume that the logic precludes the possibility that 
> these dependencies will conflict, since all foregoing transactions which are 
> in progress as dependencies must be non-conflicting with earlier transactions 
> in the dependency graph?
>
> In any case, further information about how the dependency graph is computed 
> would be interesting.
>
> > The inclusion of dependencies in the proposal is solely to facilitate 
> > Recovery of other transactions that may be incomplete - these are stored on 
> > each replica to facilitate decisions at recovery.
>
>
> Every replica? Or only those participating in the transaction?
>
> > If C fails to reach fast path consensus it takes the highest t it witnessed 
> > from its responses, which constitutes a simple Lamport clock value imposing 
> > a valid total order. This value is proposed to at least a simple majority 
> > of nodes,
>
>
> When speaking about the simple majority of nodes to whom the max(t) value 
> returned will be proposed -
> It sounds like this need not be the same majority from whom the original sets 
> of T_n and dependencies were obtained?
> Is there a proof to show that the dependencies created from the union of the 
> first set of replicas resolve to an acceptable dependency graph for an 
> arbitrary majority of replicas? (Especially given that a majority of replicas 
> is not a majority of nodes, given we are in a cross-shard scenario here).
> What happens in cases where the replica set has changed due to (a) scaling RF 
> in a single DC (b) adding a whole new DC?
> Wikipedia <https://en.wikipedia.org/wiki/Lamport_timestamp> tells me that 
> Lamport clocks only impose partial, not total order. I’m guessing we’re 
> thinking of a different type of logical clock when we speak of Lamport clocks 
> here (but my expertise is sketchy on this topic).
>
> Recovery
>
> I would be interested in further exploration of the unhappy path (where 'a 
> newer ballot has been issued by a recovery coordinator to take over the 
> transaction’). I understand that this may be partially covered in the 
> pseudocode for `Recovery` but I’m struggling to reconcile the ’new ballot has 
> been issued’ language with the ‘any R in responses had X as Applied, 
> Committed, or Accepted’ language.
>
> Well done again and thank you for pushing the envelope in this area Benedict.
>
> Miles
>
> > On 15 Sep 2021, at 11:33 pm, [email protected] wrote:
> >
> >> I would kind of expect this work, if it pans out, to _replace_ the current 
> >> paxos implementation
> >
> > That’s a good point. I think the clear direction of travel would be total 
> > replacement of Paxos, but I anticipate that this will be feature-flagged at 
> > least initially. So for some period of time we may maintain both options, 
> > with the advanced CQL functionality disabled if you opt for classic Paxos.
> >
> > I think this is a necessary corollary of a requirement to support live 
> > upgrades – something that is non-negotiable IMO, but that I have also 
> > neglected to discuss in the CEP. I will rectify this. An open question is 
> > if we want to support live downgrades back to Classic Paxos. I kind of 
> > expect that we will, though that will no doubt be informed by the 
> > difficulty of doing so.
> >
> > Either way, this means the deprecation cycle for Classic Paxos is probably 
> > a separate and future decision for the community. We could choose to 
> > maintain it indefinitely, but I would vote to retire it the following major 
> > version.
> >
> > A related open question is defaults – I would probably vote for new 
> > clusters to default to Accord, and existing clusters to need to run a 
> > migration command after fully upgrading the cluster.
> >
> > From: Sylvain Lebresne <[email protected]>
> > Date: Wednesday, 15 September 2021 at 14:13
> > To: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > Fwiw, it makes sense to me to talk about CQL syntax evolution separately.
> >
> > It's pretty clear to me that we _can_ extend CQL to make use of a general
> > purpose transaction mechanism, so I don't think deciding if we want a
> > general purpose transaction mechanism has to depend on deciding on the
> > syntax. Especially since the syntax question can get pretty far on its own
> > and could be a serious upfront distraction.
> >
> > And as you said, there are even queries that can be expressed with the
> > current syntax that we refuse now and would be able to accept with this, so
> > those could be "ground zero" of what this work would allow.
> >
> > But outside of pure syntax questions, one thing that I don't see discussed
> > in the CEP (or did I miss it) is what the relationship of this new
> > mechanism with the existing paxos implementation would be? I would kind of
> > expect this work, if it pans out, to _replace_ the current paxos
> > implementation (because 1) why not and 2) the idea of having 2
> > serialization mechanisms that serialize separately sounds like a nightmare
> > from the user POV) but it isn't stated clearly. If replacement is indeed
> > the intent, then I think there needs to be a plan for the upgrade path. If
> > that's not the intent, then what?
> > --
> > Sylvain
> >
> >
> > On Wed, Sep 15, 2021 at 12:09 PM [email protected] <[email protected]>
> > wrote:
> >
> >> Ok, so the act of typing out an example was actually a really good
> >> reminder of just how limited our functionality is today, even for single
> >> partition operations.
> >>
> >> I don’t want to distract from any discussion around the underlying
> >> protocol, but we could kick off a separate conversation about how to evolve
> >> CQL sooner than later if there is the appetite. There are no concrete
> >> proposals to discuss, it would be brainstorming.
> >>
> >> Do people also generally agree this work warrants a distinct CEP, or would
> >> people prefer to see this developed under the same umbrella?
> >>
> >>
> >>
> >> From: [email protected] <[email protected]>
> >> Date: Wednesday, 15 September 2021 at 09:19
> >> To: [email protected] <[email protected]>
> >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >>> perhaps we can prepare these as examples
> >>
> >> There are grammatically correct CQL queries today that cannot be executed,
> >> that this work will naturally remove the restrictions on. I’m certainly
> >> happy to specify one of these for the CEP if it will help the reader.
> >>
> >> I want to exclude “new CQL commands” or any other enhancement to the
> >> grammar from the scope of the CEP, however. This work will enable a range
> >> of improvements to the UX, but I think this work is a separate, long-term
> >> project of evolution that deserves its own CEPs, and will likely involve
> >> input from a wider range of contributors and users. If nobody else starts
> >> such CEPs, I will do so in due course (much further down the line).
> >>
> >> Assuming there is not significant dissent on this point I will update the
> >> CEP to reflect this non-goal.
> >>
> >>
> >>
> >> From: C. Scott Andreas <[email protected]>
> >> Date: Wednesday, 15 September 2021 at 00:31
> >> To: [email protected] <[email protected]>
> >> Cc: [email protected] <[email protected]>
> >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >> Adding a few notes from my perspective as well –
> >>
> >> Re: the UX question, thanks for asking this.
> >>
> >> I agree that offering a set of example queries and use cases may help make
> >> the specific use cases more understandable; perhaps we can prepare these as
> >> examples to be included in the CEP.
> >>
> >> I do think that all potential UX directions begin with the specification
> >> of the protocol that will underlie them, as what can be expressed by it may
> >> be a superset of what's immediately exposed by CQL. But at minimum it's
> >> great to have a sense of the queries one might be able to issue to focus a
> >> reading of the whitepaper.
> >>
> >> Re: "Can we not start using it as an external dependency, and later
> >> re-evaluate if it's necessary to bring it into the project or even incubate
> >> it as another Apache project"
> >>
> >> I think it would be valuable to the project for the work to be incubated
> >> in a separate repository as part of the Apache Cassandra project itself,
> >> much like the in-JVM dtest API and Harry. This pattern worked well for
> >> those projects as they incubated as it allowed them to evolve outside the
> >> primary codebase, but subject to the same project governance, set of PMC
> >> members, committers, and so on. Like those libraries, it also makes sense
> >> as the Cassandra project is the first (and at this time only) known
> >> intended consumer of the library, though there may be more in the future.
> >>
> >> If the proposal is accepted, the time horizon envisioned for this work's
> >> completion is ~9 months to a standard of production readiness. The
> >> contributors see value in the work being donated to and governed by the
> >> contribution practices of the Foundation. Doing so ensures that it is being
> >> developed openly and with full opportunity for review and contribution of
> >> others, while also solidifying contribution of the IP to the project.
> >>
> >> Spinning up a separate ASF incubation project is an interesting idea, but
> >> I feel that doing so would introduce a far greater overhead in process and
> >> governance, and that the most suitable governance and set of committers/PMC
> >> members are those of the Apache Cassandra project itself.
> >>
> >> On Sep 14, 2021, at 3:53 PM, "[email protected]" <[email protected]>
> >> wrote:
> >>
> >>
> >> Hi Paulo,
> >>
> >>> First and foremost, I believe this proposal in its current form focuses on
> >>> the protocol details (HOW?) but lacks the bigger picture on how this is
> >>> going to be exposed to the user (WHAT)?
> >>
> >> In my opinion this CEP embodies a coherent, distinct and complex piece of
> >> work, that requires specialist expertise. You have after all just suggested
> >> a month to read only the existing proposal 😊
> >>
> >> UX is a whole other kind of discussion, that can be quite opinionated, and
> >> requires different expertise. It is in my opinion helpful to break out work
> >> that is not tightly coupled, as well as work that requires different
> >> expertise. As you point out, multi-key UX features are largely independent
> >> of any underlying implementation, likely can be done in parallel, and even
> >> with different contributors.
> >>
> >>> Can we not start using it as an external dependency
> >>
> >> I would love to understand your rationale, as this is a surprising
> >> suggestion to me. This is just like any other subsystem, but we would be
> >> managing it as a separate library primarily for modularity reasons. The
> >> reality is that this option should anyway be considered unavailable. This
> >> is a proposed contribution to the Cassandra project, which we can either
> >> accept or reject.
> >>
> >>> Isn't this a good chance to make the serialization protocol pluggable
> >>> with clearly defined integration points
> >>
> >> It has recently been demonstrated to be possible to build a system that
> >> can safely switch between different consensus protocols. However, this was
> >> very sophisticated work that would require its own CEP, one that we would
> >> be unable to resource. Even if we could this would be insufficient. This
> >> goal has never been achieved for a multi-shard transaction protocol to my
> >> knowledge, and multi-shard transaction protocols are much more divergent in
> >> implementation detail than consensus protocols.
> >>
> >>> so we could easily switch implementations with different guarantees… (ie.
> >>> Apache Ratis)
> >>
> >> As far as I know, there are no other strict serializable protocols
> >> available to plug in today. Apache Ratis appears to be a straightforward
> >> Raft implementation, and therefore it is a linearizable consensus protocol.
> >> It is not a multi-shard transaction protocol at all, let alone strict
> >> serializable. It could be used in place of Paxos, but not Accord.
> >>
> >>
> >>
> >> From: Paulo Motta <[email protected]>
> >> Date: Tuesday, 14 September 2021 at 22:55
> >> To: Cassandra DEV <[email protected]>
> >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >> I can start with some preliminary comments while I get more familiarized
> >> with the proposal:
> >>
> >> - First and foremost, I believe this proposal in its current form focuses
> >> on the protocol details (HOW?) but lacks the bigger picture on how this is
> >> going to be exposed to the user (WHAT)? Is exposing linearizable
> >> transactions to the user not a goal of this proposal? If not, I think the
> >> proposal is missing the UX (ie. what CQL commands are going to be added
> >> etc) on how these transactions are going to be exposed.
> >>
> >> - Why do we need to bring the library into the project umbrella? Can we not
> >> start using it as an external dependency, and later re-evaluate if it's
> >> necessary to bring it into the project or even incubate it as another
> >> Apache project? I feel we may be importing unnecessary management overhead
> >> into the project while only a small subset of contributors will be involved
> >> with the core protocol.
> >>
> >> - Isn't this a good chance to make the serialization protocol pluggable
> >> with clearly defined integration points, so we could easily switch
> >> implementations with different guarantees, trade-offs and performance
> >> considerations while leaving the UX intact? This would also allow us to
> >> easily benchmark the protocol against alternatives (ie. Apache Ratis) and
> >> validate the performance claims. I think the best way to do that would be
> >> to define what the feature will look like to the end user (UX), define the
> >> integration points necessary to support this feature, and use accord as the
> >> first implementation of these integration points.
> >>
> >> On Tue, 14 Sep 2021 at 17:57, Paulo Motta <
> >> [email protected]>
> >> wrote:
> >>
> >> Given the extensiveness and complexity of the proposal I'd suggest leaving
> >> it a little longer (perhaps 4 weeks from the publish date?) for people to
> >> get a bit more familiarized and have the chance to comment before casting a
> >> vote. I glanced through the proposal - and it looks outstanding, very
> >> promising work guys! - but would like a bit more time to take a deeper look
> >> and digest it before potentially commenting on it.
> >>
> >> On Tue, 14 Sep 2021 at 17:30, [email protected] <
> >> [email protected]> wrote:
> >>
> >> Has anyone had a chance to read the drafts, and has any feedback or
> >> questions? Does anybody still anticipate doing so in the near future? Or
> >> shall we move to a vote?
> >>
> >> From: [email protected] <[email protected]>
> >> Date: Tuesday, 7 September 2021 at 21:27
> >> To: [email protected] <[email protected]>
> >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >> Hi Jake,
> >>
> >>> What structural changes are planned to support an external dependency
> >> project like this
> >>
> >> To add to Blake’s answer, in case there’s some confusion over this, the
> >> proposal is to include this library within the Apache Cassandra project. So
> >> I wouldn’t think of it as an external dependency. This PMC and community
> >> will still have the usual oversight over direction and development, and
> >> APIs will be developed solely with the intention of their integration with
> >> Cassandra.
> >>
> >>> Will this effort eventually replace consistency levels in C*?
> >>
> >> I hope we’ll have some very related discussions around consistency levels
> >> in the coming months more generally, but I don’t think that is tightly
> >> coupled to this work. I agree with you both that we won’t want to
> >> perpetuate the problems you’ve highlighted though.
> >>
> >> Henrik:
> >>> I was referring to the property that Calvin transactions also need to
> >>> be sent to the cluster in a single shot
> >>
> >> Ah, yes. In that case I agree, and I tried to point to this direction in
> >> an earlier email, where I discussed the use of scripting languages (i.e.
> >> transactionally modifying the database with some subset of arbitrary
> >> computation). I think the JVM is particularly suited to offering quite
> >> powerful distributed transactions in this vein, and it will be interesting
> >> to see what we might develop in this direction in future.
> >>
> >>
> >> From: Jake Luciani <[email protected]>
> >> Date: Tuesday, 7 September 2021 at 19:27
> >> To: [email protected] <[email protected]>
> >> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> >> Great thanks for the information
> >>
> >> On Tue, Sep 7, 2021 at 12:44 PM Blake Eggleston
> >> <[email protected]> wrote:
> >>
> >>> Hi Jake,
> >>>
> >>>> 1. Will this effort eventually replace consistency levels in C*? I ask
> >>>> because one of the shortcomings of our paxos today is
> >>>> it can be easily mixed with non serialized consistencies and therefore
> >>>> users commonly break consistency by for example reading at CL.ONE while
> >>>> also using LWTs.
> >>>
> >>> This will likely require CLs to be specified at the schema level for
> >>> tables using multi partition transactions. I’d expect this to be
> >>> available for other tables, but not required.
> >>>
> >>>> 2. What structural changes are planned to support an external
> >>>> dependency project like this? Are there some high level interfaces
> >>>> you expect the project to adhere to?
> >>>
> >>> There will be some interfaces that need to be implemented in C* to
> >>> support the library. You can find the current interfaces in the
> >>> accord.api package, but these were written to support some initial
> >>> testing, and not intended for integration into C* as is. Things are
> >>> pretty fluid right now and will be rewritten / refactored multiple
> >>> times over the next few months.
> >>>
> >>> Thanks,
> >>>
> >>> Blake
> >>>
> >>>
> >>>> On Sun, Sep 5, 2021 at 10:33 AM [email protected]
> >>>> <[email protected]> wrote:
> >>>>
> >>>>> Wiki:
> >>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions
> >>>>> Whitepaper:
> >>>>> https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf
> >>>>> <https://cwiki.apache.org/confluence/download/attachments/188744725/Accord.pdf?version=1&modificationDate=1630847736966&api=v2>
> >>>>> Prototype: https://github.com/belliottsmith/accord
> >>>>>
> >>>>> Hi everyone, I’d like to propose this CEP for adoption by the
> >>>>> community.
> >>>>>
> >>>>> Cassandra has benefitted from LWTs for many years, but application
> >>>>> developers that want to ensure consistency for complex operations
> >>>>> must either accept the scalability bottleneck of serializing all
> >>>>> related state through a single partition, or layer a complex state
> >>>>> machine on top of the database. These are sophisticated and costly
> >>>>> activities that our users should not be expected to undertake. Since
> >>>>> distributed databases are beginning to offer distributed transactions
> >>>>> with fewer caveats, it is past time for Cassandra to do so as well.
> >>>>>
> >>>>> This CEP proposes the use of several novel techniques that build upon
> >>>>> research (that followed EPaxos) to deliver (non-interactive) general
> >>>>> purpose distributed transactions. The approach is outlined in the
> >>>>> wikipage and in more detail in the linked whitepaper. Importantly, by
> >>>>> adopting this approach we will be the _only_ distributed database to
> >>>>> offer global, scalable, strict serializable transactions in one wide
> >>>>> area round-trip. This would represent a significant improvement in
> >>>>> the state of the art, both in the academic literature and in
> >>>>> commercial or open source offerings.
> >>>>>
> >>>>> This work has been partially realised in a prototype. This partial
> >>>>> prototype has been verified against Jepsen.io’s Maelstrom library and
> >>>>> dedicated in-tree strict serializability verification tools, but much
> >>>>> work remains for the work to be production capable and integrated
> >>>>> into Cassandra.
> >>>>>
> >>>>> I propose including the prototype in the project as a new source
> >>>>> repository, to be developed as a standalone library for integration
> >>>>> into Cassandra. I hope the community sees the important value
> >>>>> proposition of this proposal, and will adopt the CEP after this
> >>>>> discussion, so that the library and its integration into Cassandra
> >>>>> can be developed in parallel and with the involvement of the wider
> >>>>> community.
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> http://twitter.com/tjake
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>>
> >>>
> >>
> >> --
> >> http://twitter.com/tjake
> >>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
