> If I'm reading you correctly, then Accord does / could do exactly what I was 
> asking for: two round trips in a single DC cluster, and one roundtrip + 
> SkewMax when network roundtrips are >> SkewMax.

Yes, in fact it’s even better than that. Even in this setup *most* transactions 
will still take only one round-trip, and in the worst case (under conflicts) two 
round-trips.

> assuming I got it correct...

As far as I can tell your understanding is correct, yes - though worth noting 
of course that the WAN round-trip on write is asynchronous.

I haven’t encountered Galera – do you have any technical papers to hand?

From: Henrik Ingo <henrik.i...@datastax.com>
Date: Friday, 1 October 2021 at 16:24
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions

On Fri, Oct 1, 2021 at 5:30 PM bened...@apache.org <bened...@apache.org>
wrote:

> > Typical value for SkewMax in e.g. the Spanner paper, some CockroachDB
> discussions = 7 ms
>
> I think skew max is likely to be much lower than this, even on commodity
> hardware. Bear in mind that unlike Cockroach and Spanner correctness does
> not depend on this value, only performance. So we can pick the real number,
> not some p100 outlier value.
>
> Also bear in mind that this is an optimisation. In clusters where it makes
> no sense we can simply use the raw protocol and accept transactions will
> very infrequently take two round-trips (which is fine, because in this
> scenario round-trips are cheap).
>
>
Oh, this was not at all obvious :-D

If I'm reading you correctly, then Accord does / could do exactly what I
was asking for: two round trips in a single DC cluster, and one roundtrip +
SkewMax when network roundtrips are >> SkewMax.



> > A known optimization for the hot rows problem is to "hint" or manually
> force clients to direct all updates to the hot row to the same node
>
> So, with a leaderless protocol like Accord the ordering decisions are
> never really bottlenecked - no matter how many are in-flight, a new
> transaction will experience no additional latency determining its execution
> order. The only bottleneck will be execution. For this it is absolutely
> possible to funnel everything to a single coordinator, but I don’t know
> that this would in practice achieve much – the important bottleneck would
> be that the coordinators are all within the same DC, so that the
> _replicas_ may all respond to them with their data dependencies with
> minimal delay. This is something we discussed in the
> ApacheCon call as it happens. If a significant number of transactions are
> pending, and they are in different DCs, it would be quite straightforward
> to nominate a coordinator within the DC serving the majority of operations
> to serve the remainder, and to forward the results to the original
> coordinators.
>
>
Thanks for explaining. This is really interesting. I now reread section 2.2
of the paper and realize it says exactly this.

So in Accord:

Step 1: One network round trip + SkewMax to establish a global ordering.

Step 2: a) One (local) network round trip for the read phase, one (WAN) round
trip for writes.
        b) In addition, before either reading or writing, the node must
first commit and apply all previous transactions that are in the "deps" set
of this transaction.

In addition, if we implement interactive transactions, or support for
secondary indexes, or other "complex" transactions, then that work would
happen before Step 1.

Ok, now that I spelled this out... assuming I got it correct... Then this
actually resembles Galera more than Spanner. The wall clock time is not
actually the transaction id, it's just a step in the consensus dialogue
where nodes agree on a global ordering.
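
To make the arithmetic in the steps above concrete, here is a rough sketch of
the latency estimate in Python. It is only a back-of-the-envelope model, not
anything taken from the Accord paper or code: the names Latencies,
ordering_latency_ms and transaction_latency_ms are made up for illustration,
and the model assumes the write round trip is asynchronous (so it is left
out) and that waiting on "deps" can be represented as a single number.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Hypothetical inputs, all in milliseconds (illustrative names only)."""
    ordering_rtt_ms: float  # round trip needed to agree on the ordering
    local_rtt_ms: float     # round trip for the (local) read phase
    skew_max_ms: float      # assumed bound on clock skew


def ordering_latency_ms(lat: Latencies, wait_for_skew: bool,
                        conflict: bool = False) -> float:
    """Step 1: time to establish the global ordering.

    With the skew wait: one round trip + SkewMax.
    With the raw protocol: one round trip, or two under conflicts.
    """
    if wait_for_skew:
        return lat.ordering_rtt_ms + lat.skew_max_ms
    rounds = 2 if conflict else 1
    return rounds * lat.ordering_rtt_ms


def transaction_latency_ms(lat: Latencies, wait_for_skew: bool,
                           conflict: bool = False,
                           deps_wait_ms: float = 0.0) -> float:
    """Steps 1 + 2: ordering, then waiting for deps, then the local read.

    The WAN write round trip is treated as asynchronous and not counted.
    """
    return (ordering_latency_ms(lat, wait_for_skew, conflict)
            + deps_wait_ms
            + lat.local_rtt_ms)


if __name__ == "__main__":
    multi_dc = Latencies(ordering_rtt_ms=100.0, local_rtt_ms=1.0,
                         skew_max_ms=7.0)
    print(transaction_latency_ms(multi_dc, wait_for_skew=True))
    print(transaction_latency_ms(multi_dc, wait_for_skew=False, conflict=True))
```

With round trips much larger than SkewMax, the skew wait costs little
compared to a second round trip; when round trips are cheap (single DC), the
raw protocol's occasional second round trip is the smaller cost.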



> I don’t anticipate this optimisation being a high priority until we have
> user reports of this bottleneck in the wild, however. Since clients for
> many workloads will naturally be geo-partitioned so that related state is
> being updated from the same region, it might simply not be needed – at
> least any time soon.
>
>
For sure. I think we're all just trying to understand the landscape of what
we are talking about here, not trying to say everything should be implemented
in v1.


henrik

--

Henrik Ingo

+358 40 569 7354 <358405697354>