> I was thinking that a path similar to Calvin/FaunaDB is certainly looming in
> the horizon at least.

I’m not sure which aspect of these systems you are referring to. Unless I have misunderstood, I consider them to be strictly inferior approaches (particularly for Cassandra) as they require a _global_ leader process and as a result have scalability limits. Users simply shift the sharding problem to the cluster level rather than the node level, but the fundamental problem remains. This may be acceptable for many users, but was contrary to the goals of this CEP.

> It seems to me at that point long running queries and interactive
> transactions are mostly the same problem.

I would estimate long running queries to be easier to deliver by at least an order of magnitude. They’re not unrelated, but they’re still quite distinct in my opinion.

> good job pulling together ingredients from state of the art work in this area

In case this was lost in the noise: this work is not simply an assembly of prior work. It introduces entirely novel approaches that permit the work to exceed the capabilities of any prior research or production system. It is worth properly highlighting that if we deliver this, Cassandra will have the most sophisticated transaction system, full stop. There are to my knowledge no databases offering distributed transactions that are both strict serializable and have no scalability bottleneck. Every database today clearly aims for this combination, but accepts some trade-off: either only guaranteeing serializable isolation, requiring special timekeeping hardware to guarantee strict serializability, or using a global leader process (or using two-phase commit, but this is quite niche).

From: Henrik Ingo <henrik.i...@datastax.com>
Date: Tuesday, 7 September 2021 at 14:06
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions

On Tue, Sep 7, 2021 at 12:26 PM bened...@apache.org <bened...@apache.org> wrote:

> > whether I should just think of this as "better and more efficient LWT"
>
> So, the LWT concept is a Cassandra one and doesn’t have an agreed-upon
> definition. My understanding of a core feature/limitation of LWTs is that
> they operate over a single partition, and as a result many operations are
> impossible even in multiple rounds without complex distributed state
> machines. The core improvement here, besides improved performance, is that
> we will be able to operate over any set of keys at-once.
>

My bad, I have never used LWT and forgot / didn't know they were single partition. The CEP makes more sense now.

> How this facility is evolved into user-facing capabilities is an
> open-ended question. Initially of course we will at least support the same
> syntax but remove the restriction on operating over a single partition. I
> haven’t thought about this much, as the CEP is primarily for enabling
> works, but I think we will want to expand the syntax in two ways:
>
> 1) to support more complex conditions (simple AND conditions across all
> partitions seem likely too restrictive, though they might make sense for
> the single partition case);
> 2) to support inserting data from one row into another, potentially with
> transformations being applied (including via UDFs).
>
> These are both relatively manageable improvements that we might want to
> land in the same major release as the transactions themselves. The core
> facility can be expanded quite broadly, though. It would be possible for
> instance to support some interpreted language(s) as part of a query, so
> that arbitrary work can be applied in the transaction.
>

I was thinking that a path similar to Calvin/FaunaDB is certainly looming in the horizon at least. I've been following those with interest, because a) it's refreshingly outside of the box thinking, and b) they seem to be able to push the limitations of this approach much beyond what one might imagine when reading about it the first time. But like you also point out, it remains to be seen whether users actually want those kinds of transactions. We are creatures of habit for sure.

> Or, perhaps the community would rather build atop the feature to support
> interactive transactions at the client. I can’t predict resourcing for
> this, though, and it might be a community effort. I think it would be quite
> tractable once this work lands, however.
>
> > Suppose I wanted to do a long running read-only transaction
>
> So, there’s two sides to this: with and without paging. A long running
> read-only transaction taking a few seconds is quite likely to be fine and
> we will probably support with some MVCC within the transaction system
> itself. This may or may not be part of v1, it’s hard to predict with
> certainty as this is going to be a large undertaking.
>
> But for paged queries we’d be talking about SNAPSHOT isolation. This is
> likely to be something the community wants to support before long anyway
> and is probably not as hard as you might think. It is probably outside of
> the scope of this work, though the two would dovetail very nicely.
>

I've pointed out to some of my colleagues that since Cassandra's storage engine is an LSM engine, with some additional work it could become an MVCC style storage engine. Your thinking here seems to be in the same direction, even if it's beyond version 1. (Just for context, also for benefit of other readers on the list, it took MongoDB 5 years and 6 major releases to develop distributed multi-shard transactions. So it's good to talk about the general direction, but understanding that this is not something anyone will finish before Christmas.)

It seems to me at that point long running queries and interactive transactions are mostly the same problem.

****

Benedict, thanks for the answers. Since I'm not a Cassandra developer I feel it would be inappropriate for me to express an opinion for or against, so I'll just end with saying this is an interesting proposal and the authors have done a good job pulling together ingredients from state of the art work in this area. As such it will be interesting to follow the discussion and work from whitepaper to implementation.

A secondary objective was also to just let everyone know I am lurking here. If you ever want to reach out for an off-band discussion, you now have my contact details.

henrik