Yeah, I think that’s intuitive enough. I had been thinking about multiple condition branches, but had in mind something closer to:

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of course, this would be follow-on work.

> On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:
>
> I imagine that conditions would be evaluated against the state prior to the
> execution of the statement against which they are being evaluated, but after
> the prior statements. I think that should be OK to reason about.
>
> i.e. we might have a contrived example like:
>
>   BEGIN TRANSACTION
>     UPDATE tbl SET a = 1 WHERE k = 1 AS q1
>     UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
>   COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
>
> So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
>
> I think this is probably adequately intuitive? It is a bit atypical to have
> conditions that wrap the whole transaction though.
>
> We have another option, of course, which is to offer IF x ROLLBACK
> TRANSACTION, which is closer to SQL, which would translate the above to:
>
>   BEGIN TRANSACTION
>     SELECT a FROM tbl WHERE k = 1 AS q0
>     IF q0.a != 0 ROLLBACK TRANSACTION
>     UPDATE tbl SET a = 1 WHERE k = 1 AS q1
>     IF q1.a != 1 ROLLBACK TRANSACTION
>     UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
>   COMMIT TRANSACTION
>
> This is less succinct, but might be more familiar to users. We could also
> eschew the ability to read from UPDATE statements entirely in this scheme, as
> this would then look very much like SQL.
>
>
> From: Blake Eggleston <beggles...@apple.com>
> Date: Wednesday, 8 June 2022 at 20:59
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: CEP-15 multi key transaction syntax
>
> > It affects not just RETURNING but also conditions that are evaluated
> > against the row, and if we in future permit using the values from one
> > select in a function call / write to another table (which I imagine we
> > will).
>
> I hadn’t thought about that... using intermediate or even post update values
> in condition evaluation or function calls seems like it would make it
> difficult to understand why a condition is or is not applying. On the other
> hand, it would be powerful, especially when using things like database
> generated values in queries (auto incrementing integer clustering keys or
> server generated timeuuids being examples that come to mind). Additionally, if
> we return these values, I guess that would solve the visibility issues I’m
> worried about.
>
> Agreed intermediate values would be straightforward to calculate though.
>
>
> On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:
>
> It affects not just RETURNING but also conditions that are evaluated against
> the row, and if we in future permit using the values from one select in a
> function call / write to another table (which I imagine we will).
>
> I think that for it to be intuitive we need it to make sense sequentially,
> which means either calculating it or restricting what can be stated (or
> abandoning the syntax).
>
> If we initially forbade multiple UPDATE/INSERT to the same key, but permitted
> overlapping DELETE (and as many SELECT as you like) that would perhaps make
> it simple enough? Require for now that SELECTS go first, then DELETE and then
> INSERT/UPDATE (or vice versa, depending what we want to make simple)?
>
> FWIW, I don’t think this is terribly onerous to calculate either, since it’s
> restricted to single rows we are updating, so we could simply maintain a
> collection of rows and upsert into them as we process the execution. Most
> transactions won’t need it, I suspect, so we don’t need to worry about
> perfect efficiency.
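
To make the "maintain a collection of rows and upsert into them" idea a bit more concrete, here is a minimal Python sketch. It assumes rows are plain dicts keyed by primary key; the names `TxnOverlay`, `read`, `upsert` and `delete` are hypothetical and are not part of CEP-15 or of any Cassandra API. It only illustrates how a later statement in a transaction could observe the effects of earlier ones.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


class TxnOverlay:
    """Hypothetical per-transaction view layered over the rows read up front."""

    def __init__(self, base: Dict[Any, Row]) -> None:
        # Copy the base rows so the transaction never mutates them directly.
        self._base = {key: dict(row) for key, row in base.items()}
        # Rows written so far in this transaction; None marks a delete.
        self._pending: Dict[Any, Optional[Row]] = {}

    def read(self, key: Any) -> Optional[Row]:
        """Return the row as seen after all earlier statements in the txn."""
        if key in self._pending:
            row = self._pending[key]
        else:
            row = self._base.get(key)
        return None if row is None else dict(row)

    def upsert(self, key: Any, cells: Row) -> Row:
        """Merge cells into the row; the last write to a cell wins."""
        current = self.read(key) or {}
        current.update(cells)
        self._pending[key] = current
        return dict(current)

    def delete(self, key: Any) -> None:
        """Record a row delete so later reads in the txn see no row."""
        self._pending[key] = None


# Two updates to the same key, each seeing the state left by the previous one.
overlay = TxnOverlay({1: {"a": 0}})
before_q1 = overlay.read(1)                    # state seen by the first update
overlay.upsert(1, {"a": 1})
before_q2 = overlay.read(1)                    # state seen by the second update
overlay.upsert(1, {"a": before_q2["a"] + 1})
final = overlay.read(1)
```

With this overlay, the first update sees a = 0, the second sees a = 1, and the final row has a = 2. Only rows the transaction touches are copied into the pending map, which fits the point that most transactions would not need this at all.
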
>
>
> From: Blake Eggleston <beggles...@apple.com>
> Date: Tuesday, 7 June 2022 at 00:21
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: CEP-15 multi key transaction syntax
>
> That's a good question. I'd lean towards returning the final state of things,
> although I could understand expecting to see intermediate state. Regarding
> range tombstones, we could require them to precede any updates like selects,
> but there's still the question of how to handle multiple updates to the same
> cell when the user has requested we return the post-update state of the cell.
>
>
> On Jun 6, 2022, at 4:00 PM, bened...@apache.org wrote:
>
> > if multiple updates end up touching the same cell, I’d expect the last one
> > to win
>
> Hmm, yes I suppose range tombstones are a plausible and reasonable thing to
> mix with inserts over the same key range.
>
> What’s your present thinking about the idea of handling returning the values
> as of a given point in the sequential execution then?
>
> The succinct syntax is I think highly desirable for user experience, but this
> does complicate it a bit if we want to remain intuitive.
>
>
> From: Blake Eggleston <beggles...@apple.com>
> Date: Monday, 6 June 2022 at 23:17
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: CEP-15 multi key transaction syntax
>
> Hi all,
>
> Thanks for all the input and questions so far. Glad people are excited about
> this!
>
> I didn’t have any free time to respond this weekend, although it looks like
> Benedict has responded to most of the questions so far, so if I don’t respond
> to a question you asked here, you can interpret that as “what Benedict said”
> :).
>
>
> Jeff,
>
> > Is there a new keyword for “partition (not) exists” or is it inferred by
> > the select?
>
> I'd intended this to be worked out from the select statement, ie: if the
> read/reference is null/empty, then it doesn't exist, whether you're
> interested in the partition, row, or cell. So I don't think we'd need an
> additional keyword there. I think that would address partition exists / not
> exists use cases?
>
> > And would you allow a transaction that had > 1 named select and no
> > modification statements, but commit if 1=1 ?
>
> Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF)
> would be part of the syntax. Also, running a txn that doesn’t contain updates
> wouldn’t be a problem.
>
> Patrick, I think Benedict answered your questions? Glad you got the joke :)
>
> Alex,
>
> > 1. Dependant SELECTs
> > 2. Dependant UPDATEs
> > 3. UPDATE from secondary index (or SASI)
> > 5. UPDATE with predicate on non-primary key
>
> The full primary key must be defined as part of the statement, and you can’t
> use column references to define them, so you wouldn’t be able to run these.
>
> > MVs
>
> To prevent being spread too thin, both in syntax design and implementation
> work, I’d like to limit read and write operations in the initial
> implementation to vanilla selects, updates, inserts, and deletes. Once we
> have a solid implementation of multi-key/table transactions supporting
> foundational operations, we can start figuring out how the more advanced
> pieces can be best supported. Not a great answer to your question, but a
> related tangent I should have included in my initial email.
>
> > ... RETURNING ...
>
> I like the idea of the returning statement, but to echo what Benedict said, I
> think any scheme for specifying data to be returned should apply the same to
> select and update statements, since updates can have underlying reads that
> the user may be interested in.
> I’d mentioned having an optional RETURN
> statement in addition to automatically returning selects in my original email.
>
> > ... WITH ...
>
> I like the idea of defining statement names at the beginning of a statement,
> since I could imagine mapping names to selects might get difficult if there
> are a lot of columns in the select or update, but beginning each statement
> with `WITH <name>` reduces readability imo. Maybe putting the name after the
> first term of the statement (ie: `SELECT * AS <name> WHERE...`, `UPDATE table
> AS <name> SET ...`, `INSERT INTO table AS <name> (...) VALUES (...);`) would
> improve finding names without harming overall readability?
>
> Benedict,
>
> > I agree that SELECT statements should be required to go first.
>
> +1
>
> > There only remains the issue of conditions imposed upon
> > UPDATE/INSERT/DELETE statements when there are multiple statements that
> > affect the same primary key. I think we can (and should) simply reject such
> > queries for now, as it doesn’t make much sense to have multiple statements
> > for the same primary key in the same transaction.
>
> Unfortunately, I think there are use cases for both multiple selects and
> updates for the same primary key in a txn. Selects aren’t as problematic, but
> if multiple updates end up touching the same cell, I’d expect the last one to
> win. This would make dealing with range tombstones a little trickier, since
> the default behavior of alternating updates and range tombstones affecting
> the same cells is not intuitive, but I don’t think it would be too bad.
>
>
> Something that’s come up a few times, and that I’ve also been thinking about
> is whether to return the values that were originally read, or the values
> written with the update to the client, and there are use cases for both.
> I don’t remember who suggested it, but I think returning the original values
> from named select statements, and the post-update values from named update
> statements is a good way to handle both. Also, while returning the contents
> of the mutation would be the easiest, implementation wise, swapping cell
> values from the updates named read would be most useful, since a txn won’t
> always result in an update, in which case we’d just return the select.
>
> Thanks,
>
> Blake
>
>
> On Jun 6, 2022, at 9:41 AM, Henrik Ingo <henrik.i...@datastax.com> wrote:
>
> On Mon, Jun 6, 2022 at 5:28 PM bened...@apache.org <bened...@apache.org> wrote:
> > One way to make it obvious is to require the user to explicitly type the
> > SELECTs and then to require that all SELECTs appear before
> > UPDATE/INSERT/DELETE.
>
> Yes, I agree that SELECT statements should be required to go first.
>
> However, I think this is sufficient and we can retain the shorter format for
> RETURNING. There only remains the issue of conditions imposed upon
> UPDATE/INSERT/DELETE statements when there are multiple statements that
> affect the same primary key. I think we can (and should) simply reject such
> queries for now, as it doesn’t make much sense to have multiple statements
> for the same primary key in the same transaction.
>
> I guess I was thinking ahead to a future where an UPDATE write set may or
> may not intersect with a previous update due to allowing WHERE clause to use
> secondary keys, etc.
>
> That said, I'm not saying we SHOULD require explicit SELECT statements for
> every update. I'm sure that would be annoying more than useful. I was just
> following a train of thought.
>
>
> > Returning the "result" from an UPDATE presents the question should it be
> > the data at the start of the transaction or end state?
>
> I am inclined to only return the new values (as proposed by Alex) for the
> purpose of returning new auto-increment values etc. If you require the prior
> value, SELECT is available to express this.
>
> That's a great point!
>
>
> > I was thinking the following coordinator-side implementation would allow to
> > use also old drivers
>
> I am inclined to return just the first result set to old clients. I think
> it’s fine to require a client upgrade to get multiple result sets.
>
> Possibly. I just wanted to share an idea for consideration. IMO the temp
> table idea might not be too hard to implement*, but sure the syntax does feel
> a bit bolted on.
>
> *) I'm maybe the wrong person to judge that, of course :-)
>
> henrik
>
> --
> Henrik Ingo
> +358 40 569 7354
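
As a rough illustration of the return-value convention discussed in this thread (named SELECTs return the values originally read, named UPDATEs return the post-update values), here is a small Python sketch. The function `run_txn`, the tuple-based statement encoding and the dict-based table are assumptions made purely for the example; none of them reflect actual Cassandra or driver APIs.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical statement encoding:
#   ("select", name, key) or ("update", name, key, cells)
Statement = Tuple[Any, ...]


def run_txn(table: Dict[Any, Row], statements: List[Statement]) -> Dict[str, Row]:
    """Apply statements in order and collect one result row per named statement."""
    results: Dict[str, Row] = {}
    for stmt in statements:
        kind, name, key = stmt[0], stmt[1], stmt[2]
        if kind == "select":
            # Named selects report the row as it was read, before later updates.
            results[name] = dict(table.get(key, {}))
        elif kind == "update":
            row = table.setdefault(key, {})
            row.update(stmt[3])
            # Named updates report the row after the update was applied.
            results[name] = dict(row)
        else:
            raise ValueError(f"unsupported statement kind: {kind}")
    return results


table = {1: {"a": 0}}
results = run_txn(table, [
    ("select", "q0", 1),            # reads a = 0
    ("update", "q1", 1, {"a": 1}),  # writes a = 1
])
```

Here `results["q0"]` holds the prior value and `results["q1"]` the new one, so a client that wants the old value expresses that with a SELECT, as suggested above.
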