Re: CEP-15 multi key transaction syntax

Blake Eggleston Fri, 10 Jun 2022 10:56:35 -0700

Yeah, I think that’s intuitive enough. I had been thinking about multiple
condition branches, but had in mind something closer to:


IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of
course, this would be follow-on work.
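
For illustration only, a minimal Python sketch of the first-match branch selection that the IF / ELSE IF / ELSE block above implies. The name `choose_branch`, the (predicate, key) tuple layout, and the use of `None` to mark the trailing ELSE are assumptions made for this sketch, not part of the proposed syntax.

```python
from typing import Callable, Optional, Sequence, Tuple

Row = dict
# Each branch pairs a predicate over the selected row with the key its UPDATE
# targets. A predicate of None stands in for the trailing ELSE (hypothetical layout).
Branch = Tuple[Optional[Callable[[Row], bool]], int]


def choose_branch(selected: Row, branches: Sequence[Branch]) -> Optional[int]:
    """Return the key of the first branch whose condition holds, or None."""
    for predicate, key in branches:
        if predicate is None or predicate(selected):
            return key
    return None


# Mirrors the three branches in the example: column=5, column=6, ELSE.
branches = [
    (lambda r: r["column"] == 5, 1),
    (lambda r: r["column"] == 6, 2),
    (None, 3),
]
```

With only the first branch and no ELSE, a non-matching row selects no update at all, which is roughly how a single-condition COMMIT IF would behave.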

> On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:
> 
> I imagine that conditions would be evaluated against the state prior to the
> execution of the statement against which they are being evaluated, but after
> the prior statements. I think that should be OK to reason about.
>  
> i.e. we might have a contrived example like:
>  
> BEGIN TRANSACTION
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
> COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
>  
> So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
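
A minimal Python sketch of that sequential reading rule, under stated assumptions: a single column `a` on a single key, each update modelled as a function of the value it reads (chosen so the outcome matches the one described above, where q2 reads 1 and sets 2), and a hypothetical `run_transaction` helper that is not part of any proposed API.

```python
from typing import Callable, Dict, List, Tuple


def run_transaction(
    initial_a: int,
    statements: List[Tuple[str, Callable[[int], int]]],
    commit_if: Callable[[Dict[str, int]], bool],
) -> Tuple[bool, int, Dict[str, int]]:
    """Apply named updates in order, recording what each one read."""
    a = initial_a
    reads: Dict[str, int] = {}
    for name, update in statements:
        # Each statement sees the state after all prior statements,
        # but before its own write.
        reads[name] = a
        a = update(a)
    committed = commit_if(reads)
    # If the condition fails, nothing is applied and the original value stays.
    return committed, (a if committed else initial_a), reads


statements = [("q1", lambda cur: 1), ("q2", lambda cur: cur + 1)]
condition = lambda r: r["q1"] == 0 and r["q2"] == 1
```

Starting from a = 0, q1 reads 0 and q2 reads 1, so the condition holds and the final value is 2. Starting from any other value, q1's read fails the condition and the original value is kept.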
>  
> I think this is probably adequately intuitive? It is a bit atypical to have 
> conditions that wrap the whole transaction though.
>  
> We have another option, of course, which is to offer IF x ROLLBACK 
> TRANSACTION, which is closer to SQL, which would translate the above to:
>  
> BEGIN TRANSACTION
> SELECT a FROM tbl WHERE k = 1 AS q0
> IF q0.a != 0 ROLLBACK TRANSACTION
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
> IF q1.a != 1 ROLLBACK TRANSACTION
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
> COMMIT TRANSACTION
>  
> This is less succinct, but might be more familiar to users. We could also 
> eschew the ability to read from UPDATE statements entirely in this scheme, as 
> this would then look very much like SQL.
>  
>  
> From: Blake Eggleston <beggles...@apple.com>
> Date: Wednesday, 8 June 2022 at 20:59
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: CEP-15 multi key transaction syntax
> 
> > It affects not just RETURNING but also conditions that are evaluated 
> > against the row, and if we in future permit using the values from one 
> > select in a function call / write to another table (which I imagine we 
> > will).
> 
> I hadn’t thought about that... using intermediate or even post update values 
> in condition evaluation or function calls seems like it would make it 
> difficult to understand why a condition is or is not applying. On the other
> hand, it would be powerful, especially when using things like
> database-generated values in queries (auto-incrementing integer clustering
> keys or server-generated timeuuids being examples that come to mind).
> Additionally, if we
> return these values, I guess that would solve the visibility issues I’m 
> worried about. 
> 
> Agreed intermediate values would be straightforward to calculate though.
> 
> 
> On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:
>  
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).
>  
> I think that for it to be intuitive we need it to make sense sequentially, 
> which means either calculating it or restricting what can be stated (or 
> abandoning the syntax).
>  
> If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
> overlapping DELETE (and as many SELECT as you like) that would perhaps make 
> it simple enough? Require for now that SELECTs go first, then DELETE and then
> INSERT/UPDATE (or vice versa, depending what we want to make simple)?
>  
> FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
> restricted to single rows we are updating, so we could simply maintain a 
> collection of rows and upsert into them as we process the execution. Most
> transactions won’t need it, I suspect, so we don’t need to worry about 
> perfect efficiency.
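
A minimal Python sketch of the "collection of rows we upsert into" idea, assuming rows are plain dicts keyed by primary key. The `RowOverlay` class and its method names are hypothetical and only illustrate the bookkeeping, not how it would actually be implemented.

```python
from typing import Dict, Hashable, Optional

Row = Dict[str, object]


class RowOverlay:
    """Tracks in-transaction row state on top of the rows read at the start."""

    def __init__(self, base: Dict[Hashable, Row]):
        self._base = base
        # key -> pending row state; None records a delete within the transaction
        self._pending: Dict[Hashable, Optional[Row]] = {}

    def read(self, key: Hashable) -> Optional[Row]:
        """Return the row as of this point in the sequential execution."""
        if key in self._pending:
            row = self._pending[key]
        else:
            row = self._base.get(key)
        return dict(row) if row is not None else None

    def upsert(self, key: Hashable, columns: Row) -> None:
        """Merge new column values into the current state of the row."""
        row = self.read(key) or {}
        row.update(columns)
        self._pending[key] = row

    def delete(self, key: Hashable) -> None:
        self._pending[key] = None
```

Later statements call `read` to see intermediate values, while the base rows stay untouched until the transaction decides to apply.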
>  
>  
> From: Blake Eggleston <beggles...@apple.com>
> Date: Tuesday, 7 June 2022 at 00:21
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: CEP-15 multi key transaction syntax
> 
> That's a good question. I'd lean towards returning the final state of things, 
> although I could understand expecting to see intermediate state. Regarding 
> range tombstones, we could require them to precede any updates like selects, 
> but there's still the question of how to handle multiple updates to the same 
> cell when the user has requested we return the post-update state of the cell.
> 
> 
> 
> On Jun 6, 2022, at 4:00 PM, bened...@apache.org wrote:
>  
> > if multiple updates end up touching the same cell, I’d expect the last one 
> > to win
>  
> Hmm, yes I suppose range tombstones are a plausible and reasonable thing to 
> mix with inserts over the same key range.
>  
> What’s your present thinking about the idea of handling returning the values 
> as of a given point in the sequential execution then?
>  
> The succinct syntax is I think highly desirable for user experience, but this 
> does complicate it a bit if we want to remain intuitive.
>  
>  
>  
>  
> From: Blake Eggleston <beggles...@apple.com>
> Date: Monday, 6 June 2022 at 23:17
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: CEP-15 multi key transaction syntax
> 
> Hi all,
> 
> Thanks for all the input and questions so far. Glad people are excited about 
> this!
> 
> I didn’t have any free time to respond this weekend, although it looks like 
> Benedict has responded to most of the questions so far, so if I don’t respond 
> to a question you asked here, you can interpret that as “what Benedict said” 
> :).
> 
> 
> Jeff, 
> 
> > Is there a new keyword for “partition (not) exists” or is it inferred by 
> > the select?
> 
> I'd intended this to be worked out from the select statement, ie: if the 
> read/reference is null/empty, then it doesn't exist, whether you're 
> interested in the partition, row, or cell. So I don't think we'd need an 
> additional keyword there. I think that would address partition exists / not 
> exists use cases?
> 
> > And would you allow a transaction that had > 1 named select and no 
> > modification statements, but commit if 1=1 ?
> 
> Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) 
> would be part of the syntax. Also, running a txn that doesn’t contain updates 
> wouldn’t be a problem.
> 
> Patrick, I think Benedict answered your questions? Glad you got the joke :)
> 
> Alex,
> 
> > 1. Dependant SELECTs
> > 2. Dependant UPDATEs
> > 3. UPDATE from secondary index (or SASI)
> > 5. UPDATE with predicate on non-primary key
> 
> The full primary key must be defined as part of the statement, and you can’t 
> use column references to define them, so you wouldn’t be able to run these.
> 
> > MVs
> 
> To prevent being spread too thin, both in syntax design and implementation 
> work, I’d like to limit read and write operations in the initial 
> implementation to vanilla selects, updates, inserts, and deletes. Once we 
> have a solid implementation of multi-key/table transactions supporting 
> foundational operations, we can start figuring out how the more advanced 
> pieces can be best supported. Not a great answer to your question, but a 
> related tangent I should have included in my initial email.
> 
> > ... RETURNING ...
> 
> I like the idea of the returning statement, but to echo what Benedict said, I 
> think any scheme for specifying data to be returned should apply the same to 
> select and update statements, since updates can have underlying reads that 
> the user may be interested in. I’d mentioned having an optional RETURN 
> statement in addition to automatically returning selects in my original email.
> 
> > ... WITH ...
> 
> I like the idea of defining statement names at the beginning of a statement, 
> since I could imagine mapping names to selects might get difficult if there 
> are a lot of columns in the select or update, but beginning each statement 
> with `WITH <name>` reduces readability imo. Maybe putting the name after the 
> first term of the statement (ie: `SELECT * AS <name> WHERE...`, `UPDATE table 
> AS <name> SET ...`, `INSERT INTO table AS <name> (...) VALUES (...);`) would 
> make names easier to find without harming overall readability?
> 
> Benedict,
> 
> > I agree that SELECT statements should be required to go first.
> 
> +1
> 
> > There only remains the issue of conditions imposed upon 
> > UPDATE/INSERT/DELETE statements when there are multiple statements that 
> > affect the same primary key. I think we can (and should) simply reject such 
> > queries for now, as it doesn’t make much sense to have multiple statements 
> > for the same primary key in the same transaction.
> 
> Unfortunately, I think there are use cases for both multiple selects and 
> updates for the same primary key in a txn. Selects aren’t as problematic, but 
> if multiple updates end up touching the same cell, I’d expect the last one to 
> win. This would make dealing with range tombstones a little trickier, since 
> the default behavior of alternating updates and range tombstones affecting 
> the same cells is not intuitive, but I don’t think it would be too bad.
> 
> 
> Something that’s come up a few times, and that I’ve also been thinking about 
> is whether to return the values that were originally read, or the values 
> written with the update to the client, and there are use cases for both. I 
> don’t remember who suggested it, but I think returning the original values 
> from named select statements, and the post-update values from named update 
> statements is a good way to handle both. Also, while returning the contents 
> of the mutation would be the easiest, implementation-wise, swapping cell
> values from the update into the named read would be most useful, since a txn won’t
> always result in an update, in which case we’d just return the select.
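
One way to read that suggestion, sketched in Python under assumptions: rows are dicts of cell values, and the hypothetical `returned_row` helper overlays the update's cells on the named read when the update applied, and otherwise returns the read unchanged.

```python
from typing import Dict

Row = Dict[str, object]


def returned_row(read_row: Row, update_cells: Row, applied: bool) -> Row:
    """Row to hand back to the client for a named update statement."""
    if not applied:
        # The txn did not result in an update, so return what was read.
        return dict(read_row)
    merged = dict(read_row)
    # Swap in the cell values written by the update; untouched cells keep
    # the values that were read.
    merged.update(update_cells)
    return merged
```

Named selects would simply return their original read, so only named updates need this overlay.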
> 
> Thanks,
> 
> Blake
>  
>  
> 
> 
> 
> 
> On Jun 6, 2022, at 9:41 AM, Henrik Ingo <henrik.i...@datastax.com> wrote:
>  
> On Mon, Jun 6, 2022 at 5:28 PM bened...@apache.org <bened...@apache.org> wrote:
> > One way to make it obvious is to require the user to explicitly type the 
> > SELECTs and then to require that all SELECTs appear before 
> > UPDATE/INSERT/DELETE.
>  
> Yes, I agree that SELECT statements should be required to go first.
>  
> However, I think this is sufficient and we can retain the shorter format for 
> RETURNING. There only remains the issue of conditions imposed upon 
> UPDATE/INSERT/DELETE statements when there are multiple statements that 
> affect the same primary key. I think we can (and should) simply reject such 
> queries for now, as it doesn’t make much sense to have multiple statements 
> for the same primary key in the same transaction.
>  
> I guess I was thinking ahead to a future where an UPDATE's write set may or
> may not intersect with a previous update due to allowing WHERE clause to use 
> secondary keys, etc.
>  
> That said, I'm not saying we SHOULD require explicit SELECT statements for 
> every update. I'm sure that would be more annoying than useful. I was just
> following a train of thought.
>  
>  
>  
> > Returning the "result" from an UPDATE presents the question should it be 
> > the data at the start of the transaction or end state?
>  
> I am inclined to only return the new values (as proposed by Alex) for the 
> purpose of returning new auto-increment values etc. If you require the prior 
> value, SELECT is available to express this.
>  
> That's a great point!
>  
>  
> > I was thinking the following coordinator-side implementation would also
> > allow the use of old drivers
>  
> I am inclined to return just the first result set to old clients. I think 
> it’s fine to require a client upgrade to get multiple result sets.
>  
> Possibly. I just wanted to share an idea for consideration. IMO the temp 
> table idea might not be too hard to implement*, but sure the syntax does feel 
> a bit bolted on.
>  
> *) I'm maybe the wrong person to judge that, of course :-) 
>  
> henrik
>  
> -- 
> Henrik Ingo
> +358 40 569 7354
