Re: CEP-15 multi key transaction syntax

Blake Eggleston Mon, 06 Jun 2022 15:17:20 -0700

Hi all,

Thanks for all the input and questions so far. Glad people are excited about 
this!

I didn’t have any free time to respond this weekend, although it looks like 
Benedict has responded to most of the questions so far, so if I don’t respond 
to a question you asked here, you can interpret that as “what Benedict said” :).

Jeff, 

> Is there a new keyword for “partition (not) exists” or is it inferred by the 
> select?

I'd intended this to be worked out from the select statement, ie: if the 
read/reference is null/empty, then it doesn't exist, whether you're interested 
in the partition, row, or cell. So I don't think we'd need an additional 
keyword there. I think that would address partition exists / not exists use 
cases?
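
To make that concrete, here is a minimal sketch of the inference, not anything from the proposal itself. The function name `exists` and the list/dict model of a read result are hypothetical, chosen only for illustration:

```python
from typing import Any

def exists(result: Any) -> bool:
    """Treat a null or empty read as 'does not exist'.

    Hypothetical model: a partition read yields a list of rows, a row read
    yields a dict of column name -> value, and a cell read yields a value.
    """
    if result is None:
        return False
    if isinstance(result, (list, tuple, dict)):
        return len(result) > 0
    # Any non-null cell value counts as existing.
    return True
```

The same check applies at every level, which is why no separate keyword would be needed under this reading.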

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) would 
be part of the syntax. Also, running a txn that doesn’t contain updates 
wouldn’t be a problem.

Patrick, I think Benedict answered your questions? Glad you got the joke :)

Alex,

> 1. Dependant SELECTs
> 2. Dependant UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key

The full primary key must be defined as part of the statement, and you can’t 
use column references to define them, so you wouldn’t be able to run these.
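
As a rough sketch of that restriction (assuming a simplified model where a WHERE clause is a dict of column name to bound value; `Ref` and `has_full_literal_key` are hypothetical names, not part of the CEP), a statement would be accepted only if every primary key column is bound to a value that is not a reference:

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence

@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a reference to a column of a named select."""
    name: str

def has_full_literal_key(where: Dict[str, Any], primary_key: Sequence[str]) -> bool:
    """True only if every primary key column is bound to a non-reference value."""
    return all(
        col in where and not isinstance(where[col], Ref)
        for col in primary_key
    )
```

A predicate on a non-primary-key column, or a key taken from another statement's result, fails this check, which is why the listed cases could not be run.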

> MVs

To prevent being spread too thin, both in syntax design and implementation 
work, I’d like to limit read and write operations in the initial implementation 
to vanilla selects, updates, inserts, and deletes. Once we have a solid 
implementation of multi-key/table transactions supporting foundational 
operations, we can start figuring out how the more advanced pieces can be best 
supported. Not a great answer to your question, but a related tangent I should 
have included in my initial email.

> ... RETURNING ...

I like the idea of the returning statement, but to echo what Benedict said, I 
think any scheme for specifying data to be returned should apply the same to 
select and update statements, since updates can have underlying reads that the 
user may be interested in. I’d mentioned having an optional RETURN statement in 
addition to automatically returning selects in my original email.

> ... WITH ...

I like the idea of defining statement names at the beginning of a statement, 
since I could imagine mapping names to selects might get difficult if there are 
a lot of columns in the select or update, but beginning each statement with 
`WITH <name>` reduces readability imo. Maybe putting the name after the first 
term of the statement (ie: `SELECT * AS <name> WHERE...`, `UPDATE table AS 
<name> SET ...`, `INSERT INTO table AS <name> (...) VALUES (...);`) would 
improve finding names without harming overall readability?

Benedict,

> I agree that SELECT statements should be required to go first.

+1

> There only remains the issue of conditions imposed upon UPDATE/INSERT/DELETE 
> statements when there are multiple statements that affect the same primary 
> key. I think we can (and should) simply reject such queries for now, as it 
> doesn’t make much sense to have multiple statements for the same primary key 
> in the same transaction.

Unfortunately, I think there are use cases for both multiple selects and 
updates for the same primary key in a txn. Selects aren’t as problematic, but 
if multiple updates end up touching the same cell, I’d expect the last one to 
win. This would make dealing with range tombstones a little trickier, since the 
default behavior of alternating updates and range tombstones affecting the same 
cells is not intuitive, but I don’t think it would be too bad.
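
A minimal sketch of the "last one wins" expectation, assuming cells are addressed by a (primary key, column) pair and ignoring range tombstones entirely. The names `merge_updates` and `CellKey` are hypothetical:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical model: a cell is addressed by (primary_key, column).
CellKey = Tuple[str, str]
Update = Dict[CellKey, Any]

def merge_updates(updates: List[Update]) -> Dict[CellKey, Any]:
    """Fold updates in statement order so later statements overwrite earlier ones."""
    merged: Dict[CellKey, Any] = {}
    for update in updates:
        merged.update(update)
    return merged

# Two updates in one txn touching the same cell ("k1", "a").
first = {("k1", "a"): 1, ("k1", "b"): 2}
second = {("k1", "a"): 10}
result = merge_updates([first, second])
```

Here `result` keeps the second statement's value for `("k1", "a")` and the first statement's value for `("k1", "b")`.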

Something that’s come up a few times, and that I’ve also been thinking about is 
whether to return the values that were originally read, or the values written 
with the update to the client, and there are use cases for both. I don’t 
remember who suggested it, but I think returning the original values from named 
select statements, and the post-update values from named update statements is a 
good way to handle both. Also, while returning the contents of the mutation 
would be the easiest implementation-wise, swapping the updated cell values into 
the update’s named read would be most useful, since a txn won’t always result in 
an update, in which case we’d just return the select.
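
One way to picture the result of a named update, as a sketch under the assumption that a row is a dict of column name to value (`named_update_result` is a hypothetical helper, not a proposed API):

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]

def named_update_result(read: Row, written: Optional[Row]) -> Row:
    """Return the named read with any written cell values swapped in.

    `written` is None when the transaction applied no update, in which
    case a copy of the original read is returned unchanged.
    """
    if written is None:
        return dict(read)
    # Written cells replace the values that were read; other cells are kept.
    return {**read, **written}
```

A named select would simply return `read` as-is, so the client can get both the original and the post-update values by naming both statements.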

Thanks,

Blake

> On Jun 6, 2022, at 9:41 AM, Henrik Ingo <henrik.i...@datastax.com> wrote:
> 
> On Mon, Jun 6, 2022 at 5:28 PM bened...@apache.org <bened...@apache.org> wrote:
> > One way to make it obvious is to require the user to explicitly type the 
> > SELECTs and then to require that all SELECTs appear before 
> > UPDATE/INSERT/DELETE.
> 
>  
> 
> Yes, I agree that SELECT statements should be required to go first.
> 
>  
> 
> However, I think this is sufficient and we can retain the shorter format for 
> RETURNING. There only remains the issue of conditions imposed upon 
> UPDATE/INSERT/DELETE statements when there are multiple statements that 
> affect the same primary key. I think we can (and should) simply reject such 
> queries for now, as it doesn’t make much sense to have multiple statements 
> for the same primary key in the same transaction.
> 
> 
> I guess I was thinking ahead to a future where an UPDATE write set may or 
> may not intersect with a previous update due to allowing WHERE clause to use 
> secondary keys, etc.
> 
> That said, I'm not saying we SHOULD require explicit SELECT statements for 
> every update. I'm sure that would be annoying more than useful. I was just 
> following a train of thought.
> 
>  
>  
> 
> > Returning the "result" from an UPDATE presents the question should it be 
> > the data at the start of the transaction or end state?
> 
>  
> 
> I am inclined to only return the new values (as proposed by Alex) for the 
> purpose of returning new auto-increment values etc. If you require the prior 
> value, SELECT is available to express this.
> 
> 
> That's a great point!
>  
>  
> 
> > I was thinking the following coordinator-side implementation would allow to 
> > use also old drivers
> 
>  
> 
> I am inclined to return just the first result set to old clients. I think 
> it’s fine to require a client upgrade to get multiple result sets.
> 
> 
> Possibly. I just wanted to share an idea for consideration. IMO the temp 
> table idea might not be too hard to implement*, but sure the syntax does feel 
> a bit bolted on.
> 
> *) I'm maybe the wrong person to judge that, of course :-) 
> 
> henrik
> 
> -- 
> Henrik Ingo
