(or 3. Let schema updates break the statement – this might actually be 
preferable, so long as it fails fast rather than corrupting behaviour)

From: bened...@apache.org <bened...@apache.org>
Date: Tuesday, 14 June 2022 at 20:58
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
It sounds like we’re zeroing in on a solution.

To draw attention back to Jon’s email, I think the last open question at this 
point is the scope of identifiers declared by LET, and how we handle name 
clashes with table columns in an UPDATE.

I think we have basically two options:

1. Require LET for all input parameters to an assignment in UPDATE
2. Add some additional syntax to local variables to identify them, e.g. 
<variable>

Any other ideas?



From: Derek Chen-Becker <de...@chen-becker.org>
Date: Tuesday, 14 June 2022 at 20:31
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
Sorry, that was in reference to the "Would you require a LIMIT 1 clause if the 
key did not fully specify a row?" question, so I think we're in agreement here.

Cheers,

Derek

On Tue, Jun 14, 2022 at 1:27 PM bened...@apache.org <bened...@apache.org> wrote:
> It seems like we would want to start with restrictions on number of rows, 
> uniqueness, homogeneity of results, etc

I am not keen on any hard limit on the number of rows; I anticipate a 
configurable guardrail for rejecting queries that are too expensive. I think 
the normal CQL restrictions are likely to apply (must include partition key), 
plus (initially) no range scans, and the aforementioned restrictions on what 
order statements must occur in the transaction.


From: Derek Chen-Becker <de...@chen-becker.org>
Date: Tuesday, 14 June 2022 at 18:42
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
"MIXED" means, "hey, this might not be my standard PGSQL transaction" :)

I do think that surprise is a meaningful measure, from the perspective of an 
individual developer coming to Cassandra from any arbitrary RDBMS. My own 
experience is that a non-trivial number of developers are essentially blindly 
following guidance given to them by someone else when it comes to features like 
transactions, so making syntax that looks superficially similar to SQL 
transactions but acts subtly different (or uses slightly different syntax) is 
going to be surprising. I think we get diminishing marginal returns on "it 
looks just like SQL!" when we start to venture further into territory where 
even different RDBMSs disagree. I would rather use some syntax that is clearly 
Cassandra-specific, even if the structure would be similar to a SQL 
transaction, just to ensure that developers understand that it's different and 
actually look at the docs.

I completely agree on focusing on clarity and consistency, and I think 
considering how we think it might evolve is good, but that can't be an 
open-ended exercise. My primary concern is how we can start getting incremental 
improvements into end users' hands more quickly, since the alternative right 
now is to basically roll your own, right?

Cheers,

Derek

On Mon, Jun 13, 2022 at 4:16 PM bened...@apache.org <bened...@apache.org> wrote:
What on earth does MIXED mean?

I agree with the sentiment we should minimise surprise, but everyone is 
surprised differently, so it becomes a sort of pointless rubric, with everyone 
claiming it supports their view. I think it is only useful in cases where there 
is clear agreement that something is surprising, but unhelpful when choosing 
between subtle variations on approach.

The main goal IMO should be clarity and consistency, so that the user can 
reason about the constructs easily, and so we can evolve them.

For instance, we should be sure to consider how the syntax will look if we *do* 
offer interactive transactions, or JOINs, or anything else we might add in 
future.


From: Derek Chen-Becker <de...@chen-becker.org>
Date: Monday, 13 June 2022 at 23:09
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston <beggles...@apple.com> wrote:
> I prefer an approach that supports an accurate mental model of what’s happening 
> behind the scenes. I think that should be a design priority for the syntax. 
> We’ll be able to build things on top of accord, but the core multi-key cas 
> operation isn’t going to change too much.

+1, the principle of least surprise tells me that if this doesn't behave 
exactly like SQL transactions (for whatever SQL actually means), it could be 
more clear to not try and emulate it halfway

BEGIN MIXED TRANSACTION?

Derek



On Jun 13, 2022, at 12:14 PM, Blake Eggleston <beggles...@apple.com> wrote:

Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
dealing with the most recent row it would, but referencing the name implies 
you’d be able to include references from other operations, in which case you’d 
have the same problem.

> return instead an exception if the transaction is aborted

Since the txn is not actually interactive, I think it would be better to 
receive values instead of an exception, to understand why the operation was 
rolled back.

On Jun 13, 2022, at 10:32 AM, Aaron Ploetz <aaronplo...@gmail.com> wrote:

Benedict,

I'm really excited about this feature.  I've been observing this conversation 
for a while now, and I'm happy to add some thoughts.

> We must balance the fact we cannot afford to do everything (yet), against the 
> need to make sure what we do is reasonably intuitive (to both CQL and SQL 
> users) and consistent – including with whatever we do in future.

I think taking small steps forward, to build a few complete features as close 
to SQL as possible is a good approach.

> question we are currently asking: do we want to have a more LWT-like 
> approach... or do we want a more SQL-like approach

For years now we've been fighting this notion that Cassandra is difficult to 
use.  Coming up with specialized syntax isn't going to bridge that divide.  
From a (new?) user perspective, the best plan is to stay as consistent with SQL 
as possible.

> I believe that is a MySQL specific concept. This is one problem with mimicking 
> SQL – it’s not one thing!

Right?!?!  As if this needed to be more complex.

> I think we have evidence that it is fine to interpret NULL as “false” for the 
> evaluation of IF conditions.

Agree.  Null == false isn't too much of a leap.
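To make the "NULL is false" rule concrete, here is a minimal Python sketch, not Cassandra code. It assumes a row is a dict, or None when the row does not exist. It also assumes a comparison yields None ("unknown") when its operand is missing. The helpers `col_equals` and `evaluate_condition` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Optional

Row = Optional[dict]  # None models a row (or partition) that does not exist
Cond = Callable[[Row], Optional[bool]]


def col_equals(column: str, value: Any) -> Cond:
    """Build `column = value`, yielding None ("unknown") when the operand is NULL."""
    def cond(row: Row) -> Optional[bool]:
        if row is None or row.get(column) is None:
            return None
        return row[column] == value
    return cond


def evaluate_condition(cond: Cond, row: Row) -> bool:
    """Evaluate an IF condition, treating an unknown result exactly like False."""
    return cond(row) is True


print(evaluate_condition(col_equals("v", 0), {"v": 0}))     # True
print(evaluate_condition(col_equals("v", 0), {"v": None}))  # False
print(evaluate_condition(col_equals("v", 0), None))         # False
```

Collapsing "unknown" to False means a condition on a missing row simply fails. Existence checks therefore need an explicit test on the read, not a comparison.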

Thanks for taking up the charge on this one.  Glad to see it moving forward!

Thanks,

Aaron



On Sun, Jun 12, 2022 at 10:33 AM bened...@apache.org <bened...@apache.org> wrote:
Welcome Li, and thanks for your input

> When I first saw the syntax, I took it for granted that the condition was 
> evaluated against the state AFTER the updates

Depending what you mean, I think this is one of the options being considered. 
At least, it seems this syntax is most likely to be evaluated against the 
values written by preceding statements in the batch, but not the statement 
itself (or later ones), as this could lead to nonsensical statements like

BEGIN TRANSACTION
UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
COMMIT TRANSACTION IF tbl.v = 0

Here v is never 0 afterwards, so this never succeeds. I take it in this simple 
case you would expect the condition to be evaluated against the state prior to 
the statement (i.e. the initial state)?

But we have a blank slate, so every option is available to us! We just need to 
make sure it makes sense to the user, even in uncommon cases.

> The IF (Boolean expr) ABORT TRANSACTION would suffer less because users may 
> tend to put the condition closer to the related SELECT statement.

This is probably not going to matter in practice. The SELECTs all happen 
upfront no matter what the CQL might look like, and the UPDATEs all happen only 
after the IF conditions are evaluated. This is all just a question of how the 
user expresses things.

In future we may offer interactive transactions, or transactions that are 
multi-step, in which case this would be more relevant and could have an 
efficiency impact.

> Would you consider allowing users to start a read-only transaction explicitly 
> like BEGIN TRANSACTION READONLY?

Good question. I would be OK with this, for sure, and will defer to the 
opinions of others here. There won’t be any optimisation impact, as we simply 
check if the transaction contains any updates, but some validation could be 
helpful for the user.
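A minimal Python sketch of that check, assuming each statement is tagged with its kind. `Statement`, `is_read_only` and `validate` are hypothetical names, not part of any Cassandra API. The check is a simple scan for INSERT/UPDATE/DELETE, so an explicit READONLY would act as a user-facing guard rather than an optimisation hint.

```python
from dataclasses import dataclass
from typing import List

MODIFYING_KINDS = {"INSERT", "UPDATE", "DELETE"}


@dataclass
class Statement:
    kind: str  # "SELECT", "INSERT", "UPDATE" or "DELETE"
    text: str


def is_read_only(statements: List[Statement]) -> bool:
    """A transaction is read-only if none of its statements modify data."""
    return not any(s.kind in MODIFYING_KINDS for s in statements)


def validate(statements: List[Statement], declared_read_only: bool) -> None:
    """Reject a transaction declared READONLY that nonetheless contains a write."""
    if declared_read_only:
        for s in statements:
            if s.kind in MODIFYING_KINDS:
                raise ValueError(f"READONLY transaction contains a write: {s.text}")


txn = [
    Statement("SELECT", "SELECT v FROM tbl WHERE k = 1"),
    Statement("UPDATE", "UPDATE tbl SET v = 1 WHERE k = 1"),
]
validate(txn[:1], declared_read_only=True)  # accepted: only a SELECT
```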

> Finally, I wonder if the community would be interested in idempotency support.

This is something that has been considered, and that Accord is able to support 
(in a couple of ways), but as an end-to-end feature this requires client 
support and other scaffolding that is not currently planned/scheduled. The 
simplest (least robust) approach is for the server to include the transaction’s 
identifier in its timeout response, so that the client can query it to establish 
whether it has been made durable. This should be quite easy to deliver on the 
server side, but would require some application or client integration, and is 
unreliable in the face of coordinator failure (in which case the transaction id 
is unknown to the client). The more complete approach is for the client to include an 
idempotency token in its submission to the server, and for C* to record this 
alongside the transaction id, and for some bounded time window to either reject 
re-submissions of this token or to evaluate it as a no-op. This requires much 
tighter integration from the clients, and more work server-side.
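The more complete approach can be sketched in Python under loud assumptions:

- A single in-memory map stands in for whatever durable, replicated record C* would actually keep.
- The client supplies the token.
- A resubmission inside the window is treated as a no-op that reports the original outcome. Rejecting it would be the other option mentioned above.

`IdempotencyCache` and its methods are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class IdempotencyCache:
    """Remember token -> (time, txn id, result) for a bounded window."""

    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str, object]] = {}

    def _expire(self) -> None:
        """Forget tokens older than the window, so they may be reused."""
        now = self.clock()
        for token in [t for t, (ts, _, _) in self.entries.items() if now - ts > self.window]:
            del self.entries[token]

    def submit(self, token: str, txn_id: str,
               execute: Callable[[], object]) -> Tuple[str, object]:
        """Execute at most once per token within the window; repeats are no-ops."""
        self._expire()
        if token in self.entries:
            _, first_txn_id, result = self.entries[token]
            return first_txn_id, result  # report the original outcome, do not re-run
        result = execute()
        self.entries[token] = (self.clock(), txn_id, result)
        return txn_id, result
```

The bounded window is what keeps the record finite. After it expires, a retried token is indistinguishable from a new submission.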

Which is simply to say, this is on our radar but I can’t make promises about 
what form it will take, or when it will arrive, only that it has been planned 
for enough to ensure we can achieve it when resources permit.

From: Li Boxuan <libox...@connect.hku.hk>
Date: Sunday, 12 June 2022 at 16:14
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
Correcting my typo:

>  I took it for granted that the condition was evaluated against the state 
> before the updates

I took it for granted that the condition was evaluated against the state AFTER 
the updates

On Jun 12, 2022, at 11:07 AM, Li Boxuan <libox...@connect.hku.hk> wrote:

Thank you team for this exciting update! I just joined the dev mailing list to 
take part in this discussion. I am not a Cassandra developer and haven’t 
understood Accord myself yet, so my questions are more from a user’s standpoint.

> The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
> get one chance to make this API right.

I agree that COMMIT IF syntax is ambiguous. When I first saw the syntax, I took 
it for granted that the condition was evaluated against the state after the 
updates, but it turned out to be the opposite. Thus I prefer the IF (Boolean 
expr) ABORT TRANSACTION idea. In addition, when the transaction is large and 
there are many conditions, using the COMMIT IF syntax might make the CQL query 
uglier and developers’ life harder. Another very subtle point is if there are 
many conditions combined using AND clauses, wouldn't it make the execution 
slightly slower because, for each SELECT statement, you would have to check 
every condition? The IF (Boolean expr) ABORT TRANSACTION would suffer less 
because users may tend to put the condition closer to the related SELECT 
statement.

> read-only transactions involving multiple tables will definitely be supported.

Would you consider allowing users to start a read-only transaction explicitly 
like BEGIN TRANSACTION READONLY? This could help catch some developers’ bugs 
like unintentional updates. This might also give Cassandra a hint for 
optimization.

Finally, I wonder if the community would be interested in idempotency support. 
DynamoDB has this interesting feature 
(https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html#transaction-apis-txwriteitems),
which guards against the situation where the same transaction is submitted multiple 
times due to a connection time-out or other connectivity issue. I have no idea 
how that is implemented under the hood and I don’t even know if this is 
technically possible with the Accord design, but I thought it would be 
interesting to think about.

Best regards,
Boxuan


On Jun 12, 2022, at 7:31 AM, bened...@apache.org wrote:

> I would love hearing from people on what they think.

^^ It would be great to have more participants in this conversation

> For context, my questions earlier were based on my 20+ years of using SQL 
> transactions across different systems.

We probably don’t come from a very different place. I spent too many years with 
T-SQL.

> When you start a SQL transaction, you are creating a branch of your data that 
> you can operate with until you reach your desired state and then merge it 
> back with a commit.

That’s the essential complexity we’re grappling with: how much do we permit 
your “branch” to do, how do we let you express it, and how do we let you 
express conditions?

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

Right now, we have the issue that read-your-writes introduces some complexity 
to the semantics, particularly around the conditions of execution.

LWTs impose conditions on the state of all records prior to execution, but 
their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean expr) 
is most consistent with this approach. This can be confusing, though, if the 
condition is evaluated on a value that has been updated by a prior statement in 
the batch – what value does this global condition get evaluated against?*

SQL has no such concept, but also SQL is designed to be interactive. Depending 
on the dialect there’s probably a lot of ways to do this non-interactively in 
SQL, but we probably cannot cheaply replicate the functionality exactly as we 
do not (yet) support interactive transactions that they were designed for. To 
submit a conditional non-interactive transaction in SQL, you would likely use

IF (X) THEN
    ROLLBACK
    RETURN (ERRCODE)
END IF

or

IF (X) THEN RAISERROR

So, that is in essence the question we are currently asking: do we want to have 
a more LWT-like approach (and if so, how do we address this complexity for the 
user), or do we want a more SQL-like approach (and if so, how do we modify it 
to make non-interactive transactions convenient, and implementation tractable)?

* This is anyway a shortcoming of existing batches, I think? So it might be we 
can sweep it under the rug, but I think it will be more relevant here as people 
execute more complex transactions, and we should ideally have semantics that 
will work well into the future – including if we later introduce interactive 
transactions.





From: Patrick McFadin <pmcfa...@gmail.com>
Date: Saturday, 11 June 2022 at 15:33
To: dev <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
I think the syntax is evolving into something pretty complicated, which may be 
warranted but I wanted to take a step back and be a bit more reflective on what 
we are trying to accomplish.

For context, my questions earlier were based on my 20+ years of using SQL 
transactions across different systems. That's my personal bias when I see the 
word "database transaction" in this case. When you start a SQL transaction, you 
are creating a branch of your data that you can operate with until you reach 
your desired state and then merge it back with a commit. Or if you don't like 
what you see, use a rollback and act like it never happened. That was the 
thinking when I asked about interactive sessions. If you are using a driver, 
that all happens in a batch. I realize that is out of scope here, but that's 
probably knowledge that is pre-installed in the majority of the user community.

Getting to the point, which is developer experience. I'm seeing a philosophical 
fork in the road which hopefully will generate some comments in the larger user 
community.

Path 1)
Mimic what's already been available in the SQL community, using existing CQL 
syntax. (SQL Example using JDBC: https://www.baeldung.com/java-jdbc-auto-commit)

Path 2)
Chart a new direction with new syntax

I genuinely don't have a clear answer, but I would love hearing from people on 
what they think.

Patrick

On Fri, Jun 10, 2022 at 12:07 PM bened...@apache.org <bened...@apache.org> wrote:
This might also permit us to remove one result set (the success/failure one) 
and return instead an exception if the transaction is aborted. This is also 
more consistent with SQL, if memory serves. That might conflict with returning 
the other result sets in the event of abort (though that’s up to us 
ultimately), but it feels like a nicer API for the user – depending on how 
these exceptions are surfaced in client APIs.

From: bened...@apache.org <bened...@apache.org>
Date: Friday, 10 June 2022 at 19:59
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
So, thinking on it myself some more, I think an option that *doesn’t* require 
the user to reason about the point at which the read happens in order to 
understand how the condition is applied would probably be better.

What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?

It’s compatible with more advanced IF functionality later, and probably not 
much trickier to implement?

The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we only 
get one chance to make this API right.


From: Blake Eggleston <beggles...@apple.com>
Date: Friday, 10 June 2022 at 18:56
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
Yeah I think that’s intuitive enough. I had been thinking about multiple 
condition branches, but was thinking about something closer to

IF select.column=5
  UPDATE ... SET ... WHERE key=1;
ELSE IF select.column=6
  UPDATE ... SET ... WHERE key=2;
ELSE
  UPDATE ... SET ... WHERE key=3;
ENDIF
COMMIT TRANSACTION;

Which would make the proposed COMMIT IF we're talking about now a shorthand. Of 
course this would be follow-on work.
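As a hypothetical Python model (not a proposed implementation), the branch structure above picks the first branch whose condition holds and falls through to ELSE. `COMMIT TRANSACTION IF <cond>` then becomes the special case of a single branch with an empty ELSE. `choose_updates` and `commit_if` are illustrative names, and updates are plain strings standing in for statements.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Condition = Callable[[dict], bool]
Branch = Tuple[Condition, List[str]]  # (condition, updates); updates are placeholder strings


def choose_updates(reads: dict, branches: Sequence[Branch],
                   else_updates: Optional[List[str]] = None) -> List[str]:
    """IF / ELSE IF / ELSE: apply the first branch whose condition holds."""
    for condition, updates in branches:
        if condition(reads):
            return updates
    return else_updates if else_updates is not None else []


def commit_if(reads: dict, condition: Condition, updates: List[str]) -> List[str]:
    """COMMIT TRANSACTION IF <cond> as shorthand: one branch and an empty ELSE."""
    return choose_updates(reads, [(condition, updates)])


branches = [
    (lambda r: r["column"] == 5, ["UPDATE ... WHERE key=1"]),
    (lambda r: r["column"] == 6, ["UPDATE ... WHERE key=2"]),
]
else_updates = ["UPDATE ... WHERE key=3"]
```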

On Jun 8, 2022, at 1:20 PM, bened...@apache.org wrote:

I imagine that conditions would be evaluated against the state prior to the 
execution of the statement against which they are being evaluated, but after the 
prior statements. I think that should be OK to reason about.

i.e. we might have a contrived example like:

BEGIN TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1

So q1 would read a = 0, but q2 would read a = 1 and set a = 2.

I think this is probably adequately intuitive? It is a bit atypical to have 
conditions that wrap the whole transaction though.
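A small Python model of these semantics, assuming each named UPDATE binds to the row as it stood after the preceding statements but before its own write. Assignments are modelled as functions of the row the statement itself reads, so q2's increment is written against its own read (a = 1). This is an assumption that sidesteps which value `q1.a` denotes inside an assignment. `run_with_reads` is a hypothetical helper.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Assign = Callable[[Row], Row]  # new cell values, computed from the row this statement reads


def run_with_reads(initial: Dict[int, Row],
                   statements: List[Tuple[str, int, Assign]]
                   ) -> Tuple[Dict[int, Row], Dict[str, Row]]:
    """Run named UPDATEs in order; each name binds to the row it saw before writing."""
    state = {k: dict(v) for k, v in initial.items()}
    reads: Dict[str, Row] = {}
    for name, key, assign in statements:
        current = state.setdefault(key, {})
        reads[name] = dict(current)  # after prior statements, before this one
        current.update(assign(reads[name]))
    return state, reads


final, reads = run_with_reads(
    {1: {"a": 0}},
    [("q1", 1, lambda row: {"a": 1}),
     ("q2", 1, lambda row: {"a": row["a"] + 1})],
)
# COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1; if False, `final` is discarded
commit = reads["q1"]["a"] == 0 and reads["q2"]["a"] == 1
print(reads, final, commit)  # {'q1': {'a': 0}, 'q2': {'a': 1}} {1: {'a': 2}} True
```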

We have another option, of course, which is to offer IF x ROLLBACK TRANSACTION, 
which is closer to SQL, which would translate the above to:

BEGIN TRANSACTION
SELECT a FROM tbl WHERE k = 1 AS q0
IF q0.a != 0 ROLLBACK TRANSACTION
UPDATE tbl SET a = 1 WHERE k = 1 AS q1
IF q1.a != 1 ROLLBACK TRANSACTION
UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
COMMIT TRANSACTION

This is less succinct, but might be more familiar to users. We could also 
eschew the ability to read from UPDATE statements entirely in this scheme, as 
this would then look very much like SQL.
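The ROLLBACK form reads as a straight-line program. The sketch below interprets it in Python. It assumes an UPDATE's name binds to its post-update row, as the `IF q1.a != 1` check implies. The step tuples and the `Rollback` exception are hypothetical. The exception carries the values bound so far, so a client could see why the transaction was rolled back.

```python
from typing import Dict, Tuple

Row = Dict[str, int]
Bindings = Dict[str, Row]


class Rollback(Exception):
    """Raised by IF <cond> ROLLBACK TRANSACTION; carries the values bound so far."""

    def __init__(self, step: int, bindings: Bindings):
        super().__init__(f"rolled back at step {step}")
        self.step = step
        self.bindings = bindings


def run(initial: Dict[int, Row], steps: list) -> Tuple[Dict[int, Row], Bindings]:
    """Interpret SELECT / UPDATE / ROLLBACK-IF steps in order against a private copy."""
    state = {k: dict(v) for k, v in initial.items()}  # writes stay private until commit
    names: Bindings = {}
    for i, step in enumerate(steps):
        if step[0] == "select":
            _, name, key = step
            names[name] = dict(state.get(key, {}))
        elif step[0] == "update":
            _, name, key, assign = step
            row = state.setdefault(key, {})
            row.update(assign(names))
            names[name] = dict(row)  # UPDATE names bind to the post-update row
        elif step[0] == "rollback_if":
            _, predicate = step
            if predicate(names):
                raise Rollback(i, dict(names))
    return state, names


steps = [
    ("select", "q0", 1),
    ("rollback_if", lambda n: n["q0"]["a"] != 0),
    ("update", "q1", 1, lambda n: {"a": 1}),
    ("rollback_if", lambda n: n["q1"]["a"] != 1),
    ("update", "q2", 1, lambda n: {"a": n["q1"]["a"] + 1}),
]
final, names = run({1: {"a": 0}}, steps)  # commits with a = 2
```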


From: Blake Eggleston <beggles...@apple.com>
Date: Wednesday, 8 June 2022 at 20:59
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
> It affects not just RETURNING but also conditions that are evaluated against 
> the row, and if we in future permit using the values from one select in a 
> function call / write to another table (which I imagine we will).

I hadn’t thought about that... using intermediate or even post update values in 
condition evaluation or function calls seems like it would make it difficult to 
understand why a condition is or is not applying. On the other hand, it would be 
powerful, especially when using things like database generated values in 
queries (auto incrementing integer clustering keys or server generated 
timeuuids being examples that come to mind). Additionally, if we return these 
values, I guess that would solve the visibility issues I’m worried about.

Agreed, intermediate values would be straightforward to calculate, though.

On Jun 6, 2022, at 4:33 PM, bened...@apache.org wrote:

It affects not just RETURNING but also conditions that are evaluated against 
the row, and if we in future permit using the values from one select in a 
function call / write to another table (which I imagine we will).

I think that for it to be intuitive we need it to make sense sequentially, 
which means either calculating it or restricting what can be stated (or 
abandoning the syntax).

If we initially forbade multiple UPDATE/INSERT to the same key, but permitted 
overlapping DELETE (and as many SELECT as you like) that would perhaps make it 
simple enough? Require for now that SELECTS go first, then DELETE and then 
INSERT/UPDATE (or vice versa, depending what we want to make simple)?

FWIW, I don’t think this is terribly onerous to calculate either, since it’s 
restricted to single rows we are updating, so we could simply maintain a 
collections of rows and upsert into them as we process the execution. Most 
transactions won’t need it, I suspect, so we don’t need to worry about perfect 
efficiency.
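The proposed restrictions can be checked in a single pass. This Python sketch assumes the order SELECT, then DELETE, then INSERT/UPDATE; the text leaves the DELETE versus INSERT/UPDATE order open. Each statement is represented as a hypothetical `(kind, primary_key)` pair. Overlapping DELETEs are allowed, but repeated INSERT/UPDATE to one key are rejected.

```python
from collections.abc import Hashable
from typing import List, Tuple

# Assumed phases: all SELECTs, then DELETEs, then INSERT/UPDATEs.
PHASE = {"SELECT": 0, "DELETE": 1, "INSERT": 2, "UPDATE": 2}


def validate_order(statements: List[Tuple[str, Hashable]]) -> None:
    """Raise ValueError if statements break the ordering or single-write-per-key rules."""
    current_phase = 0
    written_keys = set()
    for kind, key in statements:
        phase = PHASE[kind]
        if phase < current_phase:
            raise ValueError(f"{kind} on key {key!r} appears after a later-phase statement")
        current_phase = phase
        if kind in ("INSERT", "UPDATE"):
            if key in written_keys:
                raise ValueError(f"multiple INSERT/UPDATE to key {key!r}")
            written_keys.add(key)


# Accepted: repeated SELECTs and a DELETE overlapping the UPDATE's key.
validate_order([("SELECT", 1), ("SELECT", 1), ("DELETE", 1), ("UPDATE", 1), ("INSERT", 2)])
```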


From: Blake Eggleston <beggles...@apple.com>
Date: Tuesday, 7 June 2022 at 00:21
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
That's a good question. I'd lean towards returning the final state of things, 
although I could understand expecting to see intermediate state. Regarding 
range tombstones, we could require them to precede any updates like selects, 
but there's still the question of how to handle multiple updates to the same 
cell when the user has requested we return the post-update state of the cell.



On Jun 6, 2022, at 4:00 PM, bened...@apache.org wrote:

> if multiple updates end up touching the same cell, I’d expect the last one to 
> win

Hmm, yes I suppose range tombstones are a plausible and reasonable thing to mix 
with inserts over the same key range.

What’s your present thinking about the idea of handling returning the values as 
of a given point in the sequential execution then?

The succinct syntax is I think highly desirable for user experience, but this 
does complicate it a bit if we want to remain intuitive.




From: Blake Eggleston <beggles...@apple.com>
Date: Monday, 6 June 2022 at 23:17
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
Hi all,

Thanks for all the input and questions so far. Glad people are excited about 
this!

I didn’t have any free time to respond this weekend, although it looks like 
Benedict has responded to most of the questions so far, so if I don’t respond 
to a question you asked here, you can interpret that as “what Benedict said” :).


Jeff,

> Is there a new keyword for “partition (not) exists” or is it inferred by the 
> select?

I'd intended this to be worked out from the select statement, ie: if the 
read/reference is null/empty, then it doesn't exist, whether you're interested 
in the partition, row, or cell. So I don't think we'd need an additional 
keyword there. I think that would address partition exists / not exists use 
cases?

> And would you allow a transaction that had > 1 named select and no 
> modification statements, but commit if 1=1 ?

Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF) would 
be part of the syntax. Also, running a txn that doesn’t contain updates 
wouldn’t be a problem.

Patrick, I think Benedict answered your questions? Glad you got the joke :)

Alex,

> 1. Dependant SELECTs
> 2. Dependant UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key

The full primary key must be defined as part of the statement, and you can’t 
use column references to define them, so you wouldn’t be able to run these.

> MVs

To prevent being spread too thin, both in syntax design and implementation 
work, I’d like to limit read and write operations in the initial implementation 
to vanilla selects, updates, inserts, and deletes. Once we have a solid 
implementation of multi-key/table transactions supporting foundational 
operations, we can start figuring out how the more advanced pieces can be best 
supported. Not a great answer to your question, but a related tangent I should 
have included in my initial email.

> ... RETURNING ...

I like the idea of the returning statement, but to echo what Benedict said, I 
think any scheme for specifying data to be returned should apply the same to 
select and update statements, since updates can have underlying reads that the 
user may be interested in. I’d mentioned having an optional RETURN statement in 
addition to automatically returning selects in my original email.

> ... WITH ...

I like the idea of defining statement names at the beginning of a statement, 
since I could imagine mapping names to selects might get difficult if there are 
a lot of columns in the select or update, but beginning each statement with 
`WITH <name>` reduces readability imo. Maybe putting the name after the first 
term of the statement (ie: `SELECT * AS <name> WHERE...`, `UPDATE table AS 
<name> SET ...`, `INSERT INTO table AS <name> (...) VALUES (...);`) would 
make names easier to find without harming overall readability?

Benedict,

> I agree that SELECT statements should be required to go first.

+1

> There only remains the issue of conditions imposed upon UPDATE/INSERT/DELETE 
> statements when there are multiple statements that affect the same primary 
> key. I think we can (and should) simply reject such queries for now, as it 
> doesn’t make much sense to have multiple statements for the same primary key 
> in the same transaction.

Unfortunately, I think there are use cases for both multiple selects and 
updates for the same primary key in a txn. Selects aren’t as problematic, but 
if multiple updates end up touching the same cell, I’d expect the last one to 
win. This would make dealing with range tombstones a little trickier, since the 
default behavior of alternating updates and range tombstones affecting the same 
cells is not intuitive, but I don’t think it would be too bad.


Something that’s come up a few times, and that I’ve also been thinking about is 
whether to return the values that were originally read, or the values written 
with the update to the client, and there are use cases for both. I don’t 
remember who suggested it, but I think returning the original values from named 
select statements, and the post-update values from named update statements is a 
good way to handle both. Also, while returning the contents of the mutation 
would be the easiest, implementation-wise, overlaying the updated cell values onto 
the update's named read would be most useful, since a txn won’t always result in 
an update, in which case we’d just return the select.
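The proposed return rule can be summarised in a few lines of Python. This is a sketch under assumptions:

- Rows are dicts.
- `read` is what the statement's named read saw.
- `written` is the set of cells its update wrote, or None if the update did not apply.

`returned_row` is a hypothetical name.

```python
from typing import Dict, Optional

Row = Dict[str, object]


def returned_row(kind: str, read: Optional[Row], written: Optional[Row]) -> Optional[Row]:
    """What a named statement returns to the client.

    SELECT: the values originally read.
    UPDATE: its named read with the written cells overlaid; if the update did not
    apply (written is None or empty), just the read, as for a SELECT.
    """
    if kind == "SELECT" or not written:
        return None if read is None else dict(read)
    merged = dict(read or {})
    merged.update(written)
    return merged


print(returned_row("UPDATE", {"v": 1, "w": 7}, {"v": 2}))  # {'v': 2, 'w': 7}
```

Overlaying onto the read, rather than returning only the mutation, means unwritten columns still come back. It also means an update whose condition failed degrades gracefully to its read.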

Thanks,

Blake






On Jun 6, 2022, at 9:41 AM, Henrik Ingo <henrik.i...@datastax.com> wrote:

On Mon, Jun 6, 2022 at 5:28 PM bened...@apache.org <bened...@apache.org> wrote:
> One way to make it obvious is to require the user to explicitly type the 
> SELECTs and then to require that all SELECTs appear before 
> UPDATE/INSERT/DELETE.

Yes, I agree that SELECT statements should be required to go first.

However, I think this is sufficient and we can retain the shorter format for 
RETURNING. There only remains the issue of conditions imposed upon 
UPDATE/INSERT/DELETE statements when there are multiple statements that affect 
the same primary key. I think we can (and should) simply reject such queries 
for now, as it doesn’t make much sense to have multiple statements for the same 
primary key in the same transaction.

I guess I was thinking ahead to a future where an UPDATE write set may or may 
not intersect with a previous update due to allowing WHERE clause to use 
secondary keys, etc.

That said, I'm not saying we SHOULD require explicit SELECT statements for 
every update. I'm sure that would be more annoying than useful. I was just 
following a train of thought.



> Returning the "result" from an UPDATE presents the question should it be the 
> data at the start of the transaction or end state?

I am inclined to only return the new values (as proposed by Alex) for the 
purpose of returning new auto-increment values etc. If you require the prior 
value, SELECT is available to express this.

That's a great point!


> I was thinking the following coordinator-side implementation would allow to 
> use also old drivers

I am inclined to return just the first result set to old clients. I think it’s 
fine to require a client upgrade to get multiple result sets.

Possibly. I just wanted to share an idea for consideration. IMO the temp table 
idea might not be too hard to implement*, but sure the syntax does feel a bit 
bolted on.

*) I'm maybe the wrong person to judge that, of course :-)

henrik

--
Henrik Ingo
+358 40 569 7354






--
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+


