Re: CEP-15 multi key transaction syntax

Derek Chen-Becker Tue, 14 Jun 2022 13:50:58 -0700

OK, that makes sense. One of the examples in an earlier email had duplicate
"LET miles =" so I was confused. I think failing in the face of ambiguous
identifiers is going to be more friendly by not requiring LET for every
field you might want to use, and we can provide a very clear error message
in that case.
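
To make the "fail on ambiguous identifiers" idea concrete, here is a minimal Python sketch of the check I have in mind. It is only an illustration under stated assumptions, not a proposed implementation: the names `resolve_identifier` and `AmbiguousIdentifierError` are hypothetical, and LET bindings and table columns are modelled as plain sets of names.

```python
class AmbiguousIdentifierError(Exception):
    """Raised when a name matches both a LET binding and a table column."""


def resolve_identifier(name, let_bindings, table_columns):
    """Resolve a name used on the right-hand side of an UPDATE assignment.

    let_bindings:  set of names declared by LET (hypothetical representation)
    table_columns: set of column names in the table being updated
    Returns ("let", name) or ("column", name).
    """
    in_let = name in let_bindings
    in_table = name in table_columns
    if in_let and in_table:
        # Reject rather than silently preferring one scope over the other.
        raise AmbiguousIdentifierError(
            f"'{name}' is both a LET binding and a column of the target "
            "table; rename the LET binding to disambiguate"
        )
    if in_let:
        return ("let", name)
    if in_table:
        return ("column", name)
    raise KeyError(f"unknown identifier '{name}'")


# The 'pinto' example: 'miles' is only a LET binding, so it resolves cleanly.
columns = {"model", "miles_driven", "is_running"}
bindings = {"miles", "running"}
print(resolve_identifier("miles", bindings, columns))
print(resolve_identifier("miles_driven", bindings, columns))
```

If `cars` later gained a column that was also called `miles`, the same call would raise the error instead of picking one meaning, which is the case where we would surface the clear error message.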


Cheers,

Derek

On Tue, Jun 14, 2022 at 2:40 PM [email protected] <[email protected]>
wrote:

> To be clear, the concerning situation is
>
>
>
> BEGIN TRANSACTION
>
>   LET miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
>
>   IF running THEN
>
>     UPDATE cars SET miles_driven = miles + 30 WHERE model='pinto';
>
>   END IF
>
> COMMIT TRANSACTION
>
>
>
> But where there’s some additional column also called *miles* in *cars*
>
>
>
>
>
> *From: *[email protected] <[email protected]>
> *Date: *Tuesday, 14 June 2022 at 21:37
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Duplicate declarations are usually rejected by languages, so I think
> that’s fine?
>
>
>
> Option 1 would involve something like
>
>
>
> BEGIN TRANSACTION
>
>   LET car_miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
>
>   LET user_miles = miles_driven FROM users WHERE name='blake'
>
>   SELECT running, car_miles, user_miles
>
>   IF running THEN
>
>     UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>     UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
>
>   END IF
>
> COMMIT TRANSACTION
>
>
>
>
>
>
>
> *From: *Derek Chen-Becker <[email protected]>
> *Date: *Tuesday, 14 June 2022 at 21:27
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Just to make sure I'm understanding correctly, I've been thinking of LET
> like a variable declaration and assignment, but is that the right mental
> model? For example, this is a valid statement:
>
>
>
> BEGIN TRANSACTION
>
>   LET miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
>
>   SELECT running, miles   # let the user know if the transaction takes any action
>
>   IF running THEN
>
>     UPDATE users SET miles_driven = miles_driven + 30 WHERE name='blake';
>     UPDATE cars SET miles_driven = miles_driven + 30 WHERE model='pinto';
>
>   END IF
>
> COMMIT TRANSACTION
>
>
>
> But this isn't, because we're trying to bind to "miles" twice
>
>
>
> BEGIN TRANSACTION
>
>   LET miles = miles_driven, running=is_running FROM cars WHERE model='pinto'
>
>   LET miles = miles_driven FROM users WHERE name='blake' # duplicate binding for "miles"
>
>   SELECT running, miles   # let the user know if the transaction takes any action
>
>   IF running THEN
>
>     UPDATE users SET miles_driven = miles_driven + 30 WHERE name='blake';
>     UPDATE cars SET miles_driven = miles_driven + 30 WHERE model='pinto';
>
>   END IF
>
> COMMIT TRANSACTION
>
>
>
> I think that's option #1, but I'm a little confused now that I'm looking
> at some of the examples.
>
>
>
> Cheers,
>
>
>
> Derek
>
>
>
> On Tue, Jun 14, 2022 at 1:58 PM [email protected] <[email protected]>
> wrote:
>
> It sounds like we’re zeroing in on a solution.
>
>
>
> To draw attention back to Jon’s email, I think the last open question at
> this point is the scope of identifiers declared by LET, and how we handle
> name clashes with table columns in an UPDATE.
>
>
>
> I think we have basically two options:
>
>
>
> 1. Require LET for all input parameters to an assignment in UPDATE
>
> 2. Add some additional syntax to local variables to identify them, e.g.
> <variable>
>
>
>
> Any other ideas?
>
>
>
>
>
>
>
> *From: *Derek Chen-Becker <[email protected]>
> *Date: *Tuesday, 14 June 2022 at 20:31
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Sorry, that was in reference to the "Would you require a LIMIT 1 clause if
> the key did not fully specify a row?" question, so I think we're in
> agreement here.
>
>
>
> Cheers,
>
>
>
> Derek
>
>
>
> On Tue, Jun 14, 2022 at 1:27 PM [email protected] <[email protected]>
> wrote:
>
> > It seems like we would want to start with restrictions on number of
> rows, uniqueness, homogeneity of results, etc
>
>
>
> I am not keen on any hard limit on the number of rows, I anticipate a
> configurable guardrail for rejecting queries that are too expensive. I
> think the normal CQL restrictions are likely to apply (must include
> partition key), plus (initially) no range scans, and the aforementioned
> restrictions on what order statements must occur in the transaction.
>
>
>
>
>
> *From: *Derek Chen-Becker <[email protected]>
> *Date: *Tuesday, 14 June 2022 at 18:42
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> "MIXED" means, "hey, this might not be my standard PGSQL transaction" :)
>
>
>
> I do think that surprise is a meaningful measure, from the perspective of
> an individual developer coming to Cassandra from any arbitrary RDBMS. My
> own experience is that a non-trivial number of developers are essentially
> blindly following guidance given to them by someone else when it comes to
> features like transactions, so making syntax that looks superficially
> similar to SQL transactions but acts subtly different (or uses slightly
> different syntax) is going to be surprising. I think we get diminishing
> marginal returns on "it looks just like SQL!" when we start to venture
> further into territory where even different RDBMSs disagree. I would rather
> use some syntax that is clearly Cassandra-specific, even if the structure
> would be similar to a SQL transaction, just to ensure that developers
> understand that it's different and actually look at the docs.
>
>
>
> I completely agree on focusing on clarity and consistency, and I think
> considering how we think it might evolve is good, but that can't be an
> open-ended exercise. My primary concern is how we can start getting
> incremental improvements into end users' hands more quickly, since the
> alternative right now is to basically roll your own, right?
>
>
>
> Cheers,
>
>
>
> Derek
>
>
>
> On Mon, Jun 13, 2022 at 4:16 PM [email protected] <[email protected]>
> wrote:
>
> What on earth does MIXED mean?
>
>
>
> I agree with the sentiment we should minimise surprise, but everyone is
> surprised differently so it becomes a sort of pointless rubric, with everyone
> claiming it supports their view. I think it is only useful in cases where
> there is clear agreement that something is surprising, but unhelpful when
> choosing between subtle variations on approach.
>
>
>
> The main goal IMO should be clarity and consistency, so that the user can
> reason about the constructs easily, and so we can evolve them.
>
>
>
> For instance, we should be sure to consider how the syntax will look if we
> **do** offer interactive transactions, or JOINs, or anything else we
> might add in future.
>
>
>
>
>
> *From: *Derek Chen-Becker <[email protected]>
> *Date: *Monday, 13 June 2022 at 23:09
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston <[email protected]>
> wrote:
>
> I prefer an approach that supports an accurate mental model of what’s
> happening behind the scenes. I think that should be a design priority for
> the syntax. We’ll be able to build things on top of accord, but the core
> multi-key cas operation isn’t going to change too much.
>
>
>
> +1, the principle of least surprise tells me that if this doesn't behave
> exactly like SQL transactions (for whatever SQL actually means), it could
> be more clear to not try and emulate it halfway
>
>
>
> BEGIN MIXED TRANSACTION?
>
>
>
> Derek
>
>
>
>
>
>
>
> On Jun 13, 2022, at 12:14 PM, Blake Eggleston <[email protected]>
> wrote:
>
>
>
> Does the IF <...> ABORT simplify reasoning though? If you restrict it to
> only dealing with the most recent row it would, but referencing the name
> implies you’d be able to include references from other operations, in which
> case you’d have the same problem.
>
> > return instead an exception if the transaction is aborted
>
> Since the txn is not actually interactive, I think it would be better to
> receive values instead of an exception, to understand why the operation was
> rolled back.
>
>
>
> On Jun 13, 2022, at 10:32 AM, Aaron Ploetz <[email protected]> wrote:
>
>
>
> Benedict,
>
>
>
> I'm really excited about this feature.  I've been observing this
> conversation for a while now, and I'm happy to add some thoughts.
>
>
>
> We must balance the fact we cannot afford to do everything (yet), against
> the need to make sure what we do is reasonably intuitive (to both CQL and
> SQL users) and consistent – including with whatever we do in future.
>
>
>
> I think taking small steps forward, to build a few complete features as
> close to SQL as possible is a good approach.
>
>
>
> question we are currently asking: do we want to have a more LWT-like
> approach... or do we want a more SQL-like approach
>
>
>
> For years now we've been fighting this notion that Cassandra is difficult
> to use.  Coming up with specialized syntax isn't going to bridge that
> divide.  From a (new?) user perspective, the best plan is to stay as
> consistent with SQL as possible.
>
>
>
> I believe that is a MySQL specific concept. This is one problem with
> mimicking SQL – it’s not one thing!
>
>
>
> Right?!?!  As if this needed to be more complex.
>
>
>
> I think we have evidence that it is fine to interpret NULL as “false” for
> the evaluation of IF conditions.
>
>
>
> Agree.  Null == false isn't too much of a leap.
>
>
>
> Thanks for taking up the charge on this one.  Glad to see it moving
> forward!
>
>
>
> Thanks,
>
>
>
> Aaron
>
>
>
>
>
>
>
> On Sun, Jun 12, 2022 at 10:33 AM [email protected] <[email protected]>
> wrote:
>
> Welcome Li, and thanks for your input
>
>
>
> > When I first saw the syntax, I took it for granted that the condition
> was evaluated against the state AFTER the updates
>
>
>
> Depending what you mean, I think this is one of the options being
> considered. At least, it seems this syntax is most likely to be evaluated
> against the values written by preceding statements in the batch, but not
> the statement itself (or later ones), as this could lead to nonsensical
> statements like
>
>
>
> BEGIN TRANSACTION
>
> UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
>
> COMMIT TRANSACTION IF tbl.v = 0
>
>
>
> Where v is never 0 afterwards, so this never succeeds. I take it in this
> simple case you would expect the condition to be evaluated against the
> state prior to the statement (i.e. the initial state)?
>
>
>
> But we have a blank slate, so every option is available to us! We just
> need to make sure it makes sense to the user, even in uncommon cases.
>
>
>
> > The IF (Boolean expr) ABORT TRANSACTION would suffer less because users
> may tend to put the condition closer to the related SELECT statement.
>
>
>
> This is probably not going to matter in practice. The SELECTs all happen
> upfront no matter what the CQL might look like, and the UPDATEs all happen
> only after the IF conditions are evaluated. This is all just a question of
> how the user expresses things.
>
>
>
> In future we may offer interactive transactions, or transactions that are
> multi-step, in which case this would be more relevant and could have an
> efficiency impact.
>
>
>
> > Would you consider allowing users to start a read-only transaction
> explicitly like BEGIN TRANSACTION READONLY?
>
>
>
> Good question. I would be OK with this, for sure, and will defer to the
> opinions of others here. There won’t be any optimisation impact, as we
> simply check if the transaction contains any updates, but some validation
> could be helpful for the user.
>
>
>
> > Finally, I wonder if the community would be interested in idempotency
> support.
>
>
>
> This is something that has been considered, and that Accord is able to
> support (in a couple of ways), but as an end-to-end feature this requires
> client support and other scaffolding that is not currently
> planned/scheduled. The simplest (least robust) approach is for the server
> to include the transaction’s identifier in its timeout, so that it can be
> queried by the client to establish if it has been made durable. This should
> be quite easy to deliver on the server-side, but would require some
> application or client integration, and is unreliable in the face of
> coordinator failure (so the transaction id is unknown to the client). The
> more complete approach is for the client to include an idempotency token in
> its submission to the server, and for C* to record this alongside the
> transaction id, and for some bounded time window to either reject
> re-submissions of this token or to evaluate it as a no-op. This requires
> much tighter integration from the clients, and more work server-side.
>
>
>
> Which is simply to say, this is on our radar but I can’t make promises
> about what form it will take, or when it will arrive, only that it has been
> planned for enough to ensure we can achieve it when resources permit.
>
>
>
> *From: *Li Boxuan <[email protected]>
> *Date: *Sunday, 12 June 2022 at 16:14
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Correcting my typo:
>
>
>
> >  I took it for granted that the condition was evaluated against the
> state before the updates
>
>
>
> I took it for granted that the condition was evaluated against the state
> AFTER the updates
>
>
>
> On Jun 12, 2022, at 11:07 AM, Li Boxuan <[email protected]> wrote:
>
>
>
> Thank you team for this exciting update! I just joined the dev mailing
> list to take part in this discussion. I am not a Cassandra developer and
> haven’t understood Accord myself yet, so my questions are more from a
> user’s standpoint.
>
>
>
> > The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we
> only get one chance to make this API right.
>
>
>
> I agree that COMMIT IF syntax is ambiguous. When I first saw the syntax, I
> took it for granted that the condition was evaluated against the state
> after the updates, but it turned out to be the opposite. Thus I prefer the
> IF (Boolean expr) ABORT TRANSACTION idea. In addition, when the transaction
> is large and there are many conditions, using the COMMIT IF syntax might
> make the CQL query uglier and developers’ life harder. Another very subtle
> point is if there are many conditions combined using AND clauses, wouldn't
> it make the execution slightly slower because, for each SELECT statement,
> you would have to check every condition? The IF (Boolean expr) ABORT
> TRANSACTION would suffer less because users may tend to put the condition
> closer to the related SELECT statement.
>
>
>
> > read-only transactions involving multiple tables will definitely be
> supported.
>
>
>
> Would you consider allowing users to start a read-only transaction
> explicitly like BEGIN TRANSACTION READONLY? This could help catch some
> developers’ bugs like unintentional updates. This might also give Cassandra
> a hint for optimization.
>
>
>
> Finally, I wonder if the community would be interested in idempotency
> support. DynamoDB has this interesting feature (
> https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html#transaction-apis-txwriteitems),
> which guards the situation where the same transaction is submitted multiple
> times due to a connection time-out or other connectivity issue. I have no
> idea how that is implemented under the hood and I don’t even know if this
> is technically possible with the Accord design, but I thought it would be
> interesting to think about.
>
>
>
> Best regards,
>
> Boxuan
>
>
>
>
>
> On Jun 12, 2022, at 7:31 AM, [email protected] wrote:
>
>
>
> > I would love hearing from people on what they think.
>
>
>
> ^^ It would be great to have more participants in this conversation
>
>
>
> > For context, my questions earlier were based on my 20+ years of using
> SQL transactions across different systems.
>
>
>
> We probably don’t come from a very different place. I spent too many years
> with T-SQL.
>
>
>
> > When you start a SQL transaction, you are creating a branch of your data
> that you can operate with until you reach your desired state and then merge
> it back with a commit.
>
>
>
> That’s the essential complexity we’re grappling with: how much do we
> permit your “branch” to do, how do we let you express it, and how do we let
> you express conditions?
>
>
>
> We must balance the fact we cannot afford to do everything (yet), against
> the need to make sure what we do is reasonably intuitive (to both CQL and
> SQL users) and consistent – including with whatever we do in future.
>
>
>
> Right now, we have the issue that read-your-writes introduces some
> complexity to the semantics, particularly around the conditions of
> execution.
>
>
>
> LWTs impose conditions on the state of all records prior to execution, but
> their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean
> expr) is most consistent with this approach. This can be confusing, though,
> if the condition is evaluated on a value that has been updated by a prior
> statement in the batch – what value does this global condition get
> evaluated against?*
>
>
>
> SQL has no such concept, but also SQL is designed to be interactive.
> Depending on the dialect there’s probably a lot of ways to do this
> non-interactively in SQL, but we probably cannot cheaply replicate the
> functionality exactly as we do not (yet) support interactive transactions
> that they were designed for. To submit a conditional non-interactive
> transaction in SQL, you would likely use
>
>
>
> IF (X) THEN
>
>     ROLLBACK
>
>     RETURN (ERRCODE)
>
> END IF
>
>
>
> or
>
>
>
> IF (X) THEN RAISERROR
>
>
>
> So, that is in essence the question we are currently asking: do we want to
> have a more LWT-like approach (and if so, how do we address this complexity
> for the user), or do we want a more SQL-like approach (and if so, how do we
> modify it to make non-interactive transactions convenient, and
> implementation tractable)
>
>
>
> * This is anyway a shortcoming of existing batches, I think? So it might
> be we can sweep it under the rug, but I think it will be more relevant here
> as people execute more complex transactions, and we should ideally have
> semantics that will work well into the future – including if we later
> introduce interactive transactions.
>
>
>
>
>
>
>
>
>
>
>
> *From: *Patrick McFadin <[email protected]>
> *Date: *Saturday, 11 June 2022 at 15:33
> *To: *dev <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> I think the syntax is evolving into something pretty complicated, which
> may be warranted but I wanted to take a step back and be a bit more
> reflective on what we are trying to accomplish.
>
>
>
> For context, my questions earlier were based on my 20+ years of using SQL
> transactions across different systems. That's my personal bias when I see
> the word "database transaction" in this case. When you start a SQL
> transaction, you are creating a branch of your data that you can operate
> with until you reach your desired state and then merge it back with a
> commit. Or if you don't like what you see, use a rollback and act like it
> never happened. That was the thinking when I asked about interactive
> sessions. If you are using a driver, that all happens in a batch. I realize
> that is out of scope here, but that's probably knowledge that is
> pre-installed in the majority of the user community.
>
>
>
> Getting to the point, which is developer experience. I'm seeing a
> philosophical fork in the road which hopefully will generate some comments
> in the larger user community.
>
>
>
> Path 1)
>
> Mimic what's already been available in the SQL community, using existing
> CQL syntax. (SQL Example using JDBC:
> https://www.baeldung.com/java-jdbc-auto-commit)
>
>
>
> Path 2)
>
> Chart a new direction with new syntax
>
>
>
> I genuinely don't have a clear answer, but I would love hearing from
> people on what they think.
>
>
>
> Patrick
>
>
>
> On Fri, Jun 10, 2022 at 12:07 PM [email protected] <[email protected]>
> wrote:
>
> This might also permit us to remove one result set (the success/failure
> one) and return instead an exception if the transaction is aborted. This is
> also more consistent with SQL, if memory serves. That might conflict with
> returning the other result sets in the event of abort (though that’s up to
> us ultimately), but it feels like a nicer API for the user – depending on
> how these exceptions are surfaced in client APIs.
>
>
>
> *From: *[email protected] <[email protected]>
> *Date: *Friday, 10 June 2022 at 19:59
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> So, thinking on it some more, I think an option that **doesn’t** require
> the user to reason about the point at which the read happens in order to
> understand how the condition is applied would probably be better.
>
>
>
> What do you think of the IF (Boolean expr) ABORT TRANSACTION idea?
>
>
>
> It’s compatible with more advanced IF functionality later, and probably
> not much trickier to implement?
>
>
>
> The COMMIT IF syntax is more succinct, but ambiguity isn’t ideal and we
> only get one chance to make this API right.
>
>
>
>
>
> *From: *Blake Eggleston <[email protected]>
> *Date: *Friday, 10 June 2022 at 18:56
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Yeah I think that’s intuitive enough. I had been thinking about multiple
> condition branches, but was thinking about something closer to
>
> IF select.column=5
>   UPDATE ... SET ... WHERE key=1;
> ELSE IF select.column=6
>   UPDATE ... SET ... WHERE key=2;
> ELSE
>   UPDATE ... SET ... WHERE key=3;
> ENDIF
> COMMIT TRANSACTION;
>
> Which would make the proposed COMMIT IF we're talking about now a
> shorthand. Of course this would be follow on work.
>
>
>
> On Jun 8, 2022, at 1:20 PM, [email protected] wrote:
>
>
>
> I imagine that conditions would be evaluated against the state prior to
> the execution of the statement against which it is being evaluated, but after
> the prior statements. I *think* that should be OK to reason about.
>
>
>
> i.e. we might have a contrived example like:
>
>
>
> BEGIN TRANSACTION
>
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
>
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
>
> COMMIT TRANSACTION IF q1.a = 0 AND q2.a = 1
>
>
>
> So q1 would read a = 0, but q2 would read a = 1 and set a = 2.
>
>
>
> I think this is probably adequately intuitive? It is a bit atypical to
> have conditions that wrap the whole transaction though.
>
>
>
> We have another option, of course, which is to offer IF x ROLLBACK
> TRANSACTION, which is closer to SQL, which would translate the above to:
>
>
>
> BEGIN TRANSACTION
>
> SELECT a FROM tbl WHERE k = 1 AS q0
>
> IF q0.a != 0 ROLLBACK TRANSACTION
>
> UPDATE tbl SET a = 1 WHERE k = 1 AS q1
>
> IF q1.a != 1 ROLLBACK TRANSACTION
>
> UPDATE tbl SET a = q1.a + 1 WHERE k = 1 AS q2
>
> COMMIT TRANSACTION
>
>
>
> This is less succinct, but might be more familiar to users. We could also
> eschew the ability to read from UPDATE statements entirely in this scheme,
> as this would then look very much like SQL.
>
>
>
>
>
> *From: *Blake Eggleston <[email protected]>
> *Date: *Wednesday, 8 June 2022 at 20:59
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> > It affects not just RETURNING but also conditions that are evaluated
> against the row, and if we in future permit using the values from one
> select in a function call / write to another table (which I imagine we
> will).
>
> I hadn’t thought about that... using intermediate or even post update
> values in condition evaluation or function calls seems like it would make
> it difficult to understand why a condition is or is not applying. On the
> other hand, it would be powerful, especially when using things like database
> generated values in queries (auto incrementing integer clustering keys or
> server generated timeuuids being examples that come to mind). Additionally,
> if we return these values, I guess that would solve the visibility issues
> I’m worried about.
>
> Agreed intermediate values would be straightforward to calculate though.
>
>
>
> On Jun 6, 2022, at 4:33 PM, [email protected] wrote:
>
>
>
> It affects not just RETURNING but also conditions that are evaluated
> against the row, and if we in future permit using the values from one
> select in a function call / write to another table (which I imagine we
> will).
>
>
>
> I think that for it to be intuitive we need it to make sense sequentially,
> which means either calculating it or restricting what can be stated (or
> abandoning the syntax).
>
>
>
> If we initially forbade multiple UPDATE/INSERT to the same key, but
> permitted overlapping DELETE (and as many SELECT as you like) that would
> perhaps make it simple enough? Require for now that SELECTS go first, then
> DELETE and then INSERT/UPDATE (or vice versa, depending what we want to
> make simple)?
>
>
>
> FWIW, I don’t think this is terribly onerous to calculate either, since
> it’s restricted to single rows we are updating, so we could simply maintain
> a collection of rows and upsert into them as we process the execution.
> Most transactions won’t need it, I suspect, so we don’t need to worry about
> perfect efficiency.
>
>
>
>
>
> *From: *Blake Eggleston <[email protected]>
> *Date: *Tuesday, 7 June 2022 at 00:21
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> That's a good question. I'd lean towards returning the final state of
> things, although I could understand expecting to see intermediate state.
> Regarding range tombstones, we could require them to precede any updates
> like selects, but there's still the question of how to handle multiple
> updates to the same cell when the user has requested we return the
> post-update state of the cell.
>
>
>
> On Jun 6, 2022, at 4:00 PM, [email protected] wrote:
>
>
>
> > if multiple updates end up touching the same cell, I’d expect the last
> one to win
>
>
>
> Hmm, yes I suppose range tombstones are a plausible and reasonable thing
> to mix with inserts over the same key range.
>
>
>
> What’s your present thinking about the idea of handling returning the
> values as of a given point in the sequential execution then?
>
>
>
> The succinct syntax is I think highly desirable for user experience, but
> this does complicate it a bit if we want to remain intuitive.
>
>
>
>
>
>
>
>
>
> *From: *Blake Eggleston <[email protected]>
> *Date: *Monday, 6 June 2022 at 23:17
> *To: *[email protected] <[email protected]>
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Hi all,
>
> Thanks for all the input and questions so far. Glad people are excited
> about this!
>
> I didn’t have any free time to respond this weekend, although it looks
> like Benedict has responded to most of the questions so far, so if I don’t
> respond to a question you asked here, you can interpret that as “what
> Benedict said” :).
>
>
> Jeff,
>
> > Is there a new keyword for “partition (not) exists” or is it inferred by
> the select?
>
> I'd intended this to be worked out from the select statement, ie: if the
> read/reference is null/empty, then it doesn't exist, whether you're
> interested in the partition, row, or cell. So I don't think we'd need an
> additional keyword there. I think that would address partition exists / not
> exists use cases?
>
> > And would you allow a transaction that had > 1 named select and no
> modification statements, but commit if 1=1 ?
>
> Yes, an unconditional commit (ie: just COMMIT TRANSACTION; without an IF)
> would be part of the syntax. Also, running a txn that doesn’t contain
> updates wouldn’t be a problem.
>
> Patrick, I think Benedict answered your questions? Glad you got the joke :)
>
> Alex,
>
> > 1. Dependant SELECTs
> > 2. Dependant UPDATEs
> > 3. UPDATE from secondary index (or SASI)
> > 5. UPDATE with predicate on non-primary key
>
> The full primary key must be defined as part of the statement, and you
> can’t use column references to define them, so you wouldn’t be able to run
> these.
>
> > MVs
>
> To prevent being spread too thin, both in syntax design and implementation
> work, I’d like to limit read and write operations in the initial
> implementation to vanilla selects, updates, inserts, and deletes. Once we
> have a solid implementation of multi-key/table transactions supporting
> foundational operations, we can start figuring out how the more advanced
> pieces can be best supported. Not a great answer to your question, but a
> related tangent I should have included in my initial email.
>
> > ... RETURNING ...
>
> I like the idea of the returning statement, but to echo what Benedict
> said, I think any scheme for specifying data to be returned should apply
> the same to select and update statements, since updates can have underlying
> reads that the user may be interested in. I’d mentioned having an optional
> RETURN statement in addition to automatically returning selects in my
> original email.
>
> > ... WITH ...
>
> I like the idea of defining statement names at the beginning of a
> statement, since I could imagine mapping names to selects might get
> difficult if there are a lot of columns in the select or update, but
> beginning each statement with `WITH <name>` reduces readability imo. Maybe
> putting the name after the first term of the statement (ie: `SELECT * AS
> <name> WHERE...`, `UPDATE table AS <name> SET ...`, `INSERT INTO table AS
> <name> (...) VALUES (...);`) would improve finding names without harming
> overall readability?
>
> Benedict,
>
> > I agree that SELECT statements should be required to go first.
>
> +1
>
> > There only remains the issue of conditions imposed upon
> UPDATE/INSERT/DELETE statements when there are multiple statements that
> affect the same primary key. I think we can (and should) simply reject such
> queries for now, as it doesn’t make much sense to have multiple statements
> for the same primary key in the same transaction.
>
> Unfortunately, I think there are use cases for both multiple selects and
> updates for the same primary key in a txn. Selects aren’t as problematic,
> but if multiple updates end up touching the same cell, I’d expect the last
> one to win. This would make dealing with range tombstones a little
> trickier, since the default behavior of alternating updates and range
> tombstones affecting the same cells is not intuitive, but I don’t think it
> would be too bad.
>
>
> Something that’s come up a few times, and that I’ve also been thinking
> about is whether to return the values that were originally read, or the
> values written with the update to the client, and there are use cases for
> both. I don’t remember who suggested it, but I think returning the original
> values from named select statements, and the post-update values from named
> update statements is a good way to handle both. Also, while returning the
> contents of the mutation would be the easiest, implementation wise,
> swapping cell values from the updates into the named read would be most
> useful, since a txn won’t always result in an update, in which case we’d
> just return the select.
>
> Thanks,
>
> Blake
>
>
>
>
>
>
>
>
> On Jun 6, 2022, at 9:41 AM, Henrik Ingo <[email protected]> wrote:
>
>
>
> On Mon, Jun 6, 2022 at 5:28 PM [email protected] <[email protected]>
> wrote:
>
> > One way to make it obvious is to require the user to explicitly type the
> SELECTs and then to require that all SELECTs appear before
> UPDATE/INSERT/DELETE.
>
>
>
> Yes, I agree that SELECT statements should be required to go first.
>
>
>
> However, I think this is sufficient and we can retain the shorter format
> for RETURNING. There only remains the issue of conditions imposed upon
> UPDATE/INSERT/DELETE statements when there are multiple statements that
> affect the same primary key. I think we can (and should) simply reject such
> queries for now, as it doesn’t make much sense to have multiple statements
> for the same primary key in the same transaction.
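
The two restrictions discussed here (SELECTs first, and no two writes to the same primary key) could be checked up front along these lines. This is only a Python sketch under assumed names: `Statement`, `WRITES` and `validate` are hypothetical, and a real check would operate on parsed CQL rather than on simple records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    kind: str          # "SELECT", "UPDATE", "INSERT" or "DELETE"
    table: str
    primary_key: str


WRITES = {"UPDATE", "INSERT", "DELETE"}


def validate(statements: list[Statement]) -> None:
    """Raise ValueError if the transaction breaks either restriction."""
    seen_write = False
    written: set[tuple[str, str]] = set()
    for stmt in statements:
        if stmt.kind == "SELECT":
            if seen_write:
                raise ValueError("SELECT must precede all UPDATE/INSERT/DELETE")
            continue
        if stmt.kind not in WRITES:
            raise ValueError(f"unsupported statement kind: {stmt.kind}")
        seen_write = True
        target = (stmt.table, stmt.primary_key)
        if target in written:
            raise ValueError(f"multiple writes to {target}")
        written.add(target)


def is_valid(statements: list[Statement]) -> bool:
    """Convenience wrapper that reports validity as a boolean."""
    try:
        validate(statements)
    except ValueError:
        return False
    return True
```

Relaxing the second restriction later would only require dropping the `written` check; the ordering rule is independent of it.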
>
>
>
> I guess I was thinking ahead to a future where an UPDATE write set may or
> may not intersect with a previous update due to allowing the WHERE clause
> to use secondary keys, etc.
>
>
>
> That said, I'm not saying we SHOULD require explicit SELECT statements for
> every update. I'm sure that would be more annoying than useful. I was just
> following a train of thought.
>
>
>
>
>
>
>
> > Returning the "result" from an UPDATE raises the question of whether it
> should be the data at the start of the transaction or the end state.
>
>
>
> I am inclined to only return the new values (as proposed by Alex) for the
> purpose of returning new auto-increment values etc. If you require the
> prior value, SELECT is available to express this.
>
>
>
> That's a great point!
>
>
>
>
>
> > I was thinking the following coordinator-side implementation would also
> allow old drivers to be used
>
>
>
> I am inclined to return just the first result set to old clients. I think
> it’s fine to require a client upgrade to get multiple result sets.
>
>
>
> Possibly. I just wanted to share an idea for consideration. IMO the temp
> table idea might not be too hard to implement*, but sure the syntax does
> feel a bit bolted on.
>
>
>
> *) I'm maybe the wrong person to judge that, of course :-)
>
>
>
> henrik
>
>
>
> --
>
> Henrik Ingo
>
> +358 40 569 7354
>
>
>
>
>
>
>
>
>
>
>
> --
>
> +---------------------------------------------------------------+
>
> | Derek Chen-Becker                                             |
>
> | GPG Key available at https://keybase.io/dchenbecker and       |
>
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>
> +---------------------------------------------------------------+
>


-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+
