Re: CEP-15 multi key transaction syntax

bened...@apache.org Sun, 05 Jun 2022 13:23:42 -0700

> 1. Dependant SELECTs
> 2. Dependant UPDATEs
> 3. UPDATE from secondary index (or SASI)
> 5. UPDATE with predicate on non-primary key


So, I think these are all likely to be rejected the same way they are today, as 
the individual statements would not parse [1,2] or be validated [3,5], as I’m 
fairly sure UPDATE and INSERT require a primary key to be specified and that 
only SELECT supports secondary indexes.

It could be nice to have dedicated messages explaining the limitation for 
[1,2], at least until the restriction is lifted.

> 4. The presence of a materialized view

This is a bit more complex. I think in principle MVs could function as they do 
today, i.e. with eventually consistent update. MVs remain experimental however, 
with known shortcomings, and I am not keen to validate them with Accord.

Since I think our plan is to opt tables into transactional behaviour (to 
minimise the potential for misusing them, unlike LWTs, which are easily used 
unsafely), I would prefer to ensure that MVs are mutually exclusive with 
transactions for now.

I anticipate follow up work will deliver global secondary indexes on top of 
Accord. I’ve no idea if that will replace or coexist with MVs as they exist 
today, perhaps it will be possible to create MVs and specify their consistency 
properties on creation once the existing MVs are reliable.

> 6. Large SELECTs Are Actually Okay But Look Like They Shouldn't Be

I’m not sure what our plans are around aggregations and transactions, perhaps 
Blake can speak more to his thoughts. Since aggregations are relatively new I 
am inclined to exclude them initially, at least for write transactions, since 
LWTs do not support them.

Otherwise we will need some deterministic measure for aborting transactions – 
even after we have agreed to execute them. E.g. a 5000 row limit on live rows 
read as input before a transaction is converted to a no-op. We will have to be 
especially careful here for unconditional transactions without any 
SELECT/RETURN, as these must still wait for the result of execution before 
notifying the user of the outcome, if it may be aborted.

Suggestions welcome here.

> 7. Triggers

Good question!

It looks like LWTs don’t integrate with triggers today, so I guess we can 
ignore them too. I don’t know how stable triggers are, or how widely they are 
used. I’m sure we have some use cases, but I’m not aware of any community 
members that use them so it is likely sparse.

In principle a trigger could modify the transaction submitted by a client to 
include additional updates, but this would likely require changes to the 
trigger API. I anticipate ignoring them until we have community demand.

> Random Syntax Thoughts

I like the RETURNING syntax, and consistency with SQL dialects is a plus. I’m 
concerned about consistency with SELECT statements, though – these already 
imply RETURNING, but we might use them to compute constraint clauses on tables 
we are not updating, and this would leave no consistent way of doing this 
without returning all of its fields to the user, at least not without multiple 
SELECT statements over the same data.

We could introduce a new keyword such as CONSTRAIN in this case, with syntax 
equivalent to UPDATE/DELETE but supporting RETURNING and by default not 
returning any fields?

The idea of a RETURNING syntax on the transaction itself was previously floated 
and is nice, but I worry about having multiple inconsistent ways of returning 
data that can be co-mingled. How would you envisage these keywords interacting?


From: Alex Miller <millerde...@gmail.com>
Date: Sunday, 5 June 2022 at 03:39
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: CEP-15 multi key transaction syntax
All of my text below largely extends your question of syntax in a few
directions:
 - What is the user experience of trying to run different statements
with this syntax?
 - How do transactions interact with other Cassandra constructs?
 - What are the execution semantics of these statements?
which I do acknowledge is a moderate re-scoping of the question.

Also, please take my understanding of existing CQL and DDL constructs with
an impractically large grain of salt.


Undesireable Transactions
-------------------------

I tried to match CQL docs up against a number of ways of writing statements
which Accord wouldn't like, or users might not like the effect of running.
I'm assuming it'd be good to think through how one would express the error
message or guidance given to users?  Or at least just making sure I
understand correctly what is writable but not executable or desirable.

=== Likely Unexecutable

All the cases here are predicated on the lack of automatic reconnaissance
transaction support.

1. Dependant SELECTs

    CREATE TABLE users (name text primary key, home_state text);
    CREATE TABLE states (name text primary key, population int);

    BEGIN TRANSACTION;
      /*1*/ SELECT home_state FROM users WHERE name='blake' AS user;
      /*2*/ SELECT population FROM states WHERE name=user.home_state AS state;
    COMMIT TRANSACTION;

The primary key for SELECT (2) depends on a value produced by SELECT (1), which
results in not being able to produce the full read conflict set ahead of time
for Accord.

2. Dependant UPDATEs

    CREATE TABLE users (name text primary key, home_state text);
    CREATE TABLE states (name text primary key, population int);

    BEGIN TRANSACTION;
      SELECT home_state FROM users WHERE name='blake' AS user;
      UPDATE states SET population += 1 WHERE name=user.home_state AS state;
    COMMIT TRANSACTION;

The primary key for UPDATE depends on a value produced by SELECT, which
results in not being able to produce the full write conflict set ahead of time
for Accord.

3. UPDATE from secondary index (or SASI)

    CREATE TABLE users (id int primary key, name text, home_state text);
    CREATE INDEX users_by_name ON users (name);

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake';
    COMMIT TRANSACTION;

This is just a rephasing of (2), but hiding the SELECT behind an implict query
to the secondary index for the primary key.

(But an UPDATE from a covering index would be okay!)

4. The presence of a materialized view which can implicitly add
any/all of the above

    CREATE TABLE users (
      name text primary key,
      miles_driven int,
      state text,
    );
    CREATE MATERIALIZED VIEW users_by_home_state_by_miles AS
      SELECT * FROM users
      WHERE home_state IS NOT NULL
      PRIMARY KEY (home_state, miles_driven, name);

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake';
    COMMIT TRANSACTION;

This is a rephrasing of (2), but the UPDATE is to the materialized view, and
the SELECT is to get miles_driven out of the UPDATE on users.  Some
materialized views would be fine to transactionally update though.

=== Poor Performance

5. UPDATE with predicate on non-primary key

    CREATE TABLE users (
      id int primary key,
      name text,
      miles_driven int
    );

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake';
    COMMIT TRANSACTION;

As now any transaction which touches the 'blake' row in `users` is going to
have to wait behind a full table scan completing in Accord's transaction
executor. Then any transaction which conflicts with one of those would also
have to wait, and eventually snowball into a cascading stall in transaction
processing.

Support for these kinds of things could be useful to some users though?  It
might be wise to consider extending Accord with a table-level lock concept
to avoid having to maintain conflict information on every key in the table
individually.

6. Large SELECTs Are Actually Okay But Look Like They Shouldn't Be

    CREATE TABLE users (name text primary key, state text, miles_driven int);

    BEGIN TRANSACTION;
      SELECT sum(miles_driven) FROM users WHERE state='Ohio';
    COMMIT TRANSACTION;

As read-only transactions should be cheap, and I don't think the computation
would be much more notably expensive than the non-transactional version of
this.

However, what maybe feels similar to a user:

    BEGIN TRANSACTION;
      SELECT sum(miles_driven) FROM users WHERE state='Ohio';
      UPDATE miles_by_state SET total=sum(miles_driven) WHERE state='Ohio';
    COMMIT TRANSACTION;

does mean we're back in the bad idea category.

=== I Have Literally No Idea

7. Triggers

What are the transactional guarantees of triggers?  These are
implemented in Java, so that'd just be outright banned by Accord?  Unless
triggers can have an API to spit out extra conflict ranges for the partition
they live on?  Sounds like an Accord Transactions v2 problem. :p


Wide Rows and Projection
------------------------

Slightly reducing your example down, I'm curious if you see these two having
exactly the same execution:

    BEGIN TRANSACTION;
      SELECT * FROM users WHERE name='blake' AS user;
      UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
    COMMIT TRANSACTION;

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake';
    COMMIT TRANSACTION;

When we assume that the users table is defined as something like:

    CREATE TABLE users (
        name text PRIMARY KEY,
        account_active boolean,
        miles_driven int,
        last_update timestamp,
        -- [ many other columns ]
        1GB_of_personal_thoughts text,
    );

Which hopefully outlines the case where a naive implementation of
`SELECT * ...  AS user` would read >1GB per row, only to discard most of it
by the write phase.

Do you plan on analyzing the CQL transaction in full to calculate the minimal
set of columns needed from each table?  Or does the execution expected to
follow what is written?  This also maybe turns into a question of if someone
wishes to build a CQL validator, how much parsing and processing of CQL
statements should they also need to do?


Random Syntax Thoughts
----------------------

The consistent theme being "what if CQL read a little more SQL-y?"

=== RETURNING

`UPDATE ... RETURNING (columns)` is a non-standard but commonly implemented SQL
extension. Adding support to CQL for it feels like a nice way to remove the
need for analysis to compute which columns need to be read from updated tables.

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake' RETURNING
(account_active) AS users;
    COMMIT TRANSACTION IF users.account_active;

And for each DML type...
 - INSERT ... RETURNING returns inserted data (useful for defaulted or
autoincrement columns).
 - UPDATE ... RETURNING returns the modified data.
 - DELETE ... RETURNING returns the now-deleted data.

Instead of a RETURN statement, one could use a similar clause on the COMMIT:

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake' RETURNING
(name, account_active) AS user;
    COMMIT TRANSACTION IF users.account_active RETURNING (user.name);

=== WITH

SQL's `WITH` construct almost lets one rewrite these statements in a
different form

    BEGIN TRANSACTION;
    WITH
      user AS (SELECT * FROM users WHERE name='blake'),
      car AS (SELECT * FROM cars WHERE model='pinto'),
      updated_users AS (UPDATE users SET miles_driven =
user.miles_driven + 30 WHERE name='blake'),
      updated_cars AS (UPDATE cars SET miles_driven = car.miles_driven
+ 30 WHERE model='pinto'),
    COMMIT TRANSACTION IF car.is_running;

Which reads oddly to me, partly because subqueries are only a SQL thing, but I
maybe do like the name being leading instead of trailing for ease of finding
the name?

    BEGIN TRANSACTION;
      WITH user AS SELECT * FROM users WHERE name='blake';
      WITH car AS SELECT * FROM cars WHERE model='pinto';
      UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
      UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
    COMMIT TRANSACTION IF car.is_running;

But I also see that WITH was already used in CQL to specify options, so that'd
maybe be more confusing in context.


Cheers,
Alex

Re: CEP-15 multi key transaction syntax

Reply via email to