Re: CEP-15 multi key transaction syntax

Alex Miller Sat, 04 Jun 2022 19:39:05 -0700

All of my text below largely extends your question of syntax in a few
directions:
 - What is the user experience of trying to run different statements
with this syntax?
 - How do transactions interact with other Cassandra constructs?
 - What are the execution semantics of these statements?
which I do acknowledge is a moderate re-scoping of the question.


Also, please take my understanding of existing CQL and DDL constructs with
an impractically large grain of salt.


Undesireable Transactions
-------------------------

I tried to match CQL docs up against a number of ways of writing statements
which Accord wouldn't like, or users might not like the effect of running.
I'm assuming it'd be good to think through how one would express the error
message or guidance given to users?  Or at least just making sure I
understand correctly what is writable but not executable or desirable.

=== Likely Unexecutable

All the cases here are predicated on the lack of automatic reconnaissance
transaction support.

1. Dependant SELECTs

    CREATE TABLE users (name text primary key, home_state text);
    CREATE TABLE states (name text primary key, population int);

    BEGIN TRANSACTION;
      /*1*/ SELECT home_state FROM users WHERE name='blake' AS user;
      /*2*/ SELECT population FROM states WHERE name=user.home_state AS state;
    COMMIT TRANSACTION;

The primary key for SELECT (2) depends on a value produced by SELECT (1), which
results in not being able to produce the full read conflict set ahead of time
for Accord.

2. Dependant UPDATEs

    CREATE TABLE users (name text primary key, home_state text);
    CREATE TABLE states (name text primary key, population int);

    BEGIN TRANSACTION;
      SELECT home_state FROM users WHERE name='blake' AS user;
      UPDATE states SET population += 1 WHERE name=user.home_state AS state;
    COMMIT TRANSACTION;

The primary key for UPDATE depends on a value produced by SELECT, which
results in not being able to produce the full write conflict set ahead of time
for Accord.

3. UPDATE from secondary index (or SASI)

    CREATE TABLE users (id int primary key, name text, home_state text);
    CREATE INDEX users_by_name ON users (name);

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake';
    COMMIT TRANSACTION;

This is just a rephasing of (2), but hiding the SELECT behind an implict query
to the secondary index for the primary key.

(But an UPDATE from a covering index would be okay!)

4. The presence of a materialized view which can implicitly add
any/all of the above

    CREATE TABLE users (
      name text primary key,
      miles_driven int,
      state text,
    );
    CREATE MATERIALIZED VIEW users_by_home_state_by_miles AS
      SELECT * FROM users
      WHERE home_state IS NOT NULL
      PRIMARY KEY (home_state, miles_driven, name);

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake';
    COMMIT TRANSACTION;

This is a rephrasing of (2), but the UPDATE is to the materialized view, and
the SELECT is to get miles_driven out of the UPDATE on users.  Some
materialized views would be fine to transactionally update though.

=== Poor Performance

5. UPDATE with predicate on non-primary key

    CREATE TABLE users (
      id int primary key,
      name text,
      miles_driven int
    );

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake';
    COMMIT TRANSACTION;

As now any transaction which touches the 'blake' row in `users` is going to
have to wait behind a full table scan completing in Accord's transaction
executor. Then any transaction which conflicts with one of those would also
have to wait, and eventually snowball into a cascading stall in transaction
processing.

Support for these kinds of things could be useful to some users though?  It
might be wise to consider extending Accord with a table-level lock concept
to avoid having to maintain conflict information on every key in the table
individually.

6. Large SELECTs Are Actually Okay But Look Like They Shouldn't Be

    CREATE TABLE users (name text primary key, state text, miles_driven int);

    BEGIN TRANSACTION;
      SELECT sum(miles_driven) FROM users WHERE state='Ohio';
    COMMIT TRANSACTION;

As read-only transactions should be cheap, and I don't think the computation
would be much more notably expensive than the non-transactional version of
this.

However, what maybe feels similar to a user:

    BEGIN TRANSACTION;
      SELECT sum(miles_driven) FROM users WHERE state='Ohio';
      UPDATE miles_by_state SET total=sum(miles_driven) WHERE state='Ohio';
    COMMIT TRANSACTION;

does mean we're back in the bad idea category.

=== I Have Literally No Idea

7. Triggers

What are the transactional guarantees of triggers?  These are
implemented in Java, so that'd just be outright banned by Accord?  Unless
triggers can have an API to spit out extra conflict ranges for the partition
they live on?  Sounds like an Accord Transactions v2 problem. :p


Wide Rows and Projection
------------------------

Slightly reducing your example down, I'm curious if you see these two having
exactly the same execution:

    BEGIN TRANSACTION;
      SELECT * FROM users WHERE name='blake' AS user;
      UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
    COMMIT TRANSACTION;

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake';
    COMMIT TRANSACTION;

When we assume that the users table is defined as something like:

    CREATE TABLE users (
        name text PRIMARY KEY,
        account_active boolean,
        miles_driven int,
        last_update timestamp,
        -- [ many other columns ]
        1GB_of_personal_thoughts text,
    );

Which hopefully outlines the case where a naive implementation of
`SELECT * ...  AS user` would read >1GB per row, only to discard most of it
by the write phase.

Do you plan on analyzing the CQL transaction in full to calculate the minimal
set of columns needed from each table?  Or does the execution expected to
follow what is written?  This also maybe turns into a question of if someone
wishes to build a CQL validator, how much parsing and processing of CQL
statements should they also need to do?


Random Syntax Thoughts
----------------------

The consistent theme being "what if CQL read a little more SQL-y?"

=== RETURNING

`UPDATE ... RETURNING (columns)` is a non-standard but commonly implemented SQL
extension. Adding support to CQL for it feels like a nice way to remove the
need for analysis to compute which columns need to be read from updated tables.

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake' RETURNING
(account_active) AS users;
    COMMIT TRANSACTION IF users.account_active;

And for each DML type...
 - INSERT ... RETURNING returns inserted data (useful for defaulted or
autoincrement columns).
 - UPDATE ... RETURNING returns the modified data.
 - DELETE ... RETURNING returns the now-deleted data.

Instead of a RETURN statement, one could use a similar clause on the COMMIT:

    BEGIN TRANSACTION;
      UPDATE users SET miles_driven += 30 WHERE name='blake' RETURNING
(name, account_active) AS user;
    COMMIT TRANSACTION IF users.account_active RETURNING (user.name);

=== WITH

SQL's `WITH` construct almost lets one rewrite these statements in a
different form

    BEGIN TRANSACTION;
    WITH
      user AS (SELECT * FROM users WHERE name='blake'),
      car AS (SELECT * FROM cars WHERE model='pinto'),
      updated_users AS (UPDATE users SET miles_driven =
user.miles_driven + 30 WHERE name='blake'),
      updated_cars AS (UPDATE cars SET miles_driven = car.miles_driven
+ 30 WHERE model='pinto'),
    COMMIT TRANSACTION IF car.is_running;

Which reads oddly to me, partly because subqueries are only a SQL thing, but I
maybe do like the name being leading instead of trailing for ease of finding
the name?

    BEGIN TRANSACTION;
      WITH user AS SELECT * FROM users WHERE name='blake';
      WITH car AS SELECT * FROM cars WHERE model='pinto';
      UPDATE users SET miles_driven = user.miles_driven + 30 WHERE name='blake';
      UPDATE cars SET miles_driven = car.miles_driven + 30 WHERE model='pinto';
    COMMIT TRANSACTION IF car.is_running;

But I also see that WITH was already used in CQL to specify options, so that'd
maybe be more confusing in context.


Cheers,
Alex

Re: CEP-15 multi key transaction syntax

Reply via email to