[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119823#comment-16119823
 ] 

Paulo Motta commented on CASSANDRA-11500:
-----------------------------------------

Talking offline with Zhao, it seems like there is still an outstanding case 
derived from CASSANDRA-13547 not addressed by the strict liveness suggestion:

{code:none}
// liveness or deletion using max-timestamp of view-primary-key column in base
base:  (k), a, b, c
view:  (k, a), b, c=1

q1: insert (1,1,1,1) with timestamp 0

    base: liveness=ts@0,  k=1, a=1@0, b=1@0, c=1@0
    view: liveness=ts@0,  (k=1, a=1), b=1@0, c=1@0

q2: update c=1 with timestamp 10 where k = 1  

    base: liveness=ts@0,  k=1, a=1@0, b=1@0, c=1@10
    view: liveness=ts@0,  (k=1, a=1), b=1@0, c=1@10

q3: update c=2 with timestamp 11 where k = 1  

    base: liveness=ts@0,  k=1, a=1@0, b=1@0, c=2@11
    view:
          liveness=ts@0,  (k=1, a=1), b=1@0, c=1@10
          tombstone=ts@0,  (k=1, a=1)

          with strict-liveness flag, view row is dead

q4: update c=1 with timestamp 12 where k = 1  

    base: liveness=ts@0,  k=1, a=1@0, b=1@0, c=1@12
    view:
          liveness=ts@0,  (k=1, a=1), b=1@0, c=1@10
          tombstone=ts@0,  (k=1, a=1)
          liveness=ts@0,  (k=1, a=1), b=1@0, c=1@12
         
          view row should be live..but it's dead
{code}
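The q3/q4 failure can be sketched outside Cassandra with a toy model (an illustration only, not Cassandra code): under strict liveness the view row is live only if its liveness timestamp strictly beats the shadowable tombstone (deletions winning ties, as in normal reconciliation), and in this scenario both stay pinned at the base row's ts 0.

```python
# Toy strict-liveness check (illustration only, not Cassandra code):
# the view row is live iff its liveness timestamp strictly beats the
# shadowable tombstone (deletions win ties).

def row_is_live(liveness_ts, tombstone_ts):
    if tombstone_ts is None:
        return True
    return liveness_ts > tombstone_ts

# q3 generates a tombstone at the base liveness timestamp (ts 0), so:
assert not row_is_live(0, 0)   # after q3 the row is dead, as intended
# q4 re-inserts a liveness info, but it is again at ts 0, so:
assert not row_is_live(0, 0)   # after q4 the row stays dead, incorrectly
```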

It seems like this scenario, where the row liveness depends on a non-primary-key 
column, was overlooked by CASSANDRA-10368, and it seems analogous to the problem 
Tyler discovered on CASSANDRA-10226 (but arising from filter conditions rather 
than from non-base-PK view primary key columns):

bq. It seems like when we include multiple non-PK columns in the view PK, we 
fundamentally have to accept that the view row's existence depends on multiple 
timestamps. I propose that we solve this by using a set of timestamps for the 
row's LivenessInfo.
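As a rough illustration of that idea (toy Python under my own assumptions, not Cassandra code), a per-column liveness set would track a (timestamp, live) pair for each base column the view row's existence depends on, and reconcile each column independently instead of collapsing everything into a single row timestamp:

```python
# Toy sketch of the multi-timestamp liveness idea (illustration only,
# not Cassandra code).

def merge(a, b):
    """Keep, per column, the entry with the higher timestamp.
    On a timestamp tie the dead entry wins, mirroring cell reconciliation."""
    merged = dict(a)
    for col, (ts, live) in b.items():
        if col not in merged or (ts, not live) > (merged[col][0], not merged[col][1]):
            merged[col] = (ts, live)
    return merged

def row_is_live(liveness):
    # The row exists only if every column it depends on is live.
    return all(live for _, live in liveness.values())

# q3/q4 from the example above: c=2@11 kills the row, c=1@12 revives it.
state = merge({'c': (11, False)}, {'c': (12, True)})
assert row_is_live(state)   # the row is correctly live again after q4
```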

The solution proposed on that ticket of keeping multiple deletion and liveness 
infos per primary key is similar to the virtual cells solution you 
independently came up with (great job!). While I agree that a solution along 
those lines is the way to go moving forward, it's a pretty significant change 
to the storage engine which may introduce unforeseen problems, and it would 
probably be nice to have [~slebresne]'s blessing given that he seems to [feel 
strongly|https://issues.apache.org/jira/browse/CASSANDRA-10226?focusedCommentId=14740391&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14740391]
 about it and will likely want to chime in.

I personally think that before introducing disruptive changes to the storage 
engine and MV machinery to enable relatively new features (in this case, 
filtering on non-PK columns which didn't seem to have all of its repercussions 
considered on CASSANDRA-10368), we should take a conservative approach and 
spend our energy on stabilizing current MV features.

In practical terms, I'd suggest going with the simpler strict liveness approach 
I suggested above to fix the current problems (or any alternative which does 
not require disruptive changes to the storage engine) and disallowing filtering 
on non-PK columns while virtual cells are not implemented - MVs that already 
have such filtering enabled would not be affected by the restriction, but their 
users would remain susceptible to the problem above (we could maybe print a 
warning to inform them of this).

After we have current MV features stabilized, we can then think of implementing 
the virtual cell idea to properly enable other features, like filtering on 
non-PK columns and supporting multiple non-PK columns in the MV clustering key 
when the partition key is shared (CASSANDRA-10226).

Please let me know what you think.

> Obsolete MV entry may not be properly deleted
> ---------------------------------------------
>
>                 Key: CASSANDRA-11500
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>            Reporter: Sylvain Lebresne
>            Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column's value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> as for the liveness info of the new entry, which is the max timestamp of the 
> columns participating in the view PK. This is not correct for the deletion, 
> as the old view entry could have other columns with a higher timestamp 
> which won't be deleted, as is easily shown by the failure of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
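A minimal way to see this outside Cassandra (toy Python, not the actual reconciliation code): a cell survives a deletion whenever its timestamp is strictly higher, so deleting the old entry at the new view-PK timestamp leaves higher-timestamped cells behind.

```python
# Toy reconciliation rule (illustration only, not Cassandra code):
# a cell survives a deletion iff its timestamp is strictly higher
# (deletions win ties).

def cell_survives(cell_ts, deletion_ts):
    return cell_ts > deletion_ts

# The old view entry (k=1, a=1) carries b=2 written at ts 4.  Deleting it
# at the new view-PK timestamp (ts 2, from "SET a = 2 ... TIMESTAMP 2"):
assert cell_survives(4, 2)       # b=2@4 survives -> stale view entry
# Deleting at the old entry's max timestamp instead (the CASSANDRA-11475 fix):
assert not cell_survives(4, 4)   # the whole old entry is shadowed
```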
> Unfortunately, even then we can still have problems if further updates 
> require us to override the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one has 
> been deleted, but we can't rely on timestamps being strictly bigger on the 
> re-insert. In that sense, this can be thought of as a problem similar to 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we may have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. This means that both the 
> liveness info (for updates) and the shadowable deletion would need to ship 
> the timestamp of any base table column that is part of the view PK (so {{a}} 
> in the example above). It's doable (and not that hard, really), but it does 
> require a change to the sstable format and intra-node protocol, which makes 
> this a bit painful right now.
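A toy sketch of this proposal (illustration only; the function is made up, not an actual Cassandra API): both the liveness info and the shadowable deletion carry the base-table timestamp of the view-PK column, and shadowing is decided on that timestamp rather than on the entries' own timestamps.

```python
# Toy sketch of the proposal (illustration only, not Cassandra code):
# ship the base-table timestamp of the view-PK column with both the
# liveness info and the shadowable deletion, and compare those.

def deletion_is_shadowed(deletion_pk_ts, liveness_pk_ts):
    # The re-insert shadows the deletion iff its write to the view-PK
    # column is more recent in the base table.
    return liveness_pk_ts > deletion_pk_ts

# From the second example: "SET a = 2 ... TIMESTAMP 2" deletes the a=1
# entry shipping a's timestamp 2; "SET a = 1 ... TIMESTAMP 3" re-inserts
# with a's timestamp 3, so the re-insert correctly revives the entry even
# though the deleted entry contained b=2@10.
assert deletion_is_shadowed(2, 3)
# ... and the next flip ("SET a = 2 ... TIMESTAMP 4") correctly wins again:
assert not deletion_is_shadowed(4, 3)
```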
> But I'll also note that, as CASSANDRA-1096 shows, the timestamp is not even 
> enough, since on equal timestamps the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns as well (in the case of a 
> deletion at least, since for updates we have it in the view PK). That said, 
> on that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting 
> timestamps anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
