[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082241#comment-16082241
 ] 

ZhaoYang edited comment on CASSANDRA-11500 at 7/18/17 3:10 PM:
---------------------------------------------------------------

h3. Relation: base -> view

First of all, I think we should all agree on the cases in which a view row 
should exist.

IMO, there are two main cases:

1. base pk and view pk are the same (order doesn't matter) and the view has no 
filter conditions, or only conditions on the base pk.
(filter conditions are not a concern here, since there is no previous view data 
to be cleared)

view row exists if any of the following is true:
* a. base row pk has live livenessInfo (timestamp) and base row pk satisfies 
the view's filter conditions, if any.
* b. or one of the base row columns selected in the view has a live timestamp 
(via update) and base row pk satisfies the view's filter conditions, if any. 
This is handled by the existing liveness and tombstone mechanism, since all the 
info is included in the view row.
* c. or one of the base row columns not selected in the view has a live 
timestamp (via update) and base row pk satisfies the view's filter conditions, 
if any. Those unselected columns' timestamp/ttl/cell-deletion info is currently 
not stored on the view row.

2. a base column is used in the view pk, or the view has filter conditions on a 
base non-key column, either of which can also lead to the entire view row being 
wiped.

view row exists if any of the following is true:
* a. base row pk has live livenessInfo (timestamp) && the base column used in 
the view pk is not null but has no timestamp && conditions are satisfied. (pk 
having live livenessInfo means it is not deleted by a tombstone)
* b. or the base row column in the view pk has a timestamp (via update) && 
conditions are satisfied. e.g. if the base column used in the view pk is TTLed, 
the entire view row should be wiped.
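
The two cases above can be read as a pair of predicates. The following is only 
a rough sketch to make the conditions concrete: {{BaseRow}}, its fields and 
both function names are hypothetical and do not correspond to Cassandra 
classes. A column counts as "live" here only if it carries its own live 
timestamp (via update).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BaseRow:
    """Hypothetical, simplified view of a base row (not a Cassandra class)."""
    pk_live: bool      # base row pk has live livenessInfo
    filters_ok: bool   # the view's filter conditions are satisfied
    # column name -> True if the column has its own live timestamp (via update)
    live_columns: Dict[str, bool] = field(default_factory=dict)


def view_row_exists_case1(row: BaseRow) -> bool:
    """Case 1: base pk and view pk are the same.

    1a: the pk liveness is live; 1b/1c: any selected or unselected
    column has a live timestamp.
    """
    return row.filters_ok and (row.pk_live or any(row.live_columns.values()))


def view_row_exists_case2(row: BaseRow, view_key_column: str,
                          key_not_null: bool) -> bool:
    """Case 2: a base column is used in the view pk."""
    if not row.filters_ok:
        return False
    key_has_live_ts = row.live_columns.get(view_key_column, False)
    # 2a: pk is live and the key column is set but has no timestamp of its own
    case_a = row.pk_live and key_not_null and not key_has_live_ts
    # 2b: the key column has a live timestamp (via update)
    return case_a or key_has_live_ts
```

Under this reading, an expired TTL on the view key column means 
{{key_not_null}} is false and the column has no live timestamp, so the view row 
does not exist even if the base pk is still live.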

The next step is to model "shadowable tombstone or shadowable liveness" to 
maintain view data based on the cases above.
 
h3. Previous known issues: 
(I might have missed some issues, feel free to ping me.)

ttl
* view row is not wiped when a base column used in the view pk is TTLed, or 
when a base non-key column with a filter condition is TTLed
* for cells with the same timestamp, merging ttls is not deterministic.

partial update on base columns not selected in view
* it results in no view data: under the current update semantics, no view 
updates are generated
* the corresponding view row's liveness does not depend on the liveness of the 
base columns

filter conditions or base column used in view pk
* view row is shadowed after a few modifications to the base column used in the 
view pk if a base non-key column has a TS greater than the base pk's ts and the 
view key column's ts. (as mentioned by sylvain: we need to be able to re-insert 
an entry when a prior one had been deleted, and need to be careful to handle 
timestamp ties)

tombstone merging is not commutative
* in the current code, a shadowable tombstone doesn't co-exist with a regular 
tombstone

sstabledump doesn't support the current shadowable tombstone

h3. Model (TO BE UPDATED)

{{ShadowableTombstone}} : 
* deletion-time, isShadowable, and "viewKeyTs", i.e. the timestamp of the base 
column that is part of the view pk (used to reconcile timestamp ties). If there 
is no timestamp associated with that column, the base pk timestamp is used 
instead.
* it is only generated when a base column is part of the view pk and this base 
column's value is changed in the base row, to mark the previous view row as 
deleted (original definition of {{shadowable}} in CASSANDRA-10261). In other 
cases, a {{standard tombstone}} is generated for view rows.
* if {{ShadowableTombstone}} is superseded by {{LivenessInfo}}, columns 
shadowed by {{ShadowableTombstone}} will come back alive (original definition 
of {{shadowable}} in CASSANDRA-10261).
* {{ShadowableTombstone}} should co-exist with {{Standard Tombstone}} if 
{{shadowable}}'s deletion time supersedes {{standard tombstone}}, to avoid 
bringing columns older than {{standard tombstone}} back alive (as in 
CASSANDRA-13409).

{{ShadowableLivenessInfo}}:  
* timestamp, and "viewKeyTs"
* if shadowable and not live, all columns in this row are considered deleted, 
as in CASSANDRA-13657 and CASSANDRA-13127, to solve the partial update issues

When reconciling {{ShadowableTombstone}} and {{ShadowableLivenessInfo}}: 
{quote}
if deletion-time is greater than timestamp, tombstone wins
if deletion-time is smaller than timestamp, livenessInfo wins
when deletion-time ties with timestamp, 
 - if {{ShadowableTombstone}}'s {{viewKeyTs}} >= {{ShadowableLivenessInfo}}'s, 
then tombstone wins
 - else livenessInfo wins.
{quote}
{quote}
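
To make the tie-breaking rule easier to check, here is a minimal sketch of the 
reconciliation. It is only an illustration: the Python classes and the 
{{tombstone_wins}} function are hypothetical stand-ins, not the actual 
Cassandra types or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowableTombstone:
    deletion_time: int  # timestamp of the shadowable deletion
    view_key_ts: int    # ts of the base column in the view pk, else base pk ts


@dataclass(frozen=True)
class ShadowableLivenessInfo:
    timestamp: int
    view_key_ts: int


def tombstone_wins(tombstone: ShadowableTombstone,
                   liveness: ShadowableLivenessInfo) -> bool:
    """Return True if the tombstone shadows the liveness info."""
    if tombstone.deletion_time != liveness.timestamp:
        # No tie: the greater of deletion-time and timestamp decides.
        return tombstone.deletion_time > liveness.timestamp
    # Tie: fall back to viewKeyTs; the tombstone wins when it is >=.
    return tombstone.view_key_ts >= liveness.view_key_ts
```

For instance, a tombstone with (deletion-time, viewKeyTs) = (10, 0) shadows a 
liveness of (10, 0) but not one of (10, 3), which is what allows a view row to 
be re-inserted without a strictly bigger timestamp.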

When inserting into the view, always use the greatest timestamp of all base 
columns in the view, similar to how the view deletion timestamp is computed.
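
As a rough sketch of that computation (assuming a single base column in the 
view pk, and using {{None}} for a column that has no timestamp of its own), the 
pair (timestamp, viewKeyTs) for a view row could be derived as below. The 
function name and its parameters are hypothetical.

```python
from typing import Dict, Optional, Tuple


def view_liveness(base_pk_ts: int,
                  column_ts: Dict[str, Optional[int]],
                  view_key_column: str) -> Tuple[int, int]:
    """Compute (timestamp, viewKeyTs) for the view row of a base row.

    column_ts maps each base column in the view to its timestamp, or to
    None when the column has no timestamp of its own.
    """
    known = [ts for ts in column_ts.values() if ts is not None]
    # Greatest timestamp over the base pk and all base columns in the view.
    timestamp = max([base_pk_ts] + known)
    # viewKeyTs: ts of the base column used in the view pk, or the base pk
    # timestamp when that column has none.
    key_ts = column_ts.get(view_key_column)
    view_key_ts = base_pk_ts if key_ts is None else key_ts
    return timestamp, view_key_ts
```

With a base pk timestamp of 0, {{a}} updated at 2 and {{b}} updated at 10, this 
gives (10, 2).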

h3. *Example*

{quote}
CREATE TABLE t (k int PRIMARY KEY, a int, b int);
CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS NOT 
NULL PRIMARY KEY (k, a);

{{q1}} INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
{{q2}} UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
{{q3}} UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; 
{{q4}} UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; 
{quote}


* After {{q1}}:
** in base: {{k=1@0, a=1, b=1}}    // 'k' has value '1' with timestamp '0'
** in view: 
***  sstable1: {{(k=1&&a=1)@TS(0,0), b=1}}  // 'k:a' has value '1:1' with 
timestamp '0' and viewKeyTs '0' from base's pk because column 'a' has no TS
* After {{q2}}
** in base(merged): {{k=1@0, a=1, b=2@10}} 
** in view:  
***  sstable1: {{(k=1&&a=1)@TS(0,0), b=1}}
***  sstable2: {{(k=1&&a=1)@TS(10,0), b=2@10}}
***  or merged: {{(k=1&&a=1)@TS(10,0), b=2@10}}
* After {{q3}}
** in base(merged): {{k=1@0, a=2@2, b=2@10}}  
** in view:  
***  sstable1: {{(k=1&&a=1)@TS(0,0), b=1}}
***  sstable2: {{(k=1&&a=1)@TS(10,0), b=2@10}}
***  sstable3: {{(k=1&&a=1)@Shadowable(10,0)}} & {{(k=1&&a=2)@TS(10,2), 
b=2@10}}  // '(k=1&&a=2)' has the biggest timestamp '10' and viewKeyTs '2' 
from column 'a'
***  or merged: {{(k=1&&a=2)@TS(10,2), b=2@10}}
* After {{q4}}
** in base(merged): {{k=1@0, a=1@3, b=2@10}}  
** in view:  
***  sstable1: {{(k=1&&a=1)@TS(0,0), b=1}}
***  sstable2: {{(k=1&&a=1)@TS(10,0), b=2@10}}
***  sstable3: {{(k=1&&a=1)@Shadowable(10,0)}} & {{(k=1&&a=2)@TS(10,2), 
b=2@10}} 
***  sstable4: {{(k=1&&a=2)@Shadowable(10,2)}} & {{(k=1&&a=1)@TS(10,3), 
b=2@10}}  // '(k=1&&a=1)' has the biggest timestamp '10' and viewKeyTs '3' 
from column 'a'
***  or merged: {{(k=1&&a=1)@TS(10,3), b=2@10}}
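
The merged results in the walkthrough above can be replayed with a small 
simulation. This is a sketch under two assumptions that are not part of the 
proposal text: liveness markers for the same view key are merged by taking the 
greatest (timestamp, viewKeyTs) pair, and shadowable tombstones likewise. The 
tuple layout and the {{live_view_keys}} name are made up for illustration.

```python
# Each entry: (view key (k, a), marker kind, timestamp, viewKeyTs)
SSTABLES = [
    [((1, 1), "live", 0, 0)],                                   # after q1
    [((1, 1), "live", 10, 0)],                                  # after q2
    [((1, 1), "shadowable", 10, 0), ((1, 2), "live", 10, 2)],   # after q3
    [((1, 2), "shadowable", 10, 2), ((1, 1), "live", 10, 3)],   # after q4
]


def live_view_keys(sstables):
    """Return the sorted view keys that are live after merging the sstables."""
    best_live, best_tomb = {}, {}
    for sstable in sstables:
        for key, kind, ts, view_key_ts in sstable:
            target = best_live if kind == "live" else best_tomb
            marker = (ts, view_key_ts)
            # Tuples compare by timestamp first, then by viewKeyTs.
            target[key] = max(target.get(key, marker), marker)
    alive = []
    for key, live in best_live.items():
        tomb = best_tomb.get(key)
        # The tombstone wins on a greater deletion time, or on a tie when
        # its viewKeyTs is >= the liveness's; otherwise the row is live.
        if tomb is None or tomb < live:
            alive.append(key)
    return sorted(alive)
```

Merging sstable1 to sstable3 leaves only {{(k=1&&a=2)}} live, and merging all 
four leaves only {{(k=1&&a=1)}} live, matching the "merged" lines in the 
example. Since both merges take a maximum, the order in which sstables are 
merged does not change the outcome in this sketch.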

h3. *Changes*

* Extra flag in storage serialization format to facilitate {{viewKeyTs}} and 
{{co-existed standard tombstones under shadowable}}
* Message serialization to store {{viewKeyTs}}
* Row.Merger Process
* BTreeRow Filter Process



> Obsolete MV entry may not be properly deleted
> ---------------------------------------------
>
>                 Key: CASSANDRA-11500
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>            Reporter: Sylvain Lebresne
>            Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> as for the liveness info of the new entry, which is the max timestamp for 
> any columns participating in the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted, as can easily be shown by the failure of the 
> following test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently returns 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> require us to override the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be thought of as a similar problem to 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we may have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part of the view PK (so {{a}} in 
> the example above).  It's doable (and not that hard really), but it does 
> require a change to the sstable and intra-node protocol, which makes this a 
> bit painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.


