[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119461#comment-16119461
 ] 

Paulo Motta commented on CASSANDRA-11500:
-----------------------------------------

The virtual cell proposal is pretty clever and looks like it would solve most 
outstanding issues, but I'm a bit concerned about adding new structures to the 
storage engine to deal with materialized-specific issues.

While I agree we should do if it's our only choice, we should explore 
alternatives which reuse existing structures if possible to avoid introducing 
feature-specific stuff into the storage engine.

Looking back at the original scenario which motivated this ticket 
(CASSANDRA-11500):
{noformat}
CREATE TABLE t (k int PRIMARY KEY, a int, b int);
CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS NOT 
NULL PRIMARY KEY (k, a);

INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;

SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
(invalid) and the new one
{noformat}

It seems to me that the problem here is applying the definition for standard 
tables that if a single column is live, then the whole row is live, which does 
not need to be the case for MV where we can guarantee a view entry will always 
contain row-level liveness info.

We could solve this  by introducing a "strict " flag to the row liveness info, 
which has the following semantic:
- A strict row is only live iff it's row level liveness info is live, 
regardless of the liveness of its columns

Materialized views rows would have this flag set and perform deletions with its 
max primary key timestamp (instead of max timestamp of all keys), and this 
would solve the issue above by ensuring the row {{(1, 1)@(liveness@0, 
deleted@2)=(b=2@4)}} would not be live.

In addition to solving the original problem we would not create the second 
problem in the ticket description of updates to the view primary key with a 
smaller timestamp to be shadowed by a shadowable tombstone using the max 
timestamp of a non-PK column. In this approach the shadowing tombstone 
mechanism would still be orthogonal to the strict liveness and working as it is 
today.

This mechanism alone would not solve all other problems but at least the ones 
described in this ticket description with minimal change in the storage engine, 
let's now go through the other issues to see how we could solve them:

*View row expires too fast with unselected base column (CASSANDRA-13127)*

>From the discussion on CASSANDRA-13127 it seems like you found and fixed some 
>issues with liveness comparison in addition to no view update being generated 
>when there is an update to an unselected column which seems to solve this 
>issue in addition with the strict row concept above. Even though this will 
>require a read  when updating columns not in the view, the MV user is already 
>expecting to pay an extra price for MVs anyway so it shouldn't be a problem - 
>if you want performance you can build views manually or use CASSANDRA-9779 
>hopefully when it's ready. :-)

*based column used in view key is TTLed (CASSANDRA-13657)*

This seems to be fixed by the fix above.

*Partial update unselected columns in base (CASSANDRA-13127)*

This seem to be more of an anomaly of the partial row update semantics which 
has bad consequences for MVs than a problem with MV itself. 6.0 where thrift is 
gone is a good occasion to revisit this semantics rather than trying to make MV 
fit into it.

Right now, inserting a non-PK column (in the example {{UPDATE base USING 
TIMESTAMP 10 SET b=1 WHERE k=1 AND c=1}}) will create a row (k=1, c=1, b=1)@10 
on the end-user perspective ({{SELECT * FROM base WHERE k=1 and C=1}}) while 
internally creating only a column, which destroys the entire row in case the 
same non-PK column is removed ({{DELETE b FROM base USING TIMESTAMP 11 WHERE 
k=1 AND c=1;}}).

While this semantics may make sense in a column-oriented world, it's a tad 
bizarre in a row oriented world, given we can delete a row by simply unsetting 
a non-PK column. I think the correct and expected semantic would be: {{a column 
update to a non-existing row will create it}}.

This semantic is IMO what makes the most sense in a row oriented store and it's 
possible to implement it without falling into the unexpected/inconsistent 
behaviors of CASSANDRA-6782/CASSANDRA-6668 by adding some kind of 
{{CREATE_IF_NOT_EXISTS}} flag which basically keeps the oldest liveness entry 
if more than one is found with this flag when merging.

This semantic change would be the most correct while still preventing this 
anomaly with MVs due to the current semantic. If you agree we can create 
another ticket to propose this change.

*merging should be commutative (CASSANDRA-13409)*
The second case, represented by {{testCellTombstoneAndShadowableTombstones}}, 
is fixed by [using regular tombstones instead of shadowable tombstones for base 
table column 
deletions|https://github.com/pauloricardomg/cassandra/commit/1aeb0acbbaad6cc9520af6d1684043a5078eefa7]
 (something which was probably overlooked on CASSANDRA-10261), as suggested on 
CASSANDRA-13409.

The first and third case, represented by {{testCommutativeRowDeletion}} can 
probably be fixed on the view update generator by emitting column tombstones 
for columns not being updated when it's detected a partial update on a 
previously deleted row (unless I'm missing something).

*with filter conditions (CASSANDRA-13547)*

This seems like a genuine bug on the view update generator, already discussed 
and proposed patch on CASSANDRA-13547, of not including the view conditions on 
the base table select. The second issue discussed there of a column with lower 
timestamp being wrongly shadowed should be fixed by the strict liveness fix 
proposed before.

*unselected columns dropped*

We could still use the virtual cells idea (but probably as an additional flag 
to the current cell structure) to support this properly, but I'm not sure it 
makes sense to optimize and add additional storage overhead for this use case 
so we could maybe just disallow dropping columns on tables with MVs configured 
altogether (at least initially).

I hacked a [strict liveness 
prototype|https://github.com/pauloricardomg/cassandra/commits/11500-poc] on top 
of the original patch for CASSANDRA-13127 which seems to fix most tests except 
the ones not covered above.

While this still requires changes to the storage engine it's mostly inclusion 
of new flags, while other changes would be restricted to the view update 
generator. I could be easily missing something so please let me know if there 
are cases or inefficiencies not covered by the suggestions above which I did 
not consider.

I think the scope here is quite big, and while I understood your idea was to 
solve all these issues with a single approach, if there aren't flaws with the 
suggestions above (which require less changes in the storage engine) and we 
decide to go with them, we should probably break up the solution to this as 
following:
- On this ticket implement the strict liveness idea and test it thoroughly 
(compaction, tombstone purging etc) including the original patch for 
CASSANDRA-13127 (except the shadowable liveness commit which is probably not 
required)
- Reopen CASSANDRA-13409 to update view generator to generate column tombstones 
when receiving a partial update for a previously deleted row
- Reopen CASSANDRA-13547 to update view generator to include unselected views 
on base table select (latest patch there should already be good enough I think?)
- Open a new ticket to deal with dropped unselected columns (either disallow or 
implement a simplified virtual cell idea to deal with this)
- Open a new ticket to propose new insert-if-not-exists semantic for column 
update on non-existing row
- Open a new ticket to update sstable dump to include new flags and shadowable 
tombstone
- In my hacked prototype the frozen collection test is failing, but I didn't 
investigate into it, is this something that would be solved by any of the above 
or is it a different issue? Also please let me know if I forgot to comment on 
any other case which is not covered above.

> Obsolete MV entry may not be properly deleted
> ---------------------------------------------
>
>                 Key: CASSANDRA-11500
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>            Reporter: Sylvain Lebresne
>            Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to