[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153239#comment-16153239
 ] 

Paulo Motta commented on CASSANDRA-11500:
-----------------------------------------

bq. As we discussed offline, we need to make sure the raw data, including 
tombstones and expired liveness, is shipped to the coordinator side. Enforcing 
strict liveness in ReadCommand.executeLocally() would remove the row before the 
digest or data response. Instead, we add enforceStrictLiveness to Row.purge to 
get the same result with fewer interface changes for Row.

Excellent catch, my bad for overlooking this - glad you added a new dtest 
for it: 
[test_base_column_in_view_pk_complex_timestamp_with_flush|https://github.com/apache/cassandra-dtest/commit/6d77ace5361f020ba182072ade9f4ab98025c213#diff-62ba429edee6a4681782f078246c9893R993].
 Enforcing strict liveness during reconciliation on the coordinator is the 
correct approach to differentiate between non-existing and removed rows via 
strict liveness. Ideally strict liveness would be a row property, but as we 
saw in the previous version this requires changing a bunch of other classes 
and interfaces, so let's keep it as a flag to Row.purge for now.
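To make the flag's effect concrete, here is a minimal Python sketch of the purge decision that the enforceStrictLiveness flag toggles (hypothetical names, not Cassandra's actual API): under strict liveness the view row lives or dies with its primary-key liveness info alone, while under normal semantics any live cell keeps the row alive.

```python
def row_survives_purge(pk_liveness_is_live: bool,
                       has_live_cells: bool,
                       enforce_strict_liveness: bool) -> bool:
    """Loose sketch of the decision behind the new Row.purge flag.

    With strict liveness (a non-PK base column in the view PK), the
    row's existence depends solely on its primary-key liveness info.
    """
    if enforce_strict_liveness:
        return pk_liveness_is_live
    # Normal semantics: any live cell is enough to keep the row.
    return pk_liveness_is_live or has_live_cells

# A row whose PK liveness is dead is removed under strict liveness,
# even though it still has live cells:
assert row_survives_purge(False, True, enforce_strict_liveness=True) is False
assert row_survives_purge(False, True, enforce_strict_liveness=False) is True
```

Because this pruning must happen during coordinator-side reconciliation rather than in ReadCommand.executeLocally(), the raw data (tombstones, expired liveness) still travels to the coordinator intact.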

bq. Shadowable tombstones will be deprecated, and an expired livenessInfo is 
used if the deletion time is greater than the merged-row deletion, to avoid 
unnecessary expired livenessInfo.

LGTM, this will prevent using expired liveness info in most cases. We should 
probably remove this as part of CASSANDRA-13826, since we're overloading 
ExpiringLivenessInfo for a different purpose, but it should be fine for now.
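As a rough illustration (a hypothetical helper, not the actual patch code), the choice between emitting an expired livenessInfo and relying on the merged base-row deletion can be sketched as:

```python
def obsolete_view_entry_marker(computed_deletion_ts: int,
                               merged_row_deletion_ts: int) -> tuple:
    """Pick the mechanism that suppresses an obsolete view entry.

    Sketch of the rule above: an expired livenessInfo is emitted only
    when the computed deletion timestamp exceeds the merged base-row
    deletion; otherwise the row deletion already covers the entry and
    no expired livenessInfo is needed.
    """
    if computed_deletion_ts > merged_row_deletion_ts:
        return ("expired-liveness", computed_deletion_ts)
    return ("row-deletion", merged_row_deletion_ts)

assert obsolete_view_entry_marker(10, 5) == ("expired-liveness", 10)
assert obsolete_view_entry_marker(3, 5) == ("row-deletion", 5)
```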

Final patch and CI results look good, and I'm confident we've exercised all 
edge cases properly with unit tests and dtests. Committed dtests to master as 
{{6d77ace5361f020ba182072ade9f4ab98025c213}}. Committed to the cassandra-3.0 
branch as {{1b36740ebe66b8ed4c3d6cb64eb2419a9279dfbf}} and merged up to 
cassandra-3.11 and trunk. Great job Zhao!

*Summary of improvements and fixed issues*

This patch addresses most outstanding timestamp problems with MVs on the 3.x 
series without changing the binary or storage protocol. The remaining issues, 
which require storage engine and binary protocol changes, will be addressed in 
CASSANDRA-13826 on trunk. Below is a summary of the main changes and fixes 
this patch introduces.

{noformat}
* View with same PK components as base

    * DELETE of a selected column should not affect out-of-order updates 
      (testPartialDeleteSelectedColumn)
    * DELETE of an unselected column/collection should not affect ordered 
      updates (testUpdateColumnNotInView, 
      testPartialUpdateWithUnselectedCollections - CASSANDRA-13127)
    * Unselected columns should keep the view row alive when other columns 
      expire (testUnselectedColumnsTTL - CASSANDRA-13127)

* Extra column in view PK

    * View row should expire when the view PK column expires in the base 
      (testUpdateColumnInViewPKWithTTL - CASSANDRA-13657)
    * Commutative row deletion (testCommutativeRowDeletion, 
      testCellTombstoneAndShadowableTombstones - CASSANDRA-13409)
    * Out-of-order updates to the extra column in the view PK 
      (testUpdateWithColumnTimestampBiggerThanPk, 
      testUpdateWithColumnTimestampSmallerThanPk - CASSANDRA-11500)

* Unsupported scenarios

    * DELETE of an unselected column should not affect out-of-order updates 
      (testPartialDeleteUnselectedColumn - CASSANDRA-13127)
    * Filtering by a non-PK base column (only on C* 3.11+; disallowed in 
      CASSANDRA-13798)

* Additional MV tests

    * testNonBaseColumnInViewPk
    * testRangeDeletion
    * testStrictLivenessTombstone
    * testFrozenCollectionsWithComplexInnerType
    * testMVWithDifferentColumns
    * testBaseTTLWithSameTimestampTest
{noformat}

*Summary of code changes*

{noformat}
1. Use an expired livenessInfo if the computed deletion time is greater than 
   the merged row deletion. There are only 2 cases:
      a. a non-PK base column used in the view PK is removed by a partial 
         update or partial delete
      b. an unselected base column is removed by a partial update or partial 
         delete

   The current shadowable tombstone is not used, to avoid the issue of 
   resurrecting deleted cells (CASSANDRA-13049). We use an expired 
   livenessInfo and the merged base row deletion instead.

2. A view has strict liveness iff there is a non-PK base column in the view 
   PK. The existence of the view row is based solely on that non-PK base 
   column.

3. If there is no non-PK base column in the view PK, the view's 
   liveness/deletion uses the max of the base livenessInfo and the unselected 
   columns. An unselected column's TTL is used only when it affects view row 
   liveness. Selected columns do not contribute to the livenessInfo or row 
   deletion.
    * This does not support complex cases, as explained above. E.g. with c 
      and d unselected: update c@10, delete c@11, update d@5 - the view row 
      should be alive but is dead. We added a note to NEWS.txt as well as 
      documentation about this unsupported case.
    * Since we change the way the deletion timestamp is computed in this 
      case, an older deletion may not properly shadow a previous view row 
      with a greater timestamp, so repair should be run on the base table 
      before upgrade to ensure deletions are properly generated with the 
      previous approach; alternatively, repair can be run on the views after 
      upgrade (added a note to NEWS.txt).

4. In TableViews.java, the DeletionTracker should be applied even if the 
   existing row has no data, e.g. on a partition deletion.

5. When generating the read command to read existing base data, query all 
   base columns instead of the view's queried columns if the base and view 
   have the same key columns, so that unselected columns are read.

6. Fix handling of timestamp ties on PrimaryKeyLivenessInfo to match cell 
   resolution: on a timestamp tie, the greater timestamp supersedes the 
   lower one, and TTL supersedes non-TTL (tested in 
   testBaseTTLWithSameTimestampTest).

7. Disallow dropping columns on base tables with MVs, because we cannot tell 
   whether the dropped column is keeping a view row alive (will be fixed in 
   CASSANDRA-13826).
{noformat}
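The tie-breaking rule for PrimaryKeyLivenessInfo described in item 6 can be sketched as follows (simplified Python, where None stands for "no TTL"; the real resolution lives in Cassandra's liveness/cell conflict-resolution code):

```python
from typing import Optional

def liveness_supersedes(ts_a: int, ttl_a: Optional[int],
                        ts_b: int, ttl_b: Optional[int]) -> bool:
    """True if liveness A supersedes liveness B, per the rule in item 6:
    a greater timestamp wins; on a timestamp tie, a TTL'd liveness
    supersedes a non-TTL'd one (matching cell resolution)."""
    if ts_a != ts_b:
        return ts_a > ts_b
    # Timestamp tie: expiring (TTL'd) liveness wins over non-expiring.
    return ttl_a is not None and ttl_b is None

assert liveness_supersedes(5, None, 4, 3)      # greater timestamp wins
assert liveness_supersedes(5, 3, 5, None)      # tie: TTL beats non-TTL
assert not liveness_supersedes(5, None, 5, 3)  # tie: non-TTL loses
```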

> Obsolete MV entry may not be properly deleted
> ---------------------------------------------
>
>                 Key: CASSANDRA-11500
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>            Reporter: Sylvain Lebresne
>            Assignee: ZhaoYang
>             Fix For: 3.0.15, 3.11.1, 4.0
>
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> as for the liveness info of the new entry, which is the max timestamp of 
> the columns participating in the view PK. This is not correct for the 
> deletion, as the old view entry could have other columns with a higher 
> timestamp which won't be deleted, as is easily shown by the failure of the 
> following test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently returns 2 entries, the old 
> (invalid) one and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> require us to override the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one 
> has been deleted, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be thought of as a problem similar 
> to CASSANDRA-10965, though the solution there of a single flag is not 
> enough, since we may have to replace more than once.
> I think the proper solution would be to ship enough information to always 
> be able to decide when a view deletion is shadowed. This means that both 
> the liveness info (for updates) and the shadowable deletion would need to 
> ship the timestamp of any base table column that is part of the view PK 
> (so {{a}} in the example above). It's doable (and not that hard, really), 
> but it does require a change to the sstable and intra-node protocol, which 
> makes this a bit painful right now.
> But I'll also note that, as CASSANDRA-1096 shows, the timestamp is not 
> even enough, since on equal timestamps the value can be the deciding 
> factor. So in theory we'd have to ship the values of those columns (at 
> least in the case of a deletion, since we have them in the view PK for 
> updates). That said, on that last problem, my preference would be that we 
> start prioritizing CASSANDRA-6123 seriously so we don't have to care about 
> conflicting timestamps anymore, which would make this problem go away.
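The first failing scenario in the quoted description can be condensed into a small sketch (illustrative Python, not Cassandra internals) showing why a deletion at the view-PK timestamp cannot shadow the old entry's other cells:

```python
# Per-cell write timestamps of the old view entry (k=1, a=1) after the
# three writes in the example: INSERT@0, then b updated @4, then a @2.
old_entry_cell_ts = {"liveness": 0, "b": 4}

# Buggy rule: delete at the max timestamp of the view-PK columns (a@2).
buggy_deletion_ts = 2
# Correct rule (CASSANDRA-11475): delete at the old entry's max timestamp.
correct_deletion_ts = max(old_entry_cell_ts.values())  # 4

def cell_survives(cell_ts: int, deletion_ts: int) -> bool:
    """A tombstone only shadows data with timestamp <= its own."""
    return cell_ts > deletion_ts

# With the buggy timestamp, b@4 survives and the stale entry remains:
assert cell_survives(old_entry_cell_ts["b"], buggy_deletion_ts)
# With the correct timestamp, the whole old entry is shadowed:
assert not cell_survives(old_entry_cell_ts["b"], correct_deletion_ts)
```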



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
