[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-09-18 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170065#comment-16170065
 ] 

Paulo Motta commented on CASSANDRA-11500:
-

Committed {{afc55e8fe103597ef2a663be21828861a4832be7}} and 
{{6220394e84c79e6ef94651fc5e0aa03c12ddd307}} to cassandra-dtest with a minor 
fix to {{test_base_column_in_view_pk_complex_timestamp}} flakiness and minor 
change to {{test_base_column_in_view_pk_complex_timestamp}} to make it work on 
3.0.

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
> Fix For: 3.0.15, 3.11.1, 4.0
>
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-09-05 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154716#comment-16154716
 ] 

Kurt Greaves commented on CASSANDRA-11500:
--

Looks good. Thanks for all the hard work [~jasonstack] and [~pauloricardomg]

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
> Fix For: 3.0.15, 3.11.1, 4.0
>
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-09-05 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153337#comment-16153337
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

[~pauloricardomg] (y) thanks for the feedback, review, improvements.

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
> Fix For: 3.0.15, 3.11.1, 4.0
>
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-09-05 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153239#comment-16153239
 ] 

Paulo Motta commented on CASSANDRA-11500:
-

bq. As we discussed offline, we need to make sure the raw data including 
tombstone, expired liveness are shipped to the coordinator side. Enforcing 
strict liveness in ReadCommand.executeLocally() would remove the row before 
digest or data response. Instead, we add enforceStrictLiveness to Row.purge to 
get the same result but less interfaces changes for Row.

Excellent catch, my bad for overlooking this, but gladly you added a new dtest 
for it: 
[test_base_column_in_view_pk_complex_timestamp_with_flush|https://github.com/apache/cassandra-dtest/commit/6d77ace5361f020ba182072ade9f4ab98025c213#diff-62ba429edee6a4681782f078246c9893R993].
 Enforcing strict liveness during reconciliation on the coordinator is the 
correct approach to differentiate between non-existing and removed rows via 
strict liveness. Ideally strict liveness would be a row property but as we saw 
in the previous version, this requires changes a bunch of other classes and 
interfaces, so let's keep it as a flag to Row.purge for now.

bq. Shadowable tombstone will be deprecated and use expired livenessInfo if the 
deletion time is greater than merged-row deletion to avoid uncessary expired 
livenessInfo.

LGTM, this will prevent using expired liveness info in most cases. We should 
probably remove this as part of CASSANDRA-13826 since we're overlading 
ExpiringLivenessInfo for a different purpose but it should be fine for now.

Final patch and CI results look good and I'm confident we've exercised all edge 
cases properly with unit and dtests. Committed dtests to master as 
{{6d77ace5361f020ba182072ade9f4ab98025c213}}. Commited to cassandra-3.0 branch 
as {{1b36740ebe66b8ed4c3d6cb64eb2419a9279dfbf}} and merged up to cassandra-3.11 
and trunk. Great job Zhao!

*Summary of improvements and fixed issues*

This patch address most outstanding timestamp problems with MVs on the 3.x 
series without changing the binary or storage protocol. The remaining issues 
which requires storage engine and binary protocol changes will be addressed on 
CASSANDRA-13826 on trunk. Below is a summary of the main changes and fixes this 
patch introduces.

{noformat}
* View same PK components as base

* DELETE of selected column should not affect out of order updates 
(testPartialDeleteSelectedColumn)
* DELETE of unselected column/collection should not affect ordered updates 
(testUpdateColumnNotInView, testPartialUpdateWithUnselectedCollections - 
CASSANDRA-13127)
* Unselected columns should keep view row alive when other columns expire 
(testUnselectedColumnsTTL - CASSANDRA-13127)

* Extra column on view PK

* View row should expire when view PK column expires in base 
(testUpdateColumnInViewPKWithTTL - CASSANDRA-13657)
* Commutative row deletion (testCommutativeRowDeletion, 
testCellTombstoneAndShadowableTombstones - CASSANDRA-13409)
* Out of order updates to extra column on view PK 
(testUpdateWithColumnTimestampBiggerThanPk, 
testUpdateWithColumnTimestampSmallerThanPk, CASSANDRA-11500)

* Unsupported scenarios

* DELETE of unselected column should not affect out of order updates 
(testPartialDeleteUnselectedColumn - CASSANDRA-13127 - Added )
* Filtering by non-PK base column (Only on C* 3.11+, disallowed on 
CASSANDRA-13798)

* Additional MV tests

* testNonBaseColumnInViewPk
* testRangeDeletion
* testStrictLivenessTombstone
* testFrozenCollectionsWithComplexInnerType
* testMVWithDifferentColumns
* testBaseTTLWithSameTimestampTest
{noformat}

*Summary of code changes*

{noformat}
1. Using expired livenessInfo if computed deletion time is greater than merged 
row deletion. There are only 2 cases:
  a. non-pk base column used in view pk is removed by partial update or 
partial delete
  b. unselected base column is removed by partial update or partial delete
   
   Current shadowable tombstone is not used to avoid the issue of resurrecting 
deleted cells (CASSANDRA-13049). We will expired-livenessInfo and merged base 
row deletion instead.

2. It's strict-liveness iff there is non-key base column in view-pk. The 
existence of view row is solely base on this non-key base column.

3. If there is no non-base PK column in view-pk, the view's liveness/deletion 
is using max of base livenessIno + unselected column. unselected column's ttl 
is used only when it affects view row liveness. Selected columns won't 
contribute to livenessInfo or row deletion.
* this wouldn't support complex cases as explained above. eg. c/d 
unselected, update c@10, delete c@11, update d@5. view row should be alive but 
dead - We added a note to NEWS.txt as well as documentation about this 
unsupported case.
* since we change the way the deletion timestamp is computed in this case, 
an older deletion to the non-base PK may not 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-09-01 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150303#comment-16150303
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

[~pauloricardomg] thanks for the feedback (y)

bq. I wasn't very comfortable with our previous approach of enforcing strict 
liveness during row merge, since it changes a lot of low-level 
structures/interfaces (like BTreeRow/MergeListener, etc) to enforce a 
table-level setting. Since we'll probably get rid of this when doing a proper 
implementation of virtual cells , I updated on this commit to perform the 
filtering during read instead which will give us the same result but with less 
change in unrelated code. Do you see any problem with this approach?

As we discussed offline, we need to make sure the raw data including tombstone, 
expired liveness are shipped to the coordinator side.  Enforcing strict 
liveness in {{ReadCommand.executeLocally()}} would remove the row before digest 
or data response. Instead, we add {{enforceStrictLiveness}} to {{Row.purge}} to 
get the same result but less interfaces changes for {{Row}}.

bq. One problem of replacing shadowable tombstones by expired liveness info is 
that it stores an additional unused ttl field for every shadowed view entry to 
solve the commutative view deletion problem. In order to avoid this I updated 
the patch to only use expired ttl when a shadowable tombstone would not work 
along with an explanation on why that is used since it's a hack

Shadowable tombstone will be deprecated and use expired livenessInfo if the 
deletion time is greater than merged-row deletion to avoid uncessary expired 
livenessInfo.

bq. in TableViews.java, the DeletionTracker should be applied even if existing 
has no data, eg. partition-deletion

It's tested by  "testRangeDeletionWithFlush()" in ViewTest. Without partition 
deletion info from deletion tracker, existing row is given as empty and it will 
resurrect deleted cells.

bq.  In order to prevent against this, I added a note to the Upgrading section 
of NEWS.txt explaining about this caveat and that running repair before the 
upgrade should be sufficient to avoid it.

(y)

| source | unit | [dtest| 
| [trunk|https://github.com/jasonstack/cassandra/commits/trunk-11500-squashed] 
|  https://circleci.com/gh/jasonstack/cassandra/551 | x |
| 
[3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.11]
 |  https://circleci.com/gh/jasonstack/cassandra/557 | x |
| 
[3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.0]
 |  https://circleci.com/gh/jasonstack/cassandra/556|  x |
| [dtest|https://github.com/riptano/cassandra-dtest/commits/11500-poc]|
{code}
Changes:
1. Using expired livenessInfo if computed deletion time is greater than merged 
row deletion. There are only 2 cases:
  a. non-pk base column used in view pk is removed by partial update or 
partial delete
  b. unselected base column is removed by partial update or partial delete
   
   Current shadowable tombstone is not used to avoid the issue of resurrecting 
deleted cells. We will expired-livenessInfo and merged base row deletion 
instead. 

2. It's strict-liveness iff there is non-key base column in view-pk. The 
existence of view row is solely base on this non-key base column.

3. If there is no non-pk base column in view-pk, the view's liveness/deletion 
is using max of base livenessIno + unselected column. unselected column's ttl 
is used only when it affects view row liveness. Selected columns won't 
contribute to livenessInfo or row deletion.
* this wouldn't support complex cases as explained above. eg. c/d 
unselected, update c@10, delete c@11, update d@5. view row should be alive but 
dead

4. in TableViews.java, the DeletionTracker should be applied even if existing 
has no data, eg. partition-deletion

5. When generating read command to read existing base data, need to query all 
base columns instead of view's queried column if base and view having same key 
columns to read unselected column. 
{code}

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-08-30 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147044#comment-16147044
 ] 

Paulo Motta commented on CASSANDRA-11500:
-

Thanks for the update! See follow-up below:

bq. My understanding is: view row is strict iff the view has non-key base row 
as view pk. When it's Strict, the view's row liveness/deletion should use this 
non-key base column's timestamp as well as ttl, unless there is a greater row 
deletion.(It's like a simplified version of "VirtualCells" which only store 
metadata for non-key base column in view pk)

That's a good simplification of virtual cells which should allow us to fix the 
out-of-order update issues with non-base PK column 
(CASSANDRA-11500,CASSANDRA-13657).

I wasn't very comfortable with our previous approach of enforcing strict 
liveness during row merge, since it changes a lot of low-level 
structures/interfaces (like BTreeRow/MergeListener, etc) to enforce a 
table-level setting. Since we'll probably get rid of this when doing a proper 
implementation of virtual cells , I updated [on this 
commit|https://github.com/pauloricardomg/cassandra/commit/2b7b7c8ca79b6ef72c7c92535096c6d1d899ee43]
 to perform the filtering during read instead which will give us the same 
result but with less change in unrelated code. Do you see any problem with this 
approach?

bq. this wouldn't support complex cases as explained above. eg. c/d unselected, 
update c@10, delete c@11, update d@5. view row should be alive but dead

Actually this is a bit worst than I initially thought, since unselected 
deletion can also shadow a previous insert (not only update) so I 
[added|https://github.com/pauloricardomg/cassandra/commit/4200db215bdd6a9338e5d16cc567444537104e4b]
 an additional test case to testPartialDeleteUnselectedColumn with this 
scenario. I also 
[added|https://github.com/pauloricardomg/cassandra/commit/623d5f6d935b57ad3c949206b85206aea15a8844]
 a note to both NEWS.txt and to the documentation explaining that it might be 
unsafe to perform deletion on unselected view column. I created CASSANDRA-13826 
to add proper support to this and other cases with storage engine changes.

bq. If there is no non-pk base column in view-pk, the view's liveness/deletion 
is using max of base livenessIno + unselected column. unselected column's ttl 
is used only when it affects view row liveness. Selected columns won't 
contribute to livenessInfo or row deletion.

In order to prevent drop of unselected column to keep view row alive, I added a 
new 
[commit|https://github.com/pauloricardomg/cassandra/commit/afef09233a9ee0657104319de4c8a4f76e9ad292]
 disallowing drop of base table columns until we can deal with this properly.

bq.  Current shadowable tombstone is not used to avoid the issue of 
resurrecting deleted cells. We will expired-livenessInfo instead.

One problem of replacing shadowable tombstones by expired liveness info is that 
it stores an additional unused ttl field for every shadowed view entry to solve 
the commutative view deletion problem. In order to avoid this I [updated the 
patch|https://github.com/pauloricardomg/cassandra/commit/e0da138ab10f6c0fc014de86fb251e11358d80cc]
 to only use expired ttl when a shadowable tombstone would not work along with 
an explanation on why that is used since it's a hack.

bq. in TableViews.java, the DeletionTracker should be applied even if existing 
has no data, eg. partition-deletion

Is this still required after enforcing strict liveness on the 
[PurgeFunction|https://github.com/pauloricardomg/cassandra/commit/2b7b7c8ca79b6ef72c7c92535096c6d1d899ee43#diff-5636ce30e505443b3e24a1a6ba55e476R112]?
 I removed this and also 
[simplified|https://github.com/apache/cassandra/commit/3188feb38fb63c5ca556a8273aadc84571ac1bb6]
 {{ViewUpdateGenerator.deleteOldEntry}} to always use shadowable deletion and 
it didn't seem to affect any test. Do you see any other case which could be 
affected by this?

Also, as we discussed offline, since we are changing the row liveness/deletion 
computation when the view has non-key base row as view PK, this means that an 
older update to this column may not generate a correct shadowable deletion in 
the view, as shown by the example below:
{noformat}
before upgrade

base:  (k=1@5)  a=1@1  b=1@1  c=1@1

view:  (k=1 && a=1 @5)  b=1@1  c=1@1(old way of computing livenessInfo 
timestamp, max of view pk columns)
___
after upgrade

update a = null @ 1

base:  (k=1@5)  a=null@1  b=1@1  c=1@1

view:
   (k=1 && a=1 @5)  b=1@1  c=1@1(old way of computing livenessInfo 
timestamp, max of view pk columns)
   (k=1 && a=1 @expired at 1)  (new way of computing 
livenessInfo timestamp, non-key base column in view pk dominates)
 
   new expired livenessInfo cannot shadow old view row if the removal 
time is less the previous max
{noformat}

Even though this should be a pretty 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-08-12 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124534#comment-16124534
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

Thanks for reviewing and feedback.

Changing the semantic of MV and revising non-key column filtering 
feature(CASSANDRA-10368) will indeed make it easier. It's a good idea to make a 
simple non-disruptive change to stabilize basic features and wait for more 
commiters involved.

Using an extended flag for {{Strict-Liveness}} will allow us to change to 
future structure easily, either multiple livenessInfos or virtualcells. 

About the {{Strict Liveness}} semantic:
* A strict row is only live iff it's row level liveness info is live, 
regardless of the liveness of its columns.

My understanding is: view row is strict iff the view has non-key base row as 
view pk. When it's {{Strict}}, the view's row liveness/deletion should use this 
non-key base column's timestamp as well as ttl, unless there is a greater row 
deletion.(It's like a simplified version of "VirtualCells" which only store 
metadata for non-key base column in view pk)

For now, the semantic of MV: 
* if it's strict(non-key base row as view pk), the existence of view row is 
only with its row livenessInfo
* if it's not-strict, view row is alive if there is any live selected view 
columns or live livenessInfo.

{code}
For 13127: 
   Unselected columns has no effect on liveness of view row, for now, till we 
are ready for new design.
   It cannot be properly supported without disruptive changes, like 
VirtualCells or multiple livenessInfos
{code}

{code}
For 13547:
It's necessary to forbid dropping filtered columns from base columns.
The filtered column part needs to be reconsidered with 10368.
It cannot be properly supported without disruptive changes, like 
VirtualCells or multiple livenessInfos
{code}

{code}
for 13409:
As paulo suggested, generating column tombstones when receiving a partial 
update for a previously deleted row might be a non-disruptive solution if cell 
tombstone can co-exist with row deletion which has greater timestamp.
I will reopen this ticket.
{code}

PATCH for 11500: 
| [trunk|https://github.com/jasonstack/cassandra/commits/11500-poc]|
| [dtest|https://github.com/riptano/cassandra-dtest/commits/11500-poc]| 

Changes:
1. deletion is shadowable if the non-key base column in view-pk is updated or 
deleted by partial update or partial delete. if this non-key column is removed 
by row deletion, it's not shadowable.
2. it's strict-liveness iff there is non-key base column in view-pk.
3. if it's not strict-liveness, the view's livenes/deletion is same as base 
row's.
4. in TableViews.java, the DeletionTracker should be applied even if one of the 
iterator has no data, eg. partition-deletion
5. sstabledump will include shadowable info


> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-08-09 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119823#comment-16119823
 ] 

Paulo Motta commented on CASSANDRA-11500:
-

Talking offline with Zhao, it seems like there is still an outstanding case 
derived from CASSANDRA-13547 not addressed by the strict liveness suggestion:

{code:none}
// liveness or deletion using max-timestamp of view-primary-key column in base
base:  (k), a, b, c
view:  (k, a), b, c=1

q1: insert (1,1,1,1) with timestamp 0

base: liveness=ts@0,  k=1, a=1@0, b=1@0, c=1@0
view: liveness=ts@0,  (k=1, a=1), b=1@0, c=1@0

q2: update c=1 with timestamp 10 where k = 1  

base: liveness=ts@0,  k=1, a=1@0, b=1@0, c=1@10
view: liveness=ts@0,  (k=1, a=1), b=1@0, c=1@10

q3: update c=2 with timestamp 11 where k = 1  

base: liveness=ts@0,  k=1, a=1@0, b=1@0, c=2@11
view:
  liveness=ts@0,  (k=1, a=1), b=1@0, c=1@10
  tombstone=ts@0,  (k=1, a=1)

  with strict-liveness flag, view row is dead

q4: update c=1 with timestamp 12 where k = 1  

base: liveness=ts@0,  k=1, a=1@0, b=1@0, c=1@12
view:
  liveness=ts@0,  (k=1, a=1), b=1@0, c=1@10
  tombstone=ts@0,  (k=1, a=1)
  liveness=ts@0,  (k=1, a=1), b=1@0, c=1@12
 
  view row should be live..but it's dead
{code}

It seems like this scenario where the row liveness depend on a non-primary key 
was overlooked by CASSANDRA-10368 and seems to be analogous to the problem 
Tyler discovered on CASSANDRA-10226 (but with conditions rather than non-base 
view primary keys):

bq. It seems like when we include multiple non-PK columns in the view PK, we 
fundamentally have to accept that the view row's existence depends on multiple 
timestamps. I propose that we solve this by using a set of timestamps for the 
row's LivenessInfo.

The solution proposed on that ticket of keeping multiple deletion and liveness 
infos per primary key is similar to the virtual cells solution you 
independently came up (great job!). While I agree that a solution along those 
lines is the way to go moving forward, that's a pretty significant change in 
the storage engine which may introduce unforeseen problems, and would probably 
be nice to have [~slebresne] blessing given he seems to [feel 
strongly|https://issues.apache.org/jira/browse/CASSANDRA-10226?focusedCommentId=14740391=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14740391]
 about it and will likely want to chime in.

I personally think that before introducing disruptive changes to the storage 
engine and MV machinery to enable relatively new features (in this case, 
filtering on non-PK columns which didn't seem to have all of its repercussions 
considered on CASSANDRA-10368), we should take a conservative approach and 
spend our energy on stabilizing current MV features.

In practical terms, I'd suggest going with the simpler strict liveness approach 
I suggested above to fix the current problems (or any alternative which do not 
require disruptive changes on the storage engine) and disallow filtering on 
non-PK while the virtual cells are not implemented - MVs with it already 
enabled would not be affected but users would be susceptible to the problem 
above (we could maybe print a warning to inform this).

After we have current MV features stabilized we can then think of implementing 
the virtual cell idea to properly enable other features like filtering on 
non-PK columns and support multiple non-PK cols in MV clustering key when 
partition key is shared (CASSANDRA-10226).

Please let me know what do you think.

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-08-09 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119461#comment-16119461
 ] 

Paulo Motta commented on CASSANDRA-11500:
-

The virtual cell proposal is pretty clever and looks like it would solve most 
outstanding issues, but I'm a bit concerned about adding new structures to the 
storage engine to deal with materialized-specific issues.

While I agree we should do if it's our only choice, we should explore 
alternatives which reuse existing structures if possible to avoid introducing 
feature-specific stuff into the storage engine.

Looking back at the original scenario which motivated this ticket 
(CASSANDRA-11500):
{noformat}
CREATE TABLE t (k int PRIMARY KEY, a int, b int);
CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS NOT 
NULL PRIMARY KEY (k, a);

INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;

SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
(invalid) and the new one
{noformat}

It seems to me that the problem here is applying the definition for standard 
tables that if a single column is live, then the whole row is live, which does 
not need to be the case for MV where we can guarantee a view entry will always 
contain row-level liveness info.

We could solve this  by introducing a "strict " flag to the row liveness info, 
which has the following semantic:
- A strict row is only live iff it's row level liveness info is live, 
regardless of the liveness of its columns

Materialized views rows would have this flag set and perform deletions with its 
max primary key timestamp (instead of max timestamp of all keys), and this 
would solve the issue above by ensuring the row {{(1, 1)@(liveness@0, 
deleted@2)=(b=2@4)}} would not be live.

In addition to solving the original problem we would not create the second 
problem in the ticket description of updates to the view primary key with a 
smaller timestamp to be shadowed by a shadowable tombstone using the max 
timestamp of a non-PK column. In this approach the shadowing tombstone 
mechanism would still be orthogonal to the strict liveness and working as it is 
today.

This mechanism alone would not solve all other problems but at least the ones 
described in this ticket description with minimal change in the storage engine, 
let's now go through the other issues to see how we could solve them:

*View row expires too fast with unselected base column (CASSANDRA-13127)*

>From the discussion on CASSANDRA-13127 it seems like you found and fixed some 
>issues with liveness comparison in addition to no view update being generated 
>when there is an update to an unselected column which seems to solve this 
>issue in addition with the strict row concept above. Even though this will 
>require a read  when updating columns not in the view, the MV user is already 
>expecting to pay an extra price for MVs anyway so it shouldn't be a problem - 
>if you want performance you can build views manually or use CASSANDRA-9779 
>hopefully when it's ready. :-)

*based column used in view key is TTLed (CASSANDRA-13657)*

This seems to be fixed by the fix above.

*Partial update unselected columns in base (CASSANDRA-13127)*

This seem to be more of an anomaly of the partial row update semantics which 
has bad consequences for MVs than a problem with MV itself. 6.0 where thrift is 
gone is a good occasion to revisit this semantics rather than trying to make MV 
fit into it.

Right now, inserting a non-PK column (in the example {{UPDATE base USING 
TIMESTAMP 10 SET b=1 WHERE k=1 AND c=1}}) will create a row (k=1, c=1, b=1)@10 
on the end-user perspective ({{SELECT * FROM base WHERE k=1 and C=1}}) while 
internally creating only a column, which destroys the entire row in case the 
same non-PK column is removed ({{DELETE b FROM base USING TIMESTAMP 11 WHERE 
k=1 AND c=1;}}).

While this semantics may make sense in a column-oriented world, it's a tad 
bizarre in a row oriented world, given we can delete a row by simply unsetting 
a non-PK column. I think the correct and expected semantic would be: {{a column 
update to a non-existing row will create it}}.

This semantic is IMO what makes the most sense in a row oriented store and it's 
possible to implement it without falling into the unexpected/inconsistent 
behaviors of CASSANDRA-6782/CASSANDRA-6668 by adding some kind of 
{{CREATE_IF_NOT_EXISTS}} flag which basically keeps the oldest liveness entry 
if more than one is found with this flag when merging.

This semantic change would be the most correct while still preventing this 
anomaly with MVs due to the current semantic. If you agree we can create 
another ticket to propose this change.

*merging should be commutative (CASSANDRA-13409)*
The second case, represented by 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-26 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101603#comment-16101603
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

WIP 
[branch|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-cell]

Changed:
Extra "VirtualCell"(kind of special LivenessInfo for MV) in Row. It stores: 1. 
base column used in view PK and base column used in view filter conditions. if 
any of such column dead, entire view row dead, regardless LivenessInfo or 
DeletionTime status. 2. unselected base columns. if any of such column alive, 
view's pk should be alive if it's not deleted by DeletionTime or those columns 
in <1>. 

Todo: 
optimize in-memory/storage representation for "virtual-cells" to re-use 
AbstractRow/BTree
discard dead view row shadowed by virtualCells in compaction after gc_grace
more dtest
support collection type in unselected column (for now, it won't generate 
"virtual-cells" for non-frozen collection type)

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-20 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094272#comment-16094272
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

All livenessInfo or row deletion in MV will be ViewLivenessInfo or ViewDeletion 
with some extra details to check if view row is still alive.

Shadowable mechanism is not used..(single flag is not sufficient and in the 
proposal, we don't need to bring back the columns shadowed by 
shadowable-tombstone)

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-20 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094252#comment-16094252
 ] 

Kurt Greaves commented on CASSANDRA-11500:
--

Yep have been, was just hoping there was some code I could pin it to to make 
things clearer - as I'm sure you're aware it's hard to figure out all the edge 
cases unless you actively try them. Not a big deal, I'll keep an eye out for 
when you have a branch ready.

Your proposal looks good and seems to make sense and cover all the cases I can 
think of (but there are so many I'm sure I've forgotten some). With this change 
in place would all deletions/deletions in views be represented as a 
ViewTombstone? My understanding is that you're essentially combining normal 
tombstones and shadowables to create the viewtombstone, with a few extra 
details to catch the edge cases, is that right?

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-19 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094231#comment-16094231
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

[~KurtG] branch is not yet ready for you to test. but you could have a look at 
[proposal|https://issues.apache.org/jira/browse/CASSANDRA-11500?focusedCommentId=16082241=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16082241]
 first, see if there is any missing case.

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-19 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094195#comment-16094195
 ] 

Kurt Greaves commented on CASSANDRA-11500:
--

[~jasonstack] do you have a WIP branch you can link here?

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-18 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091273#comment-16091273
 ] 

Kurt Greaves commented on CASSANDRA-11500:
--

Thanks [~jasonstack], I think that's a wise approach to solving the issues. 
I'll give it some thought over the next couple days. Worth giving some thought 
to [~fsander]'s solution as well, however I think I'm in favour of utilising 
ShadowableTombstones for the same case. Will give it some serious consideration 
before ruling anything out however.

Also, just making a note here that we have to solve the issue raised in 
CASSANDRA-10965 here as well. Pretty sure you're already addressing this but 
just saying so it's written down somewhere. 

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-17 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089856#comment-16089856
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

I plan to solve: partial update, ttl, co-existed shadowable tombstone, view 
timestamp tie all inside this ticket using extended shadowable 
approach(mentioned 
[here|https://issues.apache.org/jira/browse/CASSANDRA-11500?focusedCommentId=16082241=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16082241]).
 Because all these issues require some storage format changes(extendedFlag), 
it's better to fix them and refactor in one commit.

Drafted a 
[patch|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-update-time]..(refactoring
 and adding more dtest)  

Any suggestions would be appreciated.



> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-12 Thread Fridtjof Sander (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083638#comment-16083638
 ] 

Fridtjof Sander commented on CASSANDRA-11500:
-

I also want to through my stuff into the ring right here :)

Following up from CASSANDRA-13657 on this, [my proposal from 
there|https://issues.apache.org/jira/browse/CASSANDRA-13657?focusedCommentId=16080177=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16080177]
 could be adapted to solve this by introducing a 
{{ShortCircuitingRowDeletion}}: A live, sort-circuiting row-deletion supersedes 
all columns no matter their timestamp. It can only be superseded by a 
liveness-info with higher timestamp.

That way, we don't need to take maximum timestamp of cells anymore for the 
view-tombstone, which prevents the conflict of this issue.

> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-07-11 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082241#comment-16082241
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

h3. idea

{{ShadowableTombstone}} : deletion-time, isShadowable, and "viewKeyTs" aka. 
base column's ts which is part of view pk(used to reconcile when timestamp 
tie), if there is no timestamp associated with that column, use base pk 
timestamp instead.
{{ShadowableLivenessInfo}}:  timestamp, and "viewKeyTs"

When reconcile {{ShadowableTombstone}} and {{ShadowableLivenessInfo}}: 
{quote}
if deletion-time greater than timestamp, tombstone wins
if deletion-time smaller than timestamp, livenessInfo wins
when deletion-time ties with timestamp, 
 - if {{ShadowableTombstone}}'s {{viewKeyTs}} >= {{ShadowableLivenessInfo}}', 
then tombstone wins
 - else livesnessInfo wins.
{quote}

When inserting to view, always use the greatest timestamp of all base columns 
in view similar to how view deletion timestamp is computed.

h3. example

{quote}
CREATE TABLE t (k int PRIMARY KEY, a int, b int);
CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS NOT 
NULL PRIMARY KEY (k, a);

{{q1}} INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
{{q2}} UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
{{q3}} UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; 
{{q3}} UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; 
{quote}


* After {{q1}}:
** in base: {{k=1@0, a=1, b=1}}// 'k' is having value '1' with timestamp '0'
** in view: 
***  sstable1: {{(k=1&=1)@TS(0,0), b=1}}  // 'k:a' is having value '1:1' with 
timestamp '0' and viewKeyTs '0' from base's pk because column 'a' has no TS
* After {{q2}}
** in base(merged): {{k=1@0, a=1, b=2@10}} 
** in view:  
***  sstable1: {{(k=1&=1)@TS(0,0), b=1}}
***  sstable2: {{(k=1&=1)@TS(10,0), b=2@10}}
***  or merged: {{(k=1&=1)@TS(10,0), b=2@10}}
* After {{q3}}
** in base(merged): {{k=1@0, a=2@2, b=2@10}}  
** in view:  
***  sstable1: {{(k=1&=1)@TS(0,0), b=1}}
***  sstable2: {{(k=1&=1)@TS(10,0), b=2@10}}
***  sstable3: {{(k=1&=1)@Shadowable(10,0)}} & {{(k=1&=2)@TS(10,2), 
b=2@10}}  // '(k=1&=2)' is having biggest timestamp '10' and viewKeyTs '2' 
from column 'a'
***  or merged: {{(k=1&=2)@TS(10,2), b=2@10}}
* After {{q4}}
** in base(merged): {{k=1@0, a=1@3, b=2@10}}  
** in view:  
***  sstable1: {{(k=1&=1)@TS(0,0), b=1}}
***  sstable2: {{(k=1&=1)@TS(10,0), b=2@10}}
***  sstable3: {{(k=1&=1)@Shadowable(10,0)}} & {{(k=1&=2)@TS(10,2), 
b=2@10}} 
***  sstable4: {{(k=1&=2)@Shadowable(10,2)}} & {{(k=1&=1)@TS(10,3), 
b=2@10}}  // '(k=1&=1)' is having biggest timestamp '10' and viewKeyTs '3' 
from column 'a'
***  or merged: {{(k=1&=1)@TS(10,3), b=2@10}}



> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Sylvain Lebresne
>Assignee: ZhaoYang
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING 

[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted

2017-06-28 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066749#comment-16066749
 ] 

ZhaoYang commented on CASSANDRA-11500:
--

{quote}
we need to be able to re-insert an entry when a prior one had been deleted 
before, but we can't rely on timestamps being strictly bigger on the re-insert. 
In that sense, this can be though as a similar problem than CASSANDRA-10965, 
though the solution there of a single flag is not enough since we can have to 
replace more than once.
{quote}

Agreed.

How about shipping an extra "view-update-time" (`nowInSecond` that 
view-operation is triggered) per view row.  it will be used to check who's new 
when TS ties.

The `nowInSeconds` could still be the same in certain cases, but rare, similar 
to CASSANDRA-1096


> Obsolete MV entry may not be properly deleted
> -
>
> Key: CASSANDRA-11500
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11500
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sylvain Lebresne
>
> When a Materialized View uses a non-PK base table column in its PK, if an 
> update changes that column value, we add the new view entry and remove the 
> old one. When doing that removal, the current code uses the same timestamp 
> than for the liveness info of the new entry, which is the max timestamp for 
> any columns participating to the view PK. This is not correct for the 
> deletion as the old view entry could have other columns with higher timestamp 
> which won't be deleted as can easily shown by the failing of the following 
> test:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1;
> SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old 
> (invalid) and the new one
> {noformat}
> So the correct timestamp to use for the deletion is the biggest timestamp in 
> the old view entry (which we know since we read the pre-existing base row), 
> and that is what CASSANDRA-11475 does (the test above thus doesn't fail on 
> that branch).
> Unfortunately, even then we can still have problems if further updates 
> requires us to overide the old entry. Consider the following case:
> {noformat}
> CREATE TABLE t (k int PRIMARY KEY, a int, b int);
> CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS 
> NOT NULL PRIMARY KEY (k, a);
> INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0;
> UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1;
> UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the 
> entry for a=1 with timestamp 10
> UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert 
> an entry for a=1 but shouldn't be deleted by the prior deletion
> UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this 
> game more than once
> UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1;
> ...
> {noformat}
> In a way, this is saying that the "shadowable" deletion mechanism is not 
> general enough: we need to be able to re-insert an entry when a prior one had 
> been deleted before, but we can't rely on timestamps being strictly bigger on 
> the re-insert. In that sense, this can be though as a similar problem than 
> CASSANDRA-10965, though the solution there of a single flag is not enough 
> since we can have to replace more than once.
> I think the proper solution would be to ship enough information to always be 
> able to decide when a view deletion is shadowed. Which means that both 
> liveness info (for updates) and shadowable deletion would need to ship the 
> timestamp of any base table column that is part the view PK (so {{a}} in the 
> example below).  It's doable (and not that hard really), but it does require 
> a change to the sstable and intra-node protocol, which makes this a bit 
> painful right now.
> But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even 
> enough since on equal timestamp the value can be the deciding factor. So in 
> theory we'd have to ship the value of those columns (in the case of a 
> deletion at least since we have it in the view PK for updates). That said, on 
> that last problem, my preference would be that we start prioritizing 
> CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp 
> anymore, which would make this problem go away.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: