[jira] [Commented] (CASSANDRA-13759) remove collection creating on normal pending repair check
[ https://issues.apache.org/jira/browse/CASSANDRA-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124797#comment-16124797 ] Jeff Jirsa commented on CASSANDRA-13759: I suspect you've dropped the {{if}} clause that checks for multiple ids among the set of sstables, and I'm not sure we need to use a {{Set}} at all here. Will push up a proposed change in the morning when I'm at a real laptop. > remove collection creating on normal pending repair check > - > > Key: CASSANDRA-13759 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13759 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 4.x > > Attachments: 13759.txt > > > in normal case where there's only 1 sstable for repair, don't build id > collection just used for error logging -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13759) remove collection creating on normal pending repair check
[ https://issues.apache.org/jira/browse/CASSANDRA-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13759: --- Reviewer: Jeff Jirsa Status: Patch Available (was: Open) Pushing to CI while I review: [utests|https://circleci.com/gh/jeffjirsa/cassandra/tree/cassandra-13759] [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/180/] > remove collection creating on normal pending repair check > - > > Key: CASSANDRA-13759 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13759 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 4.x > > Attachments: 13759.txt > > > in normal case where there's only 1 sstable for repair, don't build id > collection just used for error logging -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13760) presize collections
[ https://issues.apache.org/jira/browse/CASSANDRA-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124793#comment-16124793 ] Jeff Jirsa commented on CASSANDRA-13760: Pushing to CI while I review: [utest|https://circleci.com/gh/jeffjirsa/cassandra/tree/cassandra-13760] [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/179/] > presize collections > --- > > Key: CASSANDRA-13760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13760 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 4.x > > Attachments: 13760.txt > > > presize collections where sane, to avoid reallocs, or excess garbage -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13760) presize collections
[ https://issues.apache.org/jira/browse/CASSANDRA-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13760: --- Status: Patch Available (was: Open) > presize collections > --- > > Key: CASSANDRA-13760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13760 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 4.x > > Attachments: 13760.txt > > > presize collections where sane, to avoid reallocs, or excess garbage -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13760) presize collections
[ https://issues.apache.org/jira/browse/CASSANDRA-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13760: --- Reviewer: Jeff Jirsa > presize collections > --- > > Key: CASSANDRA-13760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13760 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 4.x > > Attachments: 13760.txt > > > presize collections where sane, to avoid reallocs, or excess garbage -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13760) presize collections
[ https://issues.apache.org/jira/browse/CASSANDRA-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Brosius updated CASSANDRA-13760: - Attachment: 13760.txt > presize collections > --- > > Key: CASSANDRA-13760 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13760 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 4.x > > Attachments: 13760.txt > > > presize collections where sane, to avoid reallocs, or excess garbage -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13760) presize collections
Dave Brosius created CASSANDRA-13760: Summary: presize collections Key: CASSANDRA-13760 URL: https://issues.apache.org/jira/browse/CASSANDRA-13760 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Dave Brosius Assignee: Dave Brosius Priority: Trivial Fix For: 4.x presize collections where sane, to avoid reallocs, or excess garbage -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13759) remove collection creating on normal pending repair check
[ https://issues.apache.org/jira/browse/CASSANDRA-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Brosius updated CASSANDRA-13759: - Attachment: 13759.txt > remove collection creating on normal pending repair check > - > > Key: CASSANDRA-13759 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13759 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Dave Brosius >Assignee: Dave Brosius >Priority: Trivial > Fix For: 4.x > > Attachments: 13759.txt > > > in normal case where there's only 1 sstable for repair, don't build id > collection just used for error logging -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13759) remove collection creating on normal pending repair check
Dave Brosius created CASSANDRA-13759: Summary: remove collection creating on normal pending repair check Key: CASSANDRA-13759 URL: https://issues.apache.org/jira/browse/CASSANDRA-13759 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Dave Brosius Assignee: Dave Brosius Priority: Trivial Fix For: 4.x in normal case where there's only 1 sstable for repair, don't build id collection just used for error logging -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-11500: - Reviewer: Paulo Motta Status: Patch Available (was: In Progress) > Obsolete MV entry may not be properly deleted > - > > Key: CASSANDRA-11500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11500 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Sylvain Lebresne >Assignee: ZhaoYang > > When a Materialized View uses a non-PK base table column in its PK, if an > update changes that column value, we add the new view entry and remove the > old one. When doing that removal, the current code uses the same timestamp > than for the liveness info of the new entry, which is the max timestamp for > any columns participating to the view PK. This is not correct for the > deletion as the old view entry could have other columns with higher timestamp > which won't be deleted as can easily shown by the failing of the following > test: > {noformat} > CREATE TABLE t (k int PRIMARY KEY, a int, b int); > CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS > NOT NULL PRIMARY KEY (k, a); > INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0; > UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1; > UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; > SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old > (invalid) and the new one > {noformat} > So the correct timestamp to use for the deletion is the biggest timestamp in > the old view entry (which we know since we read the pre-existing base row), > and that is what CASSANDRA-11475 does (the test above thus doesn't fail on > that branch). > Unfortunately, even then we can still have problems if further updates > requires us to overide the old entry. Consider the following case: > {noformat} > CREATE TABLE t (k int PRIMARY KEY, a int, b int); > CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS > NOT NULL PRIMARY KEY (k, a); > INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0; > UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1; > UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the > entry for a=1 with timestamp 10 > UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert > an entry for a=1 but shouldn't be deleted by the prior deletion > UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this > game more than once > UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1; > ... > {noformat} > In a way, this is saying that the "shadowable" deletion mechanism is not > general enough: we need to be able to re-insert an entry when a prior one had > been deleted before, but we can't rely on timestamps being strictly bigger on > the re-insert. In that sense, this can be though as a similar problem than > CASSANDRA-10965, though the solution there of a single flag is not enough > since we can have to replace more than once. > I think the proper solution would be to ship enough information to always be > able to decide when a view deletion is shadowed. Which means that both > liveness info (for updates) and shadowable deletion would need to ship the > timestamp of any base table column that is part the view PK (so {{a}} in the > example below). It's doable (and not that hard really), but it does require > a change to the sstable and intra-node protocol, which makes this a bit > painful right now. > But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even > enough since on equal timestamp the value can be the deciding factor. So in > theory we'd have to ship the value of those columns (in the case of a > deletion at least since we have it in the view PK for updates). That said, on > that last problem, my preference would be that we start prioritizing > CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp > anymore, which would make this problem go away. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-11500: - Fix Version/s: 4.x 3.11.x 3.0.x > Obsolete MV entry may not be properly deleted > - > > Key: CASSANDRA-11500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11500 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Sylvain Lebresne >Assignee: ZhaoYang > Fix For: 3.0.x, 3.11.x, 4.x > > > When a Materialized View uses a non-PK base table column in its PK, if an > update changes that column value, we add the new view entry and remove the > old one. When doing that removal, the current code uses the same timestamp > than for the liveness info of the new entry, which is the max timestamp for > any columns participating to the view PK. This is not correct for the > deletion as the old view entry could have other columns with higher timestamp > which won't be deleted as can easily shown by the failing of the following > test: > {noformat} > CREATE TABLE t (k int PRIMARY KEY, a int, b int); > CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS > NOT NULL PRIMARY KEY (k, a); > INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0; > UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1; > UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; > SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old > (invalid) and the new one > {noformat} > So the correct timestamp to use for the deletion is the biggest timestamp in > the old view entry (which we know since we read the pre-existing base row), > and that is what CASSANDRA-11475 does (the test above thus doesn't fail on > that branch). > Unfortunately, even then we can still have problems if further updates > requires us to overide the old entry. Consider the following case: > {noformat} > CREATE TABLE t (k int PRIMARY KEY, a int, b int); > CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS > NOT NULL PRIMARY KEY (k, a); > INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0; > UPDATE t USING TIMESTAMP 10 SET b = 2 WHERE k = 1; > UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; // This will delete the > entry for a=1 with timestamp 10 > UPDATE t USING TIMESTAMP 3 SET a = 1 WHERE k = 1; // This needs to re-insert > an entry for a=1 but shouldn't be deleted by the prior deletion > UPDATE t USING TIMESTAMP 4 SET a = 2 WHERE k = 1; // ... and we can play this > game more than once > UPDATE t USING TIMESTAMP 5 SET a = 1 WHERE k = 1; > ... > {noformat} > In a way, this is saying that the "shadowable" deletion mechanism is not > general enough: we need to be able to re-insert an entry when a prior one had > been deleted before, but we can't rely on timestamps being strictly bigger on > the re-insert. In that sense, this can be though as a similar problem than > CASSANDRA-10965, though the solution there of a single flag is not enough > since we can have to replace more than once. > I think the proper solution would be to ship enough information to always be > able to decide when a view deletion is shadowed. Which means that both > liveness info (for updates) and shadowable deletion would need to ship the > timestamp of any base table column that is part the view PK (so {{a}} in the > example below). It's doable (and not that hard really), but it does require > a change to the sstable and intra-node protocol, which makes this a bit > painful right now. > But I'll also note that as CASSANDRA-1096 shows, the timestamp is not even > enough since on equal timestamp the value can be the deciding factor. So in > theory we'd have to ship the value of those columns (in the case of a > deletion at least since we have it in the view PK for updates). That said, on > that last problem, my preference would be that we start prioritizing > CASSANDRA-6123 seriously so we don't have to care about conflicting timestamp > anymore, which would make this problem go away. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124534#comment-16124534 ] ZhaoYang commented on CASSANDRA-11500: -- Thanks for reviewing and feedback. Changing the semantic of MV and revising non-key column filtering feature(CASSANDRA-10368) will indeed make it easier. It's a good idea to make a simple non-disruptive change to stabilize basic features and wait for more commiters involved. Using an extended flag for {{Strict-Liveness}} will allow us to change to future structure easily, either multiple livenessInfos or virtualcells. About the {{Strict Liveness}} semantic: * A strict row is only live iff it's row level liveness info is live, regardless of the liveness of its columns. My understanding is: view row is strict iff the view has non-key base row as view pk. When it's {{Strict}}, the view's row liveness/deletion should use this non-key base column's timestamp as well as ttl, unless there is a greater row deletion.(It's like a simplified version of "VirtualCells" which only store metadata for non-key base column in view pk) For now, the semantic of MV: * if it's strict(non-key base row as view pk), the existence of view row is only with its row livenessInfo * if it's not-strict, view row is alive if there is any live selected view columns or live livenessInfo. {code} For 13127: Unselected columns has no effect on liveness of view row, for now, till we are ready for new design. It cannot be properly supported without disruptive changes, like VirtualCells or multiple livenessInfos {code} {code} For 13547: It's necessary to forbid dropping filtered columns from base columns. The filtered column part needs to be reconsidered with 10368. It cannot be properly supported without disruptive changes, like VirtualCells or multiple livenessInfos {code} {code} for 13409: As paulo suggested, generating column tombstones when receiving a partial update for a previously deleted row might be a non-disruptive solution if cell tombstone can co-exist with row deletion which has greater timestamp. I will reopen this ticket. {code} PATCH for 11500: | [trunk|https://github.com/jasonstack/cassandra/commits/11500-poc]| | [dtest|https://github.com/riptano/cassandra-dtest/commits/11500-poc]| Changes: 1. deletion is shadowable if the non-key base column in view-pk is updated or deleted by partial update or partial delete. if this non-key column is removed by row deletion, it's not shadowable. 2. it's strict-liveness iff there is non-key base column in view-pk. 3. if it's not strict-liveness, the view's livenes/deletion is same as base row's. 4. in TableViews.java, the DeletionTracker should be applied even if one of the iterator has no data, eg. partition-deletion 5. sstabledump will include shadowable info > Obsolete MV entry may not be properly deleted > - > > Key: CASSANDRA-11500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11500 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Sylvain Lebresne >Assignee: ZhaoYang > > When a Materialized View uses a non-PK base table column in its PK, if an > update changes that column value, we add the new view entry and remove the > old one. When doing that removal, the current code uses the same timestamp > than for the liveness info of the new entry, which is the max timestamp for > any columns participating to the view PK. This is not correct for the > deletion as the old view entry could have other columns with higher timestamp > which won't be deleted as can easily shown by the failing of the following > test: > {noformat} > CREATE TABLE t (k int PRIMARY KEY, a int, b int); > CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS > NOT NULL PRIMARY KEY (k, a); > INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0; > UPDATE t USING TIMESTAMP 4 SET b = 2 WHERE k = 1; > UPDATE t USING TIMESTAMP 2 SET a = 2 WHERE k = 1; > SELECT * FROM mv WHERE k = 1; // This currently return 2 entries, the old > (invalid) and the new one > {noformat} > So the correct timestamp to use for the deletion is the biggest timestamp in > the old view entry (which we know since we read the pre-existing base row), > and that is what CASSANDRA-11475 does (the test above thus doesn't fail on > that branch). > Unfortunately, even then we can still have problems if further updates > requires us to overide the old entry. Consider the following case: > {noformat} > CREATE TABLE t (k int PRIMARY KEY, a int, b int); > CREATE MATERIALIZED VIEW mv AS SELECT * FROM t WHERE k IS NOT NULL AND a IS > NOT NULL PRIMARY KEY (k, a); > INSERT INTO t(k, a, b) VALUES (1, 1, 1) USING TIMESTAMP 0; > UPDATE t USING TIMESTAMP 10 SET b = 2