[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922466#comment-16922466 ]

Nadav Har'El commented on CASSANDRA-9928:
-----------------------------------------

This issue has recently turned 4 years old, and I'm curious how sure we are about the *reasons* described above for why we forbid MV with two new key columns - whether these reasons are correct, and whether we are sure these are the only reasons.

As [~fsander] asked above, while a base-view inconsistency is indeed more likely in the two-new-key-columns case, don't we have the same problem in the regular one-new-key-column case: scenarios where an unfortunate order of node failures causes data to appear in a view replica which doesn't appear in the base replica, and thus will never be deleted? I thought this was one of the main reasons why MV was recently downgraded to "experimental" status.

But I also wonder whether we missed a second problem, that of row liveness, similar to what we have in the case of unselected columns (see [CASSANDRA-13826|https://jira.apache.org/jira/browse/CASSANDRA-13826]): if we add and remove different base columns which are view keys, but the view row has just a *single* timestamp, we can end up being unable to add a view row that we previously deleted.

For example, here is a scenario I thought might be problematic (I didn't actually test this; to run a test case, one would need to disable the check in the code that forbids multiple new MV key columns):

Assume that x and y are regular columns in the base table, but key columns in the view. For brevity, we leave out other base key columns and other regular columns. Consider the following sequence of events on one row of the base table:

# Add x=1 at timestamp 1. Since y is still null, no view row is created yet.
# Add y=1 at timestamp 10. This creates a view row with key x=1, y=1. The row only contains a CQL row marker, and a single timestamp is chosen for it: 10.
# Delete x at timestamp 2. This deletes x’s older (ts=1) value, and so the view row should be deleted. Again, a timestamp needs to be chosen for this deletion - it will be 10 again, and the deletion will override the creation with the same timestamp from the previous step, and so far everything is fine.
# Add x=2 at timestamp 3. This overrides the deletion of x (which was at timestamp 2), so again both x and y have values and a view row should be created with key x=2, y=1. However, this creation will again have timestamp 10 (y’s timestamp) and will not be able to shadow the deletion from step 3 (in step 3, deletion won over data, so here it will win again). So the view row we wanted to add will not be added!

> Add Support for multiple non-primary key columns in Materialized View primary keys
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9928
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9928
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Materialized Views
>            Reporter: T Jake Luciani
>            Priority: Normal
>              Labels: materializedviews
>             Fix For: 4.x
>
>
> Currently we don't allow > 1 non primary key from the base table in a MV
> primary key. We should remove this restriction assuming we continue
> filtering out nulls. With allowing nulls in the MV columns there are a lot
> of multiplicative implications we need to think through.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
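The four-step row-liveness scenario in Nadav Har'El's comment can be played through in a small toy model. This is only a sketch under stated assumptions, not Cassandra code: `Cell`, `ToyView` and `run_scenario` are hypothetical names, the single view timestamp is assumed to be the largest cell timestamp among x and y, and a deletion is assumed to win a timestamp tie. One deliberate assumption differs from the comment: in this model a deletion only shadows a view row with the identical key, so the tie shows up when step 4 re-adds x=1 (recreating key x=1, y=1), whereas re-adding x=2 produces a different view key that is not shadowed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: Optional[int]  # None models a deleted base cell
    ts: int               # write timestamp of the cell


class ToyView:
    """Hypothetical toy model of a view keyed on base columns (x, y)."""

    def __init__(self):
        self.base = {}   # base column name -> Cell
        self.live = {}   # view key (x, y) -> row-marker timestamp
        self.tomb = {}   # view key (x, y) -> row-deletion timestamp

    def _view_key(self):
        x, y = self.base.get("x"), self.base.get("y")
        if x is None or y is None or x.value is None or y.value is None:
            return None  # a null key column means no view row
        return (x.value, y.value)

    def write(self, col, value, ts):
        old_key = self._view_key()
        cur = self.base.get(col)
        if cur is None or ts > cur.ts:  # last write wins per cell
            self.base[col] = Cell(value, ts)
        new_key = self._view_key()
        if new_key == old_key:
            return
        # Assumption: the single view timestamp is the largest cell
        # timestamp among the view-key columns x and y.
        view_ts = max(cell.ts for cell in self.base.values())
        if old_key is not None:
            self.tomb[old_key] = max(view_ts, self.tomb.get(old_key, view_ts))
        if new_key is not None:
            self.live[new_key] = max(view_ts, self.live.get(new_key, view_ts))

    def visible(self, key):
        live, tomb = self.live.get(key), self.tomb.get(key)
        # Assumption: on a timestamp tie the deletion wins.
        return live is not None and (tomb is None or live > tomb)


def run_scenario(readded_x):
    view = ToyView()
    view.write("x", 1, ts=1)           # step 1: no view row yet
    view.write("y", 1, ts=10)          # step 2: view row (1, 1) at ts 10
    created = view.visible((1, 1))
    view.write("x", None, ts=2)        # step 3: view row deleted at ts 10
    view.write("x", readded_x, ts=3)   # step 4: re-add x
    return created, view.visible((readded_x, 1))
```

Calling `run_scenario(1)` shows the recreated row losing the tie against the earlier deletion, while `run_scenario(2)` shows that, in this model, a different key is unaffected.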
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088582#comment-16088582 ]

Fridtjof Sander commented on CASSANDRA-9928:
--------------------------------------------

I seem to be the only one who doesn't understand where the actual difference from the single-column case is:

Consider {{(p=1, a=1)}} with an index-MV on {{a}} and two updates {{a=2}} and {{a=3}}. One base replica receives {{a=2}}, deletes view entry {{(a=1, p=1)}} and inserts {{(a=2, p=1)}}, then dies. The other base replicas get {{a=3}}, delete {{(a=1, p=1)}} and insert {{(a=3, p=1)}}. Now, how is {{(a=2, p=1)}} removed from the view replica that was paired with the dying base node?

I don't get what's different here. Or does my analogous case miss the point?
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984667#comment-15984667 ]

craig mcmillan commented on CASSANDRA-9928:
-------------------------------------------

Currently I achieve this by manually concatenating the extra keys I want in the MV into a single text key. It's roughly workable, but timeuuids can no longer be used to provide ordering, since they don't sort lexically.

Wouldn't [~thobbs]'s solution formalize and improve upon what I, and presumably many others, already have to do?
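The remark above that timeuuids "don't sort lexically" can be illustrated with Python's standard `uuid` module. This is a minimal sketch that assumes the concatenated text key embeds the canonical string form of a version-1 UUID; `timeuuid_from_timestamp` is a hypothetical helper written for this example. In that string form the low 32 bits of the timestamp come first, so text order and time order can disagree.

```python
import uuid


def timeuuid_from_timestamp(ts60, node=1):
    """Build a version-1 UUID from a 60-bit timestamp (hypothetical helper)."""
    return uuid.UUID(fields=(
        ts60 & 0xFFFFFFFF,                  # time_low: lowest 32 bits come first
        (ts60 >> 32) & 0xFFFF,              # time_mid
        ((ts60 >> 48) & 0x0FFF) | 0x1000,   # time_hi_version, version nibble = 1
        0x80,                               # clock_seq_hi_variant (RFC 4122 variant)
        0x00,                               # clock_seq_low
        node,                               # node identifier
    ))


earlier = timeuuid_from_timestamp(0x1FFFFFFFF)
later = timeuuid_from_timestamp(0x200000000)

# Ordering by the embedded timestamp vs. ordering by the text form.
by_time = sorted([later, earlier], key=lambda u: u.time)
by_text = sorted([later, earlier], key=str)

print([str(u) for u in by_time])
print([str(u) for u in by_text])
```

Sorting by the embedded timestamp puts `earlier` first, while sorting the strings puts `later` first, which is why a concatenated text key loses time ordering.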
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588847#comment-15588847 ]

Ariel Scarpinelli commented on CASSANDRA-9928:
----------------------------------------------

Why not let people decide? If you implement [~thobbs]'s solution, then people simply get warned: "non-PK columns participating in MV PKs will need to be updated together". It then becomes the user's responsibility to choose whether they prefer to be tied to that restriction, or to use a single column (which is tied to that restriction anyway, but since you always update columns in a minimum set of 1 ... :-D).

The way it is currently implemented, you leave people with no choice other than fabricating a "fake" column that concatenates values or so... which effectively translates into having to update in tandem anyway, while also adding complexity and repeated data.
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526654#comment-15526654 ]

Donovan Hsieh commented on CASSANDRA-9928:
------------------------------------------

Whatever the technical issues associated with the race condition stated above, the limit of just 1 non-PK column, imho, makes MV seriously handicapped. If this limitation is not removed, I can't see how any serious real-world application can use MV effectively.
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341741#comment-15341741 ]

T Jake Luciani commented on CASSANDRA-9928:
-------------------------------------------

bq. If we can guarantee that there is a limit on the number of changes to those columns then we can limit the number of distinct state permutations that may need to be considered in the scenarios above.

How do you propose we limit changes across the cluster and DCs? Tyler's suggestion is easy to guarantee without introducing some global rate limiting.
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341188#comment-15341188 ]

Matthias Broecheler commented on CASSANDRA-9928:
------------------------------------------------

Requiring that all non-PK columns that are indexed be updated at the same time is too much of a limitation and will be really hard for users to understand imho.

Instead, I would propose that we introduce a rate-of-change limit on the columns that participate in an MV. If we can guarantee that there is a limit on the number of changes to those columns, then we can limit the number of distinct state permutations that may need to be considered in the scenarios above. In those cases, we would simply enumerate them and then delete all possible old MV states. This sounds expensive, but it would only be expensive when changes happen in fast succession or under extraordinary operational conditions - i.e. the cost should amortize well.

As for the rate limit, it may seem a rather arbitrary limitation, but if somebody changes their MV columns in rapid succession then they are pursuing an anti-pattern, and throwing an exception would be a better response than deteriorating system performance.
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136898#comment-15136898 ]

Aleksey Yeschenko commented on CASSANDRA-9928:
----------------------------------------------

[~mbroecheler] Is the limitation outlined by Tyler above still compatible with the use case you have in mind?
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965509#comment-14965509 ]

Tyler Hobbs commented on CASSANDRA-9928:
----------------------------------------

One possible solution is to require that all non-PK columns that are in a view PK be updated simultaneously. [~tjake] mentioned possible problems from read repair, but it seems like with this restriction in place, any read repairs would end up repairing all non-PK columns at once.
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723556#comment-14723556 ]

T Jake Luciani commented on CASSANDRA-9928:
-------------------------------------------

This scenario, where 3 nodes won't see each other's updates, can't happen if we use the coordinator batchlog, since we guarantee that at least a quorum of nodes will see the updates. Mentioning this for CASSANDRA-10230.
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723703#comment-14723703 ]

T Jake Luciani commented on CASSANDRA-9928:
-------------------------------------------

of course! thx.
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723695#comment-14723695 ]

Joel Knighton commented on CASSANDRA-9928:
------------------------------------------

Guaranteeing that a quorum of nodes will see the updates does not solve the problem, because supporting multiple non-primary key columns in the materialized view primary key introduces a sensitivity to the ordering of updates to these non-primary key columns.

I think this is the simplest version of Benedict's example.

Envision a cluster with a table with primary key P and columns A and B. Presently, all replicas contain an entry for P=1, A=1, B=1. Two concurrent updates are occurring - one setting A=2, and one setting B=2.

One replica receives the update B=2, removes the MV entry for P=1, A=1, B=1, creates an MV entry for P=1, A=1, B=2, and then crashes with data loss.

The remainder of the base replicas receive the update A=2; remove the MV entry for P=1, A=1, B=1; create an MV entry for P=1, A=2, B=1; receive the update B=2; remove the MV entry for P=1, A=2, B=1; and create an MV entry for P=1, A=2, B=2.

When the base replica that lost data is repaired from the remaining base replicas, a delete for the entry P=1, A=1, B=2 in the paired view replica will never be created.
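Joel Knighton's example can be traced mechanically. The following is a simplified sketch, not Cassandra's view-update code: `view_mutations` is a hypothetical helper, timestamps are ignored, the partition key P=1 is left out of the view key for brevity, and each base replica is assumed to delete the old view key and insert the new one whenever an update changes (A, B).

```python
def view_mutations(initial, updates, key_cols=("A", "B")):
    """Return (inserts, deletes) of view keys one base replica would generate.

    Hypothetical model: each update that changes the view key deletes the
    old view entry and inserts the new one; timestamps are ignored.
    """
    state = dict(initial)
    inserts, deletes = set(), set()
    for col, val in updates:
        old_key = tuple(state[c] for c in key_cols)
        state[col] = val
        new_key = tuple(state[c] for c in key_cols)
        if new_key != old_key:
            deletes.add(old_key)
            inserts.add(new_key)
    return inserts, deletes


initial = {"A": 1, "B": 1}

# The replica that crashes only ever saw B=2.
crashed_inserts, _ = view_mutations(initial, [("B", 2)])

# The surviving replicas saw A=2 first, then B=2.
survivor_inserts, survivor_deletes = view_mutations(initial, [("A", 2), ("B", 2)])

final_key = (2, 2)
# View entries written on behalf of the crashed replica that no survivor
# ever deletes and that are not the correct final entry.
orphans = crashed_inserts - survivor_deletes - {final_key}
print(orphans)
```

In this model the survivors only ever generate deletes for (A=1, B=1) and (A=2, B=1), so the entry (A=1, B=2) written by the crashed replica's paired view replica is left behind, matching the conclusion of the comment.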
[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys
[ https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651989#comment-14651989 ]

Carl Yeksigian commented on CASSANDRA-9928:
-------------------------------------------

[~benedict] brought up some potential issues in [a comment|https://issues.apache.org/jira/browse/CASSANDRA-6477#MultipleColumns] on CASSANDRA-6477:

{quote}
As far as multiple columns are concerned: I think we may need to go back to the drawing board there. It's actually really easy to demonstrate the cluster getting into broken states. Say you have three columns, A B C, and you send three competing updates a b c to their respective columns; previously all held the value _. If they arrive in different orders on each base-replica we can end up with 6 different MV states around the cluster. If any base replica dies, you don't know which of those 6 intermediate states were taken (and probably replicated) by its MV replicas.

This problem grows exponentially as you add competing updates (which, given split brain, can compete over arbitrarily long intervals).

This is where my concern about a single (base) node dependency comes in, but after consideration it's clear that with a single column this problem is avoided because it's never ambiguous what the old state was. If you encounter a mutation that is shadowed by your current data, you can always issue a delete for the correct prior state. With multiple columns that is no longer possible.

I'm pretty sure the presence of multiple columns introduces other issues with each of the other moving parts.
{quote}

When we implement this feature, we should make sure to also add jepsen tests for the possible problems.
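The count of 6 in the quoted comment can be reproduced by enumerating arrival orders. This is a small sketch under stated assumptions: `states_for_order` is a hypothetical helper, each update simply overwrites its own column, and a view state is written as the three column values in A, B, C order with `_` for the old value.

```python
from itertools import permutations

COLUMNS = ("A", "B", "C")


def states_for_order(order):
    """View states a base replica passes through for one arrival order."""
    state = {col: "_" for col in COLUMNS}   # every column starts at "_"
    seen = []
    for col in order:
        state[col] = col.lower()            # update a/b/c lands on column A/B/C
        seen.append("".join(state[c] for c in COLUMNS))
    return seen


orders = list(permutations(COLUMNS))

# States visited before the last update arrives, across all arrival orders.
intermediate = {s for order in orders for s in states_for_order(order)[:-1]}

# Every arrival order converges on the same final state.
final = {states_for_order(order)[-1] for order in orders}

print(len(orders), sorted(intermediate), final)
```

With three competing updates there are 3! = 6 arrival orders and 6 distinct intermediate view states, all converging on `abc`. A replica that dies mid-sequence could have produced any of those intermediate states, which is the ambiguity the quote describes.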