Wall of text incoming. *tl;dr:* I'm aiming to come to some conclusions about what we are doing with MVs and how we are going to make them stable in production, but really I'm just trying to raise awareness of, and involvement in, MVs.
It seems we've got an excess of MV bugs that pretty much make them unusable in production, or at least incredibly risky and limited. It also appears that we don't have many people who are fully across MVs (or at least few people currently looking at them). To avoid us "forgetting" about MVs, I'd like to raise the current issues and get opinions on the direction we should take with them. I know there was historically a lot of discussion about this, but many of those originally involved are now less involved, so before making wild changes to MVs it might be worth going back to the start and thinking through the original requirements and implementation.

It's probably worth summarising the original goals of MVs:

- Maintain eventual consistency between the base table and view tables
- Provide mechanisms to repair consistency between base and views
- Aim to keep convergence between base and views fast without sacrificing availability (low MTTR)

Goals that weren't explicitly mentioned but were more or less implied:

- Performance must be at least good enough to justify using them over rolling your own (we haven't really tried to measure this yet; we've only measured it against not using an MV)
- Allow a user to redefine their partitioning key

And a quick summary of *some* of the limitations in our implementation (there are more, but the majority of our current problems revolve around these):

1. The primary key of the base table must be included in the view; optionally, one non-primary-key column can be included in the view primary key.
2. All columns in the view primary key must be declared NOT NULL.
3. Base tables and views are one-to-one. That is, a *primary key* in the base maps to exactly one *primary key* in the view. Therefore you should never expect multiple rows in the view for a single row in the base.
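To make limitations 1 to 3 concrete, here is a minimal Python sketch of the rules a view primary key has to satisfy. It is a toy model, not Cassandra's actual validation code (which lives in the Java CQL layer); the function `validate_view_primary_key` and its parameters are hypothetical, and "declared NOT NULL" is modelled as the set of columns restricted with `IS NOT NULL` in the view's WHERE clause.

```python
from typing import List, Sequence, Set


def validate_view_primary_key(
    base_pk: Sequence[str],
    base_regular_columns: Set[str],
    view_pk: Sequence[str],
    not_null_columns: Set[str],
) -> List[str]:
    """Return the rule violations for a proposed view PK (empty list if OK).

    Hypothetical helper modelling limitations 1 and 2; not Cassandra's code.
    """
    errors: List[str] = []
    view_set = set(view_pk)

    # Limitation 1a: every base primary key column must appear in the view PK.
    # This is what keeps base rows and view rows one-to-one (limitation 3).
    missing = [c for c in base_pk if c not in view_set]
    if missing:
        errors.append(f"view PK is missing base PK columns: {missing}")

    # Limitation 1b: at most one non-primary-key base column may be added.
    extra = [c for c in view_pk if c not in base_pk]
    unknown = [c for c in extra if c not in base_regular_columns]
    if unknown:
        errors.append(f"unknown columns in view PK: {unknown}")
    if len(extra) > 1:
        errors.append(f"only one non-PK column allowed in view PK, got {extra}")

    # Limitation 2: every view PK column must be restricted with IS NOT NULL.
    not_restricted = [c for c in view_pk if c not in not_null_columns]
    if not_restricted:
        errors.append(f"view PK columns need IS NOT NULL: {not_restricted}")

    return errors


# Base table: PRIMARY KEY (id, ts), regular columns email and name.
print(validate_view_primary_key(["id", "ts"], {"email", "name"},
                                ["email", "id", "ts"], {"email", "id", "ts"}))
```

Requiring every base PK column in the view key is what makes the mapping one-to-one: two distinct base rows can never collapse onto the same view row, and each base row produces at most one view row.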
I've summarised the bulk of the outstanding bugs below (I may have missed some); notably, it would be useful to get some decision-making happening on them. Fixing these bugs is a bit involved, and there are likely a few possible solutions, each with its own implications. They also all touch pretty much the same parts of the code, so there needs to be some collaboration across the patches (part of the reason I'm trying to bring more attention to them).

CASSANDRA-13657 <https://issues.apache.org/jira/browse/CASSANDRA-13657> - Using a non-PK column in the view PK means that you can TTL that column in the base without TTLing the resulting view row. A potential solution is to change the definition of liveness info for view rows. This would probably work, but it makes moving away from the NOT NULL requirement on view PKs harder. We need to decide whether that's what we want, or whether we pursue a different solution.

CASSANDRA-13127 <https://issues.apache.org/jira/browse/CASSANDRA-13127> - Inserting a key with a TTL and then updating the TTL on a base column that doesn't exist in the view doesn't update the liveness of the row in the MV, so the MV row expires before the base row. The currently proposed solution should work, but it will increase the number of cases where we need to read the existing data. It needs some reviewing, and it wouldn't hurt to benchmark the changes.

CASSANDRA-13547 <https://issues.apache.org/jira/browse/CASSANDRA-13547> - Being able to leave a column out of your SELECT while including it in the view filters causes some serious issues. The proposed fix is to force the user to select all columns that are also included in the WHERE clause. This is potentially a compatibility issue, but it *should* be fine, as it's only checked on MV creation, so people upgrading shouldn't be affected (needs reviewing). The patch also addresses another issue regarding timestamps: the choice of timestamps led to rows not being deleted in the view.
This comes back to the fact that we allow a non-PK column in the view PK. Needs more reviewing. Also somewhat related to CASSANDRA-11500.

CASSANDRA-13409 <https://issues.apache.org/jira/browse/CASSANDRA-13409> - Issues with shadowable tombstones. There's a patch, but I'm not sure whether it's resolved given Zhao's last comment. Another case of data being brought back in the view, making base and view inconsistent. Needs reviewing.

CASSANDRA-11500 <https://issues.apache.org/jira/browse/CASSANDRA-11500>
CASSANDRA-10965 <https://issues.apache.org/jira/browse/CASSANDRA-10965> - Both of these appear to be instances of the same issue. There are a couple of potential solutions. Back to that problem of shadowable tombstones and timestamps. Pretty involved, and it would require an in-depth review, as the decisions could greatly affect the complexity/usefulness of MVs.

CASSANDRA-13069 <https://issues.apache.org/jira/browse/CASSANDRA-13069> - Node movements can cause inconsistencies. Paulo has written a patch, but Sylvain has raised some concerns about our use of the local batchlog. I haven't confirmed it myself, but the belief is that our eventual consistency guarantee is broken... :/ Needs reviewing...

CASSANDRA-12888 <https://issues.apache.org/jira/browse/CASSANDRA-12888> - Most people are probably aware of this one: the repaired_at status is lost for all MV streams, because they are replayed through the write path. There's a potential solution in place for 4.x, but we need to commit to a workaround for 3.11.x at least.

CASSANDRA-12730 <https://issues.apache.org/jira/browse/CASSANDRA-12730> - This touches on some very common repair issues that we should probably look at, but I don't think it relates directly to MVs anymore. It might be worth removing the Materialized View component (but this ticket probably still deserves a bit of attention).

If anyone has been working on any of these tickets and is no longer able to, either update the ticket or let me know, and I'll either take it over or find some other poor soul to have a stab at it.
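To make the CASSANDRA-13127 scenario a little more concrete, here is a toy Python model of row liveness. It assumes a row stays live while its row marker or any of its cells is live, and it compares a hypothetical naive view derivation (copy the base row marker and only the selected cells) with a hypothetical read-before-write variant that extends the view marker to the latest expiry of any base cell. Neither function is Cassandra's code, and the patch on the ticket may work differently; the sketch only shows why a fix of this shape needs the existing base data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Row:
    """Toy row: an optional row marker plus regular cells, each with an expiry time.

    A row is live while its marker or any of its cells is live, mirroring how a
    CQL row survives as long as something in it survives. Use float('inf') as
    the expiry for data written without a TTL.
    """
    marker_expires_at: Optional[float] = None  # None: no marker (e.g. row created by UPDATE)
    cells: Dict[str, float] = field(default_factory=dict)  # column -> expiry time

    def live(self, now: float) -> bool:
        marker_live = self.marker_expires_at is not None and now < self.marker_expires_at
        return marker_live or any(now < exp for exp in self.cells.values())


def naive_view_row(base: Row, view_columns: Set[str]) -> Row:
    """Hypothetical derivation: copy the base marker and only the selected cells."""
    return Row(
        marker_expires_at=base.marker_expires_at,
        cells={c: exp for c, exp in base.cells.items() if c in view_columns},
    )


def read_before_write_view_row(base: Row, view_columns: Set[str]) -> Row:
    """Hypothetical alternative: extend the view marker to the latest expiry of any
    base cell, selected or not. Needs the existing base row to be read first."""
    expiries = list(base.cells.values())
    if base.marker_expires_at is not None:
        expiries.append(base.marker_expires_at)
    view = naive_view_row(base, view_columns)
    view.marker_expires_at = max(expiries) if expiries else None
    return view


# t=0: INSERT INTO base (k, a) VALUES (1, 1) USING TTL 10 -> marker and a expire at 10
base = Row(marker_expires_at=10, cells={"a": 10})
# t=5: UPDATE base USING TTL 100 SET b = 2 WHERE k = 1 -> b expires at 105
base.cells["b"] = 105
view_columns = {"a"}  # the view selects a, but not b

naive = naive_view_row(base, view_columns)
fixed = read_before_write_view_row(base, view_columns)
print(base.live(50), naive.live(50), fixed.live(50))  # True False True
```

At t=50 the base row is still live through `b`, but the naively derived view row has already expired, which is the base/view divergence described in the ticket.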
It would also be nice to get some volunteers who are familiar with MVs to review the above tickets.

Another thing I'm not sure of: we are aiming to guarantee eventual consistency between base and view, but even with the batchlog, my understanding is that we can't achieve this without some tool to synchronise the base with the view. I don't think this tool currently exists, and CASSANDRA-10346 <https://issues.apache.org/jira/browse/CASSANDRA-10346> seems to agree... Can anyone clarify whether this is actually a requirement for eventual consistency?

My general advice these days is for users to steer clear of MVs for the moment; however, we have no clear plan for when they will really be stable. As some of the changes to fix MVs may require a major version change, I think we should at least aim to get all of those in for 4.0 (although we still need to figure out exactly what those issues are).

Interested to hear people's thoughts.