Well this is all terribly interesting. I was actually going to get some discussion going about this during my talk, which unfortunately didn't happen, but I'll take this opportunity to push my agenda. My 99 cents:
*tl;dr: we should probably just focus on not releasing completely broken features in the first place, and we should do that through user engagement/testing wooo!*

Some context to begin with, because I think this needs to be spelled out. Cassandra is a database. People treat databases as their prized possession. A database stores all their sweet, sweet data, and undoubtedly that data is the most important component in their system; without it, there is no point in having a system. Users expect their database to be the most stable component of their stack, and generally they won't upgrade it without being absolutely, positively sure that the new version will behave at least as well as the old one did. All our users treat their database in exactly this way. Change happens slowly in the database world, and generally this is true both for the database and for its users. "C* 3.0.0 is out tomorrow! Let's upgrade!" - said no one ever.

Anyway, with that out of the way, back to the crux of the issue. This may get long and unwieldy, and derail the actual thread, but in this case I think for good reason; either way it's all relevant to the actual topic. I think it's worth taking a step back and looking at the actual situation and what brought us here, rather than just proposing a solution that's really just a band-aid on the real issue. These are the problems I've seen that have caused a lot of the pain with new features, and an indication that we need to change the way we manage our releases and major changes.

1. We pushed out large feature sets with minimal testing of said features. At that stage we had no requirement for cleanly passing tests on commit, and overall we didn't have a strong commitment to writing tests either. This changed in 3.10, where we put forth that dtests and utests needed to pass and that new tests needed to be written for each change. Anything committed prior to 3.10 was subject to many flaky tests with minimal coverage. Many features went only partially tested and were committed anyway.

2. We rushed features to meet deadlines, or simply didn't give them enough time and thought in the conception phase because of deadlines. I've never met an arbitrary deadline that made things better. From looking at lots of old tickets, there was "some" reason that even major changes had to be squeezed into 3.0 before it was released, which resulted in a lack of attention and testing for those features. Rather than waiting until things were ready before committing them, we just cut scope so they would fit. I honestly don't know how this could ever make sense for a volunteer-driven project; in fact, I don't really know how it works well for any software project. It generally just ends in bad software. It might make sense for a business pushing the feature agenda for $$, or for a project whose users don't care about stability (lol), but it still results in bad software. It definitely doesn't make sense for an open source project.

3. We didn't do any system-wide verification/integration testing of features; we essentially relied on dtests and unit tests. I touched on this in point 1, but we don't have much system testing. dtests kind of cover it, but not really well; cstar is also used in some cases but is limited in scope (performance only, really). We're lucky that we can cover a lot of cases with dtests, but it seems to me that we don't capture many of the cases where feature X affects feature Y - e.g. the effect of repairs on everything ever, but mostly vnodes. We really need a proper testing cluster for each version we put out, and to test new and existing features extensively to measure their worth. Instaclustr is looking at this, but we're still a ways off from having something up and running. On this note, we also changed defaults prematurely - though we couldn't have known it was premature until we did so, because if we hadn't changed the defaults, those features probably wouldn't have received much usage.

4. Our community is made up mostly of power users, and most of them are still on older versions (2.0, 2.1). There is little reason for these users to upgrade to newer versions, and little reason to use the new features (even if they were the ones developing them). It's actually great that the power users have been adding functionality to Cassandra for new users; however, we haven't really engaged with those users to go and verify this functionality, and we did a pretty half-arsed job of testing it ourselves. We essentially just rolled it out and waited for the bug reports.

IMO this is where the "experimental flag" comes in. We rolled out a bunch of stuff; a year later some people started using it and realised it didn't quite work, but by then they had already invested a lot of time in it; all of a sudden there is a world of issues and we realise we never should have rolled it out in the first place. It's tempting to just say "let's put in an experimental flag so this doesn't happen again and we'll be all G", but that won't actually fix the problem - it's much like the changing-the-defaults problem. In a perfect world we would have the testing in place to not need an "experimental" flag, which I think is what we should actually aim for. In the meantime an experimental flag *may* be necessary, but so far I'm not really convinced. If we just mark a feature as experimental it will scare a lot of users off, and that feature will get a lot less coverage. There will admittedly be a lot fewer problems, but only because fewer people are using it - especially if there's no indication of when it will actually be production ready. On that note, how do we even decide when a feature is production ready? It's bound to be something arbitrary like "we haven't seen a horrible bug in 6 months", which is no better than what we currently have. This sort of thing detracts from the usefulness of Cassandra, and gives someone a nice big opportunity to come along and do it better than us.
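To be clear, the mechanics of such a flag aren't the hard part. A minimal sketch of an opt-in gate might look like the following (this is illustrative Python, not actual Cassandra code; the feature names and config shape are invented):

```python
# Hypothetical sketch of an experimental-feature gate. Feature names and
# the "config" (a plain set of enabled flags) are invented for illustration.

EXPERIMENTAL_FEATURES = {"materialized_views", "sasi_indexes"}

def check_feature(name, enabled_experimental):
    """Refuse to use an experimental feature unless the operator opted in."""
    if name in EXPERIMENTAL_FEATURES and name not in enabled_experimental:
        raise RuntimeError(
            f"{name} is experimental; enable it explicitly in the config to use it"
        )

# A stable feature, or an experimental one the operator enabled, passes:
check_feature("paxos", set())
check_feature("materialized_views", {"materialized_views"})
```

The hard part, as argued above, isn't gating the feature - it's deciding when anything ever graduates out of that set.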
I actually think a better solution here is more user engagement/testing in the release process. If there are users out there who actually want these features, they should be willing to help us test them prior to release. If each feature could get exposed to a few different use cases on real *staging* clusters, we could verify functionality a lot more easily. This would have been cake with MVs, as there are many users managing their own views who could simply have replaced them with MVs in their staging environments. The same can be applied to a lot of other features (incremental repairs replacing full repairs, SASI replacing secondary indexes or even Solr); it just requires some buy-in from the userbase, which I'm sure we'd find - because if we didn't, there would have been no reason to write the feature in the first place. This would put us in a much better position than an experimental flag, which would essentially require us to do this exact same thing to make a feature "production ready" - except those experimental features may never end up getting the attention they need to become production ready. You could argue that if someone really wanted a feature they'd push to get it out of its experimental state, but I think you'd find that most users will only consider what's readily available to them.

And finally, back onto the original topic. I'm not convinced that MVs need this treatment now. Zhao and Paulo (and others + reviewers) have made quite a lot of fixes; granted, there are still some outstanding bugs, but the majority of the bad ones have been fixed in 3.11.1 and 3.0.15, and the remaining bugs mostly only affect views with a poor data model. Plus we've already required that the known-broken components be explicitly turned on with a flag. At this point it's also not worth marking MVs experimental because a lot of users are already using them; it's a bit late to go and do that. We should just continue to try to fix them, or where that's not possible, clearly document the use cases that should be avoided.
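For anyone who hasn't hand-rolled the "users managing their own views" pattern mentioned above: it's the application doing two writes and keeping them consistent itself, which is exactly the bookkeeping an MV automates server-side. A toy sketch, with plain dicts standing in for tables and all names invented:

```python
# Toy sketch of a client-managed "view": the app maintains a second,
# differently-keyed lookup alongside the base data. Names are illustrative.

class UsersStore:
    def __init__(self):
        self.users_by_id = {}      # "base table": id -> user row
        self.users_by_email = {}   # hand-maintained "view": email -> id

    def upsert(self, user_id, email, name):
        old = self.users_by_id.get(user_id)
        if old and old["email"] != email:
            # Keep the view consistent when the view key changes:
            # this cleanup step is exactly what apps tend to get wrong.
            del self.users_by_email[old["email"]]
        self.users_by_id[user_id] = {"email": email, "name": name}
        self.users_by_email[email] = user_id
```

Anyone already running code like this in staging could swap it for an MV and tell us whether the server-side version holds up.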
Frankly, marking features that loads of users have already invested in as experimental feels to me a bit like a kick in the teeth for said users. Almost like telling them "we're actually not going to support this, surprise". If it's a big deal, we should probably just fix the issues. If anyone knows of some really pressing issues I'm unaware of, feel free to fill me in. The only issue raised in this thread so far is a tool to repair consistency between a view and its base. While I think this is necessary, it really shouldn't be a major problem on the latest releases; and really, if the view loses consistency with the base, waiting for some kind of repair to fix it isn't much better than just rebuilding it from scratch. This is one case where we should document the possible causes of an inconsistent view and the way to fix it (which is essentially: you had an outage, now you need to rebuild it), along with a warning about this in the docs.

And to bring it all back to my initial comment about slow-moving databases and change and things... we've literally only just got stricter w.r.t. testing in 3.10. We've hardly given 3.11 a go before coming along and saying "we need to make everything experimental so no one gets hurt!". Change is and should be slow in the database world, and science should be applied. At the very least, before we get too crazy, we should see whether the changes to how we do testing have a positive effect on future features. This also comes back to the deadline situation I mentioned earlier: while we haven't formally changed how releases are scheduled/managed, we've informally moved to a strategy of "we'll have these problems solved before we do the next release". I think this will also be a huge improvement to the stability/production readiness of new features in 4.0.
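For the record, "rebuild it from scratch" is conceptually just a full scan of the base regenerating the derived data. A toy sketch (again with dicts standing in for tables and invented names, nothing Cassandra-specific):

```python
# Toy sketch of the rebuild-from-base recovery for a drifted view:
# throw the old view away and regenerate it by scanning the base.
# Dicts stand in for tables; names are illustrative.

def rebuild_view(base_rows, key_fn):
    """Regenerate a derived lookup from the base data."""
    view = {}
    for row_id, row in base_rows.items():
        view[key_fn(row)] = row_id
    return view

# Example: a by-email view regenerated from a users "table".
users = {1: {"email": "a@x.com"}, 2: {"email": "b@x.com"}}
by_email = rebuild_view(users, lambda row: row["email"])
```

The cost is proportional to the whole base table, which is why an inconsistent view is an operational event to document, not something a background repair quietly papers over.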
(ps: we should formalise that but that's a whole 'nother wall of text) Anyway, I have lots more to say on this and related topics but I see Josh is already raising one of my points against experimental flags now, and this is probably enough words for one email.