Well this is all terribly interesting. I was actually going to get some
discussion going about this during my talk, which unfortunately didn't
happen, but I'll take this opportunity to push my agenda. My 99 cents:

*tl;dr: we should probably just focus on not releasing completely broken
features in the first place, and we should do that through user
engagement/testing wooo!*

Some context to begin with, because I think this needs to be spelled out.
Cassandra is a database. People treat databases as their prize possession.
It stores all their sweet sweet data, and undoubtedly that data is the most
important component in their system. Without it, there is no point in
having a system. Users expect their databases to be the most stable
component of their system, and generally they won't upgrade them without
being absolutely, positively sure that a new version will work at least
as well as the old one did. All our users treat their database in exactly
this same way. Change happens slowly in the database world, and generally
this is true both for the database and the users of the database. "C* 3.0.0
is out tomorrow! let's upgrade!" - said no one ever.

Anyway, with that out of the way, back to the crux of the issue. This may
get long and unwieldy, and derail the actual thread, but in this case I
think for good reason. Either way it's all relevant to the actual topic.

I think it's worth taking a step back and looking at the actual situation
and what brought us here, rather than just proposing a solution that's
really just a band-aid on the real issue. These are the problems I've seen
that have caused a lot of the pain with new features, and an indication
that we need to change the way we manage our releases and major changes.

   1. We pushed out large feature sets with minimal testing of said
   features. At this stage we had no requirement for clean passing tests on
   commit, and overall we didn't have a strong commitment to writing tests
   either. This changed in 3.10, when we started requiring that dtests and
   utests pass and that new tests be written for each change. Anything
   committed prior to 3.10 was subject to many flaky tests and minimal
   coverage; many features were only partially tested but were committed
   anyway.

   2. We rushed features to meet deadlines, or simply didn't give them
   enough time + thought in the conception phase because of deadlines.
   I've never met an arbitrary deadline that made things better. From
   looking at lots of old tickets, there was "some" reason that even major
   changes had to be squeezed into 3.0 before it was released, which resulted
   in a lack of attention and testing for these features. Instead of waiting
   until things were ready before committing them, we just cut scope so they
   would fit. I honestly don't know how this could ever make sense for a
   volunteer-driven project. In fact I don't really know how it works well
   for any software project; it generally just ends in bad software. It
   might make sense for a business pushing the feature agenda for $$, or
   where a project's users don't care about stability (lol), but it still
   results in bad software. It definitely doesn't make sense for an open
   source project.

   3. We didn't do any system-wide verification/integration testing of
   features; we essentially relied on dtests and unit tests. I touched on
   this in 1, but we don't have much system testing. dtests kind of cover
   it, but not really well, and cstar is used in some cases but is limited
   in scope (performance only, really). We're lucky that we can cover a lot
   of cases with dtests, but it seems to me that we don't capture a lot of
   the cases where feature X affects feature Y. E.g. the effect of repairs
   against everything ever, but mostly vnodes (a rough sketch of the kind
   of cross-feature test I mean follows after this list). We really need a
   proper testing cluster for each version we put out, and to test new and
   existing features extensively to measure their worth. Instaclustr is
   looking at this but we're still a ways off from having something up and
   running.
   On this note, we also changed defaults prematurely, but we couldn't have
   known it was premature until we did so; if we hadn't changed the
   defaults, those features probably wouldn't have received much usage.

   4. Our community is made up of mostly power users, and most of these are
   still on older versions (2.0, 2.1). There is little reason for these users
   to upgrade to newer versions, and little reason to use the new features
   (even if they were the ones developing them). It's actually great that
   the power users have been adding functionality to Cassandra for new
   users; however, we haven't really engaged with these users to go and
   verify that functionality, and we did a pretty half-arsed job of testing
   it ourselves. We essentially just rolled it out and waited for the bug
   reports.
   IMO this is where the "experimental flag" comes in. We rolled out a
   bunch of stuff; a year later some people started using it and realised
   it didn't quite work, but they had already invested a lot of time into
   it, and all of a sudden there is a world of issues and we realise we
   never should have rolled it out in the first place. It's tempting to
   just say "let's put in an experimental flag so this doesn't happen again
   and we'll be all G", but that won't actually fix the problem; it's much
   like the changing-the-defaults problem.
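
Back on point 3 for a second, here's a rough Python sketch of the kind of
cross-feature check I mean, using ccm (the library the dtests sit on top
of): spin up a small vnode cluster, write a workload, run a repair, then
read the workload back. The cluster/node method names here are from
memory and may not match ccm's current API exactly, so treat this as
illustrative rather than something to paste and run.

    # Illustrative only: does feature X (repair) play nicely with
    # feature Y (vnodes)? Method names may differ between ccm versions.
    from ccmlib.cluster import Cluster

    cluster = Cluster('/tmp/ccm-test', 'repair-vs-vnodes', version='3.11.1')
    cluster.populate(3, use_vnodes=True).start(wait_for_binary_proto=True)

    node1 = cluster.nodelist()[0]
    node1.stress(['write', 'n=100k', '-rate', 'threads=10'])  # seed data

    node1.nodetool('repair')  # exercise repair across the vnode cluster

    node1.stress(['read', 'n=100k', '-rate', 'threads=10'])  # verify reads
    cluster.stop()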

Now, in a perfect world we would have the testing in place to not need an
"experimental" flag, which I think is what we should actually aim for. In
the meantime an experimental flag *may* be necessary, but so far I'm not
really convinced. If we just mark a feature as experimental it will scare
a lot of users off, and these new features will get a lot less coverage.
Granted, there will be fewer problems, but only because fewer people are
using the feature, especially with no indication of when it will actually
be production ready. On that note, how do we even decide when it is
production ready? It's bound to be something arbitrary like "we haven't
seen a horrible bug in 6 months", which is no better than what we
currently have.
This sort of thing detracts from the usefulness of Cassandra, and gives
nice big opportunities for someone to come along and do it better than us.

I actually think a better solution here is more user engagement/testing in
the release process. If there are users actually out there who want these
features, they should be willing to help us test them prior to release. If
each feature can get exposed to a few different use cases on real
*staging* clusters, we could verify functionality a lot more easily. This
would have been cake with MVs, as there are many users managing their own
views who could have just replaced them with MVs in their staging
environment (there's a rough sketch of that below). The same can be
applied to a lot of other features as well (incremental repairs replacing
full repairs, SASI replacing SI or even Solr); it just requires some
buy-in from the userbase, which I'm sure we'd find, because if we didn't
there would be no reason to write the feature in the first place. This
would put us in a much better position than an experimental flag, which
would essentially require us to do this exact same thing in order to make
a feature "production ready"; however, those experimental features may
never end up getting the attention they need to become production ready.
You could argue
that if someone really wanted it then they'd push to get it out of an
experimental state, but I think you'd find that most users will only
consider what's readily available to them.
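
To make the MV-in-staging idea concrete, here's a minimal sketch (via the
Python driver; the keyspace, table and column names are all made up). The
users_by_email table is the kind of thing people maintain by hand today
with dual writes from the application; the MV is the server-side
equivalent they could stand up in staging and compare against their real
workload.

    # Hypothetical schema, purely to illustrate "try MVs in staging".
    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1']).connect('staging_ks')

    # Base table.
    session.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id uuid PRIMARY KEY,
            email text,
            name text
        )""")

    # What many users maintain by hand today: a second table kept in
    # sync by dual writes from the application.
    session.execute("""
        CREATE TABLE IF NOT EXISTS users_by_email (
            email text PRIMARY KEY,
            id uuid,
            name text
        )""")

    # The same lookup expressed as an MV, so the server keeps it in sync.
    session.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_email_mv AS
            SELECT email, id, name FROM users
            WHERE email IS NOT NULL AND id IS NOT NULL
            PRIMARY KEY (email, id)""")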

And finally, back onto the original topic. I'm not convinced that MVs need
this treatment now. Zhao and Paulo (and others + reviewers) have made
quite a lot of fixes; granted, there are still some outstanding bugs, but
the majority of bad ones have been fixed in 3.11.1 and 3.0.15, and the
remaining bugs mostly only affect views with a poor data model. Plus we've
already required that the known-broken components be enabled via a flag.
Also, at this point it's not worth making them experimental because a lot
of users are already using them; it's a bit late to go and do that. We
should just continue to try and fix them, or where that's not possible,
clearly document the use cases that should be avoided.

Frankly, marking features that loads of users have already invested in as
experimental feels to me a bit like a kick in the teeth to said users.
Almost like telling them "we're actually not going to support this,
surprise". If it's a big deal, we should probably just fix the issues. If
anyone knows of some really pressing issues I'm unaware of, feel free to
fill me in. The only issue raised in this thread so far is a tool to
repair consistency between a view and its base table. While I think this
is necessary, it really shouldn't be a major problem on the latest
releases, and really, if the view loses consistency with the base, waiting
for some kind of repair to fix it isn't much better than just rebuilding
it from scratch. This is one case where the docs should cover the possible
causes of an inconsistent view and the way to fix it (which is
essentially: you had an outage, now you need to rebuild it), along with a
prominent warning.
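
For what it's worth, "rebuild it" boils down to something like the
following (reusing the hypothetical names from the earlier sketch): drop
and recreate the view and let the build run, then watch progress with
nodetool. I believe viewbuildstatus is the right subcommand, but
double-check it against the version you're running.

    # Hypothetical names again; a "rebuild" is just a drop + recreate.
    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1']).connect('staging_ks')

    session.execute("DROP MATERIALIZED VIEW IF EXISTS users_by_email_mv")
    session.execute("""
        CREATE MATERIALIZED VIEW users_by_email_mv AS
            SELECT email, id, name FROM users
            WHERE email IS NOT NULL AND id IS NOT NULL
            PRIMARY KEY (email, id)""")

    # Then watch the rebuild from the shell, e.g.:
    #   nodetool viewbuildstatus staging_ks users_by_email_mv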

And to bring it all back to my initial comment about slow-moving databases
and change and things... We've literally only just got stricter w.r.t.
testing in 3.10. We've hardly given 3.11 a go before coming along and
saying "we need to make everything experimental so no one gets hurt!".
Change is and should be slow in the database world, and science should be
applied. At the very least, before we get too crazy, we should see if the
changes to how we do testing have a positive effect on future features.
This also comes back to the deadline situation I mentioned earlier. While
we haven't formally changed how releases are scheduled/managed, we've
informally moved to a strategy of "we'll have these problems solved before
we do the next release". I think this will also be a huge improvement to
the stability/production readiness of new features in 4.0. (ps: we should
formalise that but that's a whole 'nother wall of text)

Anyway, I have lots more to say on this and related topics but I see Josh
is already raising one of my points against experimental flags now, and
this is probably enough words for one email.
