Re: 3.0 and the Cassandra release process

Terrance Shepherd Wed, 18 Mar 2015 14:59:27 -0700

I like the idea but I agree that every month is a bit aggressive. I have no
say but:


I would say 4 releases a year instead of 12, with 2 months of new features
and 1 month of bug squashing per release, and the 4th quarter just bug fixes.

I would also propose 2-year LTS support for the releases after the 4th
quarter. So everyone could get a new feature release every quarter and the
stability of super major versions for 2 years.

On Wed, Mar 18, 2015 at 2:34 PM, Dave Brosius <[email protected]>
wrote:

> It would seem the practical implication of this is that there would be
> significantly more development on branches, with potentially more
> significant delays in merging these branches. This would imply to me that
> more Jenkins servers would need to be set up to handle auto-testing of more
> branches: if feature work spends more time on external branches, it is
> likely to be less tested (even if by accident), as fewer developers
> would be working on those branches. Only when a feature was blessed to make
> it to the release-tracked branch would it become exposed to the majority of
> developers/testers, etc. doing normal running/playing/testing.
>
> This isn't to knock the idea in any way; I just wanted to mention what I
> think the outcome would be.
>
> dave
>
>
>
>>> > On Tue, Mar 17, 2015 at 5:06 PM, Jonathan Ellis <[email protected]> wrote:
>>> > > Cassandra 2.1 was released in September, which means that if we were
>>> > > on track with our stated goal of six month releases, 3.0 would be
>>> > > done about now.  Instead, we haven't even delivered a beta.  The
>>> > > immediate cause this time is blocking for 8099
>>> > > <https://issues.apache.org/jira/browse/CASSANDRA-8099>, but the
>>> > > reality is that nobody should really be surprised.  Something always
>>> > > comes up -- we've averaged about nine months since 1.0, with 2.1
>>> > > taking an entire year.
>>> > >
>>> > > We could make theory align with reality by acknowledging, "if nine
>>> > > months is our 'natural' release schedule, then so be it."  But I
>>> > > think we can do better.
>>> > >
>>> > > Broadly speaking, we have two constituencies with Cassandra releases:
>>> > >
>>> > > First, we have the users who are building or porting an application
>>> > > on Cassandra.  These users want the newest features to make their
>>> > > job easier.  If 2.1.0 has a few bugs, it's not the end of the world.
>>> > > They have time to wait for 2.1.x to stabilize while they write their
>>> > > code.  They would like to see us deliver on our six month schedule
>>> > > or even faster.
>>> > >
>>> > > Second, we have the users who have an application in production.
>>> > > These users, or their bosses, want Cassandra to be as stable as
>>> > > possible.  Assuming they deploy on a stable release like 2.0.12,
>>> > > they don't want to touch it.  They would like to see us release
>>> > > *less* often.  (Because that means they have to do fewer upgrades
>>> > > while remaining in our backwards compatibility window.)
>>> > >
>>> > > With our current "big release every X months" model, these users'
>>> > > needs are in tension.
>>> > >
>>> > > We discussed this six months ago, and ended up with this:
>>> > >
>>> > >> What if we tried a [four month] release cycle, BUT we would
>>> > >> guarantee that you could do a rolling upgrade until we bump the
>>> > >> supermajor version?  So 2.0 could upgrade to 3.0 without having to
>>> > >> go through 2.1.  (But to go to 3.1 or 4.0 you would have to go
>>> > >> through 3.0.)
>>> > >>
>>> > >
>>> > > Crucially, I added
>>> > >
>>> > >> Whether this is reasonable depends on how fast we can stabilize
>>> > >> releases.  2.1.0 will be a good test of this.
>>> > >>
>>> > >
>>> > > Unfortunately, even after DataStax hired half a dozen full-time test
>>> > > engineers, 2.1.0 continued the proud tradition of being unready for
>>> > > production use, with "wait for .5 before upgrading" once again
>>> > > looking like a good guideline.
>>> > >
>>> > > I’m starting to think that the entire model of “write a bunch of new
>>> > > features all at once and then try to stabilize it for release” is
>>> > > broken.  We’ve been trying that for years and empirically speaking
>>> > > the evidence is that it just doesn’t work, either from a stability
>>> > > standpoint or even just shipping on time.
>>> > >
>>> > > A big reason that it takes us so long to stabilize new releases now
>>> > > is that, because our major release cycle is so long, it’s super
>>> > > tempting to slip in “just one” new feature into bugfix releases, and
>>> > > I’m as guilty of that as anyone.
>>> > >
>>> > > For similar reasons, it’s difficult to do a meaningful freeze with
>>> > > big feature releases.  A look at 3.0 shows why: we have 8099 coming,
>>> > > but we also have significant work done (but not finished) on 6230,
>>> > > 7970, 6696, and 6477, all of which are meaningful improvements that
>>> > > address demonstrated user pain.  So if we keep doing what we’ve been
>>> > > doing, our choices are to either delay 3.0 further while we finish
>>> > > and stabilize these, or we wait nine months to a year for the next
>>> > > release.  Either way, one of our constituencies gets disappointed.
>>> > >
>>> > > So, I’d like to try something different.  I think we were on the
>>> > > right track with shorter releases with more compatibility.  But I’d
>>> > > like to throw in a twist.  Intel cuts down on risk with a
>>> > > “tick-tock” schedule for new architectures and process shrinks
>>> > > instead of trying to do both at once.  We can do something similar
>>> > > here:
>>> > >
>>> > > One month releases.  Period.  If it’s not done, it can wait.
>>> > > *Every other release only accepts bug fixes.*
>>> > >
>>> > > By itself, one-month releases are going to dramatically reduce the
>>> > > complexity of testing and debugging new releases -- and bugs that do
>>> > > slip past us will only affect a smaller percentage of users,
>>> > > avoiding the “big release has a bunch of bugs no one has seen before
>>> > > and pretty much everyone is hit by something” scenario.  But by
>>> > > adding in the second rule, I think we have a real chance to make a
>>> > > quantum leap here: stable, production-ready releases every two
>>> > > months.
>>> > >
>>> > > So here is my proposal for 3.0:
>>> > >
>>> > > We’re just about ready to start serious review of 8099.  When that’s
>>> > > done, we branch 3.0 and cut a beta and then release candidates.
>>> > > Whatever isn’t done by then, has to wait; unlike prior betas, we
>>> > > will only accept bug fixes into 3.0 after branching.
>>> > >
>>> > > One month after 3.0, we will ship 3.1 (with new features).  At the
>>> > > same time, we will branch 3.2.  New features in trunk will go into
>>> > > 3.3.  The 3.2 branch will only get bug fixes.  We will maintain
>>> > > backwards compatibility for all of 3.x; eventually (no less than a
>>> > > year) we will pick a release to be 4.0, and drop deprecated features
>>> > > and old backwards compatibilities.  Otherwise there will be nothing
>>> > > special about the 4.0 designation.  (Note that with an “odd releases
>>> > > have new features, even releases only have bug fixes” policy, 4.0
>>> > > will actually be *more* stable than 3.11.)
>>> > >
>>> > > Larger features can continue to be developed in separate branches,
>>> > > the way 8099 is being worked on today, and committed to trunk when
>>> > > ready.  So this is not saying that we are limited only to features
>>> > > we can build in a single month.
>>> > >
>>> > > Some things will have to change with our dev process, for the
>>> > > better.  In particular, with one month to commit new features, we
>>> > > don’t have room for committing sloppy work and stabilizing it later.
>>> > > Trunk has to be stable at all times.  I asked Ariel Weisberg to put
>>> > > together his thoughts separately on what worked for his team at
>>> > > VoltDB, and how we can apply that to Cassandra -- see his email from
>>> > > Friday <http://bit.ly/1MHaOKX>.  (TLDR: Redefine “done” to include
>>> > > automated tests.  Infrastructure to run tests against github
>>> > > branches before merging to trunk.  A new test harness for
>>> > > long-running regression tests.)
>>> > >
>>> > > I’m optimistic that as we improve our process this way, our even
>>> > > releases will become increasingly stable.  If so, we can skip
>>> > > sub-minor releases (3.2.x) entirely, and focus on keeping the
>>> > > release train moving.  In the meantime, we will continue delivering
>>> > > 2.1.x stability releases.
>>> > >
>>> > > This won’t be an entirely smooth transition.  In particular, you
>>> > > will have noticed that 3.1 will get more than a month’s worth of new
>>> > > features while we stabilize 3.0 as the last of the old way of doing
>>> > > things, so some patience is in order as we try this out.  By 3.4 and
>>> > > 3.6 later this year we should have a good idea if this is working,
>>> > > and we can make adjustments as warranted.
>>> > >
>>> > > --
>>> > > Jonathan Ellis
>>> > > Project Chair, Apache Cassandra
>>> > > co-founder, http://www.datastax.com
>>> > > @spyced
>>>
>>
>
