Warning: This is a long ponderous email. Stop now and get your cup of
coffee/tea/$beverage.

Hi folks,

I've been having some offhand conversations with folks in other open
source communities and rereading a number of good books of late (most
notably Continuous Delivery by Jez Humble) and I would like to toss
this idea out and see what folks think of it:

CloudStack is a large, often complex piece of software. There are few
people who truly grok all of CloudStack, and because of the diversity
of environments it can be deployed in, it can be truly bewildering to
test fully. That said, I noticed a number of things that became
last-minute blocker bugs, and they weren't overly esoteric
configurations; they were code paths that we just hadn't exercised in
testing. One of these would even have been easily detected and fixed
had we been testing installation alone. Additionally, as a project we
are dramatically
growing. The number of people adding non-trivial amounts of code is
growing, and that makes ad hoc QA even more difficult.

So my proposal, in short, is that we focus on testing, beginning
immediately, and erect what is effectively a continuous deployment
pipeline, so that we have enough confidence in our codebase that we
could arbitrarily decide to release if we so desired. I don't think
that this is a one release type of project, indeed I think it's really
more of a culture shift than a technical project. The big shift is
that EVERYONE must be responsible for a quality release. To that end,
I'd propose the following tenets if we choose to adopt this:

* Tests become the Andon cord[1] for the entire project. When a test
fails we stop - additional commits don't happen - we find out what is
wrong and fix it. More on this in a bit.

* Tests (specifically automated tests) become part of our culture.
   * New features should come replete with both unit and integration
tests. I am tempted to suggest a certain percentage of coverage, but I
worry that it is a red herring; particularly given our dismal current
unit test coverage.
   * Blocker and critical bugs must have automated tests that get
committed as part of being qualified for closing.

* We dedicate a non-trivial portion of our energy to enhancing not
just the quantity and quality of our tests, but also to making them
highly automated and capable of delivering fast feedback. Ideally we
would know within minutes if a commit broke unit tests, within hours
if a commit failed in integration testing.
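
To make the Andon cord tenet a bit more concrete, here is a rough
sketch in Python of the rule "when a test fails, additional commits
don't happen until it is fixed". Everything in it is hypothetical: the
AndonGate class, its method names, and the is_fix flag (my assumption
for how the fixing commit itself gets through) are made up for
illustration and are not part of any existing CloudStack tooling.

```python
from typing import Optional


class AndonGate:
    """Hypothetical gate: refuses ordinary commits while a test run is red."""

    def __init__(self) -> None:
        # Name of the test run that pulled the cord, or None when all is green.
        self.broken_by: Optional[str] = None

    def report(self, test_run: str, passed: bool) -> None:
        """Record a test result; the first failure pulls the cord."""
        if not passed and self.broken_by is None:
            self.broken_by = test_run
        elif passed and self.broken_by == test_run:
            # The run that pulled the cord is passing again: release it.
            self.broken_by = None

    def may_commit(self, is_fix: bool = False) -> bool:
        """Ordinary commits wait while red; a fix (assumed flag) may go in."""
        return self.broken_by is None or is_fix


gate = AndonGate()
gate.report("unit tests", passed=False)
print(gate.may_commit())             # ordinary commit while red
print(gate.may_commit(is_fix=True))  # commit that addresses the failure
```

In practice the gate would more likely be social (everyone agrees to
stop) than enforced by code, but the logic is the same.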

I also know that this isn't a new idea. Lots of people have been
focused on automated testing as part of our ongoing development. The
only difference here is that I am actively asking you not to solely
depend on those folks to do the work for you, but to make testing a
part of the problem that you have to solve here. To be clear the goal
isn't to be perfect and problem free with every commit - things will
break. (If you've followed things at all in CloudStack, you'll know
that I've broken builds more than once.)

Pipeline I'd like to see for 4.1:

1. RAT test (fail this and we have IP problems)
2. Compile test (does it actually compile)
3. Unit tests
4. Package building
5. Automated installation (multiple platforms, does it actually
install from the packages)
6. Integration tests (aka Marvin running against virtualized or real
CS deployments)

Clearly the above isn't the be-all and end-all of testing, but perhaps we
can get some of this going in the 4.1 timeframe. There are also
clearly corner cases (building marvin, building api docs, building
official documentation) that need to be included in the pipeline as
well. But the principle is that we won't move on past our failure
until that failure is resolved.
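
The stop-on-first-failure principle can be sketched in a few lines of
Python. This is only an illustration under my own assumptions: the
stage names mirror the list above, but run_pipeline and the placeholder
checks are hypothetical stand-ins. A real pipeline would call out to
RAT, the build, the packaging scripts, and Marvin instead of returning
canned results.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that were run and the name of the
    failing stage (None if every stage passed).
    """
    ran: List[str] = []
    for name, check in stages:
        ran.append(name)
        if not check():
            # Later stages are deliberately not run.
            return ran, name
    return ran, None


# Placeholder checks: the unit test stage is made to fail on purpose.
stages: List[Stage] = [
    ("RAT", lambda: True),
    ("compile", lambda: True),
    ("unit tests", lambda: False),
    ("package build", lambda: True),
    ("automated install", lambda: True),
    ("integration tests (Marvin)", lambda: True),
]

ran, failed = run_pipeline(stages)
print("stages run:", ran)
print("failed at:", failed)
```

Ordering the cheap, fast stages first is what gives the quick
feedback: a licensing or compile problem surfaces before anyone waits
on packaging or integration runs.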

Immediate Action Items:

Whether we adopt this or not, I plan on showing up on IRC once a week
to work on testing in some form or another for an hour or two. I will
also be cajoling people to join me. I might be working on
infrastructure tasks. Obviously we have people scattered across the
globe, so it's not the only time to work on testing, but you are
welcome to join me.

I am curious to hear others' thoughts, comments, or flames. Is this
something we should espouse as we get close to releasing 4.0.0 and
turn our focus to 4.1?

--David

[1] http://en.wikipedia.org/wiki/Andon_(manufacturing)
