Warning: This is a long ponderous email. Stop now and get your cup of coffee/tea/$beverage.
Hi folks,

I've been having some offhand conversations with folks in other open source communities and rereading a number of good books of late (most notably Continuous Delivery by Jez Humble), and I would like to toss this idea out and see what folks think of it.

CloudStack is a large, often complex piece of software. There are few people who truly grok all of CloudStack, and because of the diversity of environments it can be deployed in, it can be truly bewildering to test fully. That said, I noticed a number of things that became last-minute blocker bugs, and they didn't come from overly esoteric configurations; they were in code paths that we just hadn't exercised in testing. One of these would even have been easily detected and fixed if we had been testing installation alone.

Additionally, as a project we are growing dramatically. The number of people adding non-trivial amounts of code is growing, and that makes ad hoc QA even more difficult.

So my proposal, in short, is that we focus on testing, beginning immediately, and erect what is effectively a continuous deployment pipeline, so that our confidence in our codebase is high enough that we could arbitrarily decide to release if we so desired.

I don't think that this is a one-release type of project; indeed I think it's really more of a culture shift than a technical project. The big shift is that EVERYONE must be responsible for a quality release. To that end, I'd propose the following tenets if we choose to adopt this:

* Tests become the Andon cord[1] for the entire project. When a test fails we stop - additional commits don't happen - we find out what is wrong and fix it. More on this in a bit.
* Tests (specifically automated tests) become part of our culture.
* New features should come replete with both unit and integration tests. I am tempted to suggest a certain percentage of coverage, but I worry that it is a red herring, particularly given our dismal current unit test coverage.
* Blocker and critical bugs must have automated tests that get committed as part of being qualified for closing.
* We dedicate a non-trivial portion of our energy to enhancing not just the quantity and quality of our tests, but also to making them highly automated and capable of delivering fast feedback. Ideally we would know within minutes if a commit broke unit tests, and within hours if a commit failed in integration testing.

I also know that this isn't a new idea. Lots of people have been focused on automated testing as part of our ongoing development. The only difference here is that I am actively asking you not to depend solely on those folks to do the work for you, but to make testing a part of the problem that you have to solve here.

To be clear, the goal isn't to be perfect and problem-free with every commit - things will break. (If you've followed things at all in CloudStack you'll know that I've broken builds more than once.)

Pipeline I'd like to see for 4.1:

1. RAT test (fail this and we have IP problems)
2. Compile test (does it actually compile)
3. Unit tests
4. Package building
5. Automated installation (multiple platforms, does it actually install from the packages)
6. Integration tests (aka Marvin running against virtualized or real CS deployments)

Clearly the above isn't the be-all and end-all for testing, but perhaps we can get some of this going in the 4.1 timeframe. There are also clearly corner cases (building Marvin, building API docs, building official documentation) that need to be included in the pipeline as well. But the principle is that we won't move on past our failure until that failure is resolved.

Immediate Action Items:

Whether we adopt this or not, I plan on showing up on IRC once a week to work on testing in some form or another for an hour or two. I will also be cajoling people to join me. I might be working on infrastructure tasks.
Obviously we have people scattered across the globe, so it's not the only time to work on testing, but you are welcome to join me.

I am curious to hear others' thoughts, comments, or flames. Is this something we should espouse as we are close to releasing 4.0.0 and turning our focus to 4.1?

--David

[1] http://en.wikipedia.org/wiki/Andon_(manufacturing)
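
As an illustration of the "stop on failure" principle behind the pipeline above, here is a minimal sketch in Python. It is not CloudStack's actual build tooling: the stage names simply mirror the six steps listed in this email, and run_pipeline and the placeholder checks are hypothetical. In a real setup each check would invoke the corresponding build or test command.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, check) pair; check returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failure.

    Returns (names of stages that passed, name of the failing stage or None).
    """
    passed: List[str] = []
    for name, check in stages:
        try:
            ok = check()
        except Exception:
            # A crashing stage counts as a failure, not as a skipped stage.
            ok = False
        if not ok:
            # Pull the cord: no later stage runs until this one is fixed.
            return passed, name
        passed.append(name)
    return passed, None


if __name__ == "__main__":
    # Hypothetical placeholder checks standing in for the real commands.
    stages: List[Stage] = [
        ("rat", lambda: True),
        ("compile", lambda: True),
        ("unit-tests", lambda: False),  # simulate a broken unit test
        ("packaging", lambda: True),
        ("install", lambda: True),
        ("integration", lambda: True),
    ]
    passed, failed = run_pipeline(stages)
    print("passed:", passed)
    print("stopped at:", failed)
```

With the simulated unit test failure, only "rat" and "compile" are reported as passed and the run stops at "unit-tests"; packaging, installation and integration are never attempted. Ordering the cheap checks first is what gives feedback within minutes for the early stages and within hours for the later ones.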