On Tue, Jul 15, 2014 at 1:29 AM, John Meinel <j...@arbash-meinel.com> wrote:
> It seems worthy to just run "go test github.com/juju/..." as our CI
> testing, isn't it? (i.e., run all unit tests across all packages that we
> write on all platforms), rather than *just* github.com/juju/juju.
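In other words, something like this as the CI job's test step (a sketch; it assumes the juju trees are already fetched into the GOPATH):

```shell
# Sketch of the proposed CI step: test every package published under the
# juju organization, not just the main github.com/juju/juju tree. The
# trailing "/..." is Go's package path wildcard, matching a package and
# all of its subpackages.
go test github.com/juju/...
```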
Ah! That looks easy. We could add a test like this in a day.

> I don't think we run into the combinatorial problem here (if we can run
> all of the github.com/juju/juju tests, then we aren't really adding much
> to run the rest of the dependencies as well).
> I think having a "full bootstrap, deploy, upgrade, destroy" on all
> platforms is necessary as a functional test, I'm not sure that we need to
> cross product it with on-all-environments. (which *would* start to run
> into combinatorial problems)

We are doing some combinatorial testing because we need to ensure every
series + arch combination works. At the Vegas sprint, we settled on unit
tests and lxc tests as the best way to identify issues with an arch or
series. We test:

    precise + amd64
    utopic  + amd64
    trusty  + amd64
    trusty  + i386
    trusty  + ppc64
    trusty  + arm64

Cloud tests always do a deploy and an upgrade because both scenarios use
simple streams, which are also under test. CI is testing juju-release-tools
too, since juju isn't very useful unless it is packaged and the tools are
published to the CPCs. There is a large class of functional tests, and some
integration tests with other software, that we need to add this cycle.

> I have the feeling, though, that "better CI" might be making some
> developers a bit more lax and doing less direct testing themselves,
> because they expect that CI will catch things they don't.

I don't feel this. I think the problem is the complexity of Juju. The mongo
changes for HA broke the backup-restore feature; these are different areas
of expertise that needed better coordination.

> I like the stop-the-line-when-CI-is-broken, as long as we have reliable
> ways to stop it. Given the timescales we're working on, I'd probably be
> ok with having it be a manual thing, so that when Azure decides to rev
> their API and break everything that used to work, we aren't immediately
> crippled.
> Maybe we can identify a subset of CI that is reliable (or high priority)
> enough that it really is automatically stop-the-line worthy. (Trusty unit
> tests, PPC unit tests, local provider, ec2 tests come to mind.)

Cloud failures are not regressions in juju code. I spend a day or more a
week tweaking CI to give juju the best chance of success. I might change a
test, or write a script that cleans up the resources in the cloud or on the
host. Since I am taking time to give juju more chances to pass, I delay
reporting the bugs; 5 revisions might merge while I prove that juju is
really broken. Since the defect can mutate with the extra commits, it isn't
easy to identify the one or more revisions that are at fault. When we
report a "CI regression", it is something we have verified did work when we
retested an older revision. I do provide a list of commits that can be
investigated.

As for automating a stop-the-line policy, we might be fine with a small
hack to the git-merge-juju job: check for commits that claim to fix a
regression, and when that is not the case, fail the job early with the
reason that we are waiting for a specific fix. Rollback is always an
option.

-- 
Curtis Hovey
Canonical Cloud Development and Operations
http://launchpad.net/~sinzui

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev