On Tue, Jul 15, 2014 at 1:29 AM, John Meinel <j...@arbash-meinel.com> wrote:
> It seems worthy to just run "go test github.com/juju/..." as our CI testing,
> isn't it? (i.e., run all unit tests across all packages that we write on all
> platforms), rather than *just* github.com/juju/juju.

Ah! That looks easy. We could add a test like this in a day.
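For what it's worth, a minimal sketch of such a CI step (assuming the standard Go toolchain and that our repos are checked out under $GOPATH):

```shell
#!/bin/sh
# Hypothetical CI step: run the unit tests for every package we
# publish under github.com/juju, not just github.com/juju/juju.
set -e

# The /... wildcard makes "go test" recurse into every package below
# github.com/juju, so the dependency repos are covered as well.
go test github.com/juju/...
```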

> I don't think we run into the combinatorial problem here (if we can run all
> of the github.com/juju/juju tests, than we aren't really adding much to run
> the rest of the dependencies as well).



> I think having a "full bootstrap, deploy, upgrade, destroy" on all platforms
> is necessary as a functional test, I'm not sure that we need to cross
> product it with on-all-environments. (which *would* start to run into
> combinatorial problems)

We are doing some combinatorial testing because we need to ensure
every series+arch combination works. At the Vegas sprint, we settled on
unit tests and lxc tests as the best way to identify issues with arch
or series. We test:

  precise + amd64
  utopic + amd64
  trusty + amd64
  trusty + i386
  trusty + ppc64
  trusty + arm64

Cloud tests always do a deploy and an upgrade because both scenarios
use simplestreams, which is also under test. CI is testing
juju-release-tools too, since juju isn't very useful unless it is
packaged and tools are published to the CPCs.

There is a large class of functional tests, and some integration tests
with other software, that we need to add this cycle.

> I have the feeling, though, that "better CI" might be making some developers
> a bit more lax and doing less direct testing themselves, because they expect
> that CI will catch things they don't.

I don't feel this is the case. I think the problem is the complexity of
Juju. The mongo changes for HA broke the backup-restore feature; I think
these are different areas of expertise that needed better coordination.

> I like the stop-the-line-when-CI-is-broken, as long as we have reliable ways
> to stop it. Given the timescales we're working on, I'd probably be ok with
> having it be a manual thing, so that when Azure decides to rev their API and
> break everything that used to work, we aren't immediately crippled. Maybe we
> can identify a subset of CI that is reliable (or high priority) enough that
> it really is automatically stop-the-line worthy. (Trusty unit tests, PPC
> unit tests, local provider, ec2 tests come to mind.)

Cloud failures are not regressions in juju code. I spend a day or more
a week tweaking CI to give Juju the best chance of success. I might
change a test, or write a script that cleans up the resources in
cloud/host.

Since I am taking time to give juju more chances to pass, I delay
reporting the bugs. Five revisions might merge while I prove that juju
is really broken. Since the defect can mutate with the extra commits, it
isn't easy to identify the one or more revisions that are at fault.

When we report a "CI regression", it is something we have genuinely
verified by retesting an old revision and seeing it work. I do provide
a list of commits that can be investigated.

As for automating a stop-the-line policy, we might be fine with a
small hack to the git-merge-juju job to check for commits that claim
to fix the known regression; when a commit makes no such claim, the
job fails early, reporting that we are waiting for a specific fix.
Rollback is always an option.
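A rough sketch of what that hack could look like (the "fixes-regression:" commit-message marker and the variable names are my own invention, not an existing convention):

```shell
#!/bin/sh
# Hypothetical early-exit check for the git-merge-juju job. While CI is
# blocked on a known regression, only commits that claim to fix it may
# merge; everything else fails early with an explanation.

# Returns 0 (merge allowed) when the commit message mentions the
# blocking bug via the assumed "fixes-regression: <bug>" marker.
check_merge_allowed() {
    blocking_bug=$1
    commit_msg=$2
    case "$commit_msg" in
        *"fixes-regression: $blocking_bug"*) return 0 ;;
        *) return 1 ;;
    esac
}

# In the real job the message would come from: git log -1 --format=%B
if check_merge_allowed "lp:1234567" "backup: fixes-regression: lp:1234567"; then
    echo "merge allowed"
else
    echo "CI is blocked waiting for a regression fix; failing early"
    exit 1
fi
```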



-- 
Curtis Hovey
Canonical Cloud Development and Operations
http://launchpad.net/~sinzui

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev