Hi all,

I’d like to propose that we switch our continuous integration (CI)
system from Jenkins [1] to Concourse [2].  I suggest this because we
continue to experience a significant number of environment-related
test failures.

These issues include CPU interference from other Jenkins jobs on the
same host, running out of disk space, port conflicts, and other
gremlins.  The net effect is that we are only getting 1-2 successful
builds per month.  Certainly not all test failures can be traced back
to environmental issues.  However, internal testing on isolated VMs
shows a combined success rate about 3x higher than on ASF
Jenkins for the same tests.  This is still definitely NotAwesome, but
removing environmental factors will let us focus on stabilizing flaky
tests.

Concourse is an Apache-licensed open source CI system based on
pipelines.  The pipelines are defined in a YAML file containing job
definitions: inputs, outputs, resources, and tasks.  A task is simply a
script (typically bash) that exits with 0 for success and nonzero for
failure.  A web UI displays
build status.  Importantly, each job runs inside an isolated
container.  The containers are load-balanced across a pool of workers.
For an example of a build pipeline, see [3] for the pipeline used to
build Concourse itself.
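
To make this concrete, here is a rough sketch of what a minimal
pipeline for Geode might look like.  The repository URL, branch, job
and task names, container image, and Gradle command are illustrative
assumptions on my part, not a tested configuration.

```yaml
# Hypothetical minimal pipeline: one git resource, one job, one task.
resources:
- name: geode                    # assumed resource name
  type: git
  source:
    uri: https://github.com/apache/geode.git   # assumed repo URL
    branch: develop                             # assumed branch

jobs:
- name: build-and-test           # assumed job name
  plan:
  - get: geode
    trigger: true                # run the job on each new commit
  - task: unit-tests             # assumed task name
    config:
      platform: linux
      image_resource:            # container image the task runs in
        type: docker-image
        source: {repository: openjdk, tag: "8-jdk"}   # assumed image
      inputs:
      - name: geode              # the checked-out source from the get step
      run:
        path: bash
        args:
        - -c
        - |
          # The script's exit code decides success (0) or failure (nonzero).
          cd geode
          ./gradlew build
```

Each run of the task gets a fresh container, so builds do not share
disk, ports, or leftover state with one another.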

A Concourse environment is deployed and managed in cloud environments
through BOSH [4].  Pivotal has agreed to donate AWS and/or GCP compute
and storage resources as well as manage the infrastructure.  These
project resources would be available for use by all committers and
community members regardless of corporate affiliations.  Note that
AFAIK there is no explicit requirement to host CI on ASF
infrastructure—unlike for critical project resources such as source
code, mailing lists, and issue tracking.

The source for the pipeline and job scripts would reside within the
geode-* repos.  Geode committers would be able to modify those, same
as with our .travis.yml scripts.  All test results and build artifacts
would be publicly viewable just like with our Jenkins build output
today.  Requests for admin assistance would go through the dev@geode
mailing list.

Thoughts?  As a first step we could run both CI systems side-by-side
and see how the Concourse approach works for our project.

Thanks,
Anthony


[1] https://builds.apache.org/job/Geode-nightly/
[2] https://concourse.ci
[3] https://ci.concourse.ci
[4] https://bosh.io
