Hi all,

I’d like to propose that we switch our continuous integration (CI) system from Jenkins [1] to Concourse [2]. I suggest this because we continue to experience a significant number of environment-related test failures.
These issues include CPU interference from other Jenkins jobs on the same host, running out of disk space, port conflicts, and other gremlins. The net effect is that we are only getting 1-2 successful builds per month.

Certainly not all test failures can be traced back to environmental issues. However, internal testing on isolated VMs shows a combined success rate about 3x higher than ASF Jenkins for the same tests. This is still definitely NotAwesome, but removing environmental factors will let us focus on stabilizing flaky tests.

Concourse is an Apache-licensed open source CI system based on pipelines. The pipelines are defined in a YAML file containing job definitions: inputs, outputs, resources, and tasks. A task is simply a bash script that exits with 0 on success and a nonzero status on failure. A web UI displays build status. Importantly, each job runs inside an isolated container. The containers are load-balanced across a pool of workers. For an example of a build pipeline, see [3] for the pipeline used to build Concourse itself.

A Concourse environment is deployed and managed in cloud environments through BOSH [4]. Pivotal has agreed to donate AWS and/or GCP compute and storage resources as well as manage the infrastructure. These project resources would be available for use by all committers and community members regardless of corporate affiliation. Note that AFAIK there is no explicit requirement to host CI on ASF infrastructure, unlike for critical project resources such as source code, mailing lists, and issue tracking.

The source for the pipeline and job scripts would reside within the geode-* repos. Geode committers would be able to modify those, same as with our .travis.yml scripts. All test results and build artifacts would be publicly viewable, just like our Jenkins build output today. Requests for admin assistance would go through the dev@geode mailing list.

Thoughts?
As a first step, we could run both CI systems side by side and see how the Concourse approach works for our project.

Thanks,
Anthony

[1] https://builds.apache.org/job/Geode-nightly/
[2] https://concourse.ci
[3] https://ci.concourse.ci
[4] https://bosh.io
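
P.S. For anyone who hasn't seen a Concourse pipeline, here is a rough sketch of what a minimal one for Geode might look like. This is illustrative only, not a proposed configuration: the resource and job names, the repository URL and branch, the container image, and the Gradle command are all assumptions on my part, and the real pipeline would need to be worked out on the list.

```yaml
# Hypothetical minimal pipeline: one git resource, one job, one task.
resources:
- name: geode                 # assumed resource name
  type: git
  source:
    uri: https://github.com/apache/geode.git   # assumed repo URL
    branch: develop                            # assumed branch

jobs:
- name: build-and-test        # assumed job name
  plan:
  - get: geode
    trigger: true             # start the job whenever a new commit appears
  - task: run-tests
    config:
      platform: linux
      image_resource:         # the isolated container the task runs in
        type: docker-image
        source:
          repository: openjdk # assumed image
          tag: "8-jdk"
      inputs:
      - name: geode           # the checked-out source from the get step
      run:
        path: bash
        args:
        - -c
        - |
          # The task succeeds if this script exits 0, fails otherwise.
          cd geode
          ./gradlew build     # assumed build command
```

The point is mainly that the whole pipeline is a file we would check into the repo and review like any other change, and that the task runs in its own container rather than sharing a host with other jobs.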