Introduction

For Apache Impala's (incubating) "ASF milestone 1", we need to make
progress on the mega-task of having public-facing build and test
infrastructure. It's not a requirement that we finish this for ASF
milestone 1. For now, I propose we focus on researching public options
available and presenting findings and conclusions. The full task is
tracked at https://issues.cloudera.org/browse/IMPALA-3228

I'm looking for volunteers to help with this assessment. If you don't
want to volunteer, can't volunteer, or aren't interested in the
decisions ultimately made, you don't need to read the rest of this
document.


Document Outline

This document is necessarily long. There is a lot to consider when
choosing a public build/test provider, and it's better to clearly list
out important points as opposed to just assuming everyone is on the same
page.

First, I prioritize the sorts of jobs we may choose eventually to have
available to all committers in a public build/test infrastructure.

Second, I list features existing Apache Impala (incubating) build/test
infrastructure jobs have. When I talk about "existing" infrastructure, I
mean that inside Cloudera, Inc., since to my knowledge that is all that
exists in any sort of formal nature for Apache Impala (incubating).

Third, I list additional requirements and features that have not been
implemented but must be considered.

Fourth, I list potential public build and test service provider
candidates and things to assess given the information provided in the
earlier sections.

Fifth, I have a task list, for which volunteers may choose to sign up.


I. Job Priorities

These are listed to give context when considering the Existing Build
Environment Characteristics below.

First Priority (ASF Milestone 2)

1. Pre-commit verification job, to gate patch acceptance based on
build's pass/fail status. Among the Apache Impala (incubating) dev
community, this is colloquially known as "Gerrit verify merge" or
"Gerrit verify only" (GVM, GVO).

Second Priority (future consideration)

1. Regular execution of exhaustive tests
2. Data load snapshot publication; will speed up run of builds, but not
absolutely needed

Third Priority (future consideration)

Listed in no particular order of priority:

- Apache Impala (incubating) compiled with ASAN + tests
- compiled for release + tests
- Apache Impala (incubating) configured with legacy aggregations and
joins + tests
- configured to run on a local filesystem + tests
- compiled for code coverage + tests
- Private builds (i.e., for testing changes but not merging or
cherry-picking after passing)

Out of Scope

- Apache Impala (incubating) on S3 or Isilon, alternative filesystems
and appliances within Cloudera internal network
- Anything that interacts with Cloudera, Inc. CDH clusters, like stress
or performance
- Anything not otherwise included as part of any priority


II. Existing Build Environment Characteristics

Here, I try to list the characteristics of the internal Cloudera /
Jenkins build environment. While it's likely that many providers'
solutions also support most if not all these features, it'd be good to
get these written down. Assessors must consider these. These are in no
particular order.

Soft / Administrative

- Anyone employed at Cloudera working on Apache Impala (incubating) can
view or alter the jobs (promotes the idea that everyone can enhance the
jobs and theoretically helps discourage de facto sysadmins or "experts")

- We at Cloudera are not wholly on our own to maintain internal
Jenkins: while we may change our jobs, Jenkins proper is administered
by a separate group. If the entire Jenkins infrastructure goes down,
they are on call to fix it.


Technical

- The ability to define job parameters (for job reuse)

- The ability to run builds in parallel (for efficiency/productivity)

- The ability to queue up build requests if there are not enough
available resources to run the build immediately

- The ability to capture and display the contents of stderr / stdout
(for quick failure triage/debugging)

- The ability to collect artifacts (for more detailed debugging /
forensic analysis)

- Retention of some builds and artifacts up to a point (useful for
binary search for bug hunting; "how did this work before?"
investigations; etc.)

- Build triggers including time-based or event-based (needed if we ever
want more than just a GVM/GVO job)

- Underlying GNU/Linux distribution with Bash and Python (to be able to
bootstrap the so-called "toolchain", download requirements, and
bootstrap virtual environment)

- Underlying GNU/Linux distribution is supported by the Apache Impala
(incubating) toolchain (to be able to compile the project)

- Provides passwordless sudo with no restrictions (Cloudera Jenkins
provides this; whether this is a good thing is debatable, but it can
come in handy if it's the only way to install additional packages, or if
a job needs to modify a ulimit.)

- Configurable notification of pass/failure/etc. (helps with manual
build triage)

- Obvious pass/failure status on some splash screen / dashboard (nice to
see "state of the world" or "history of a build")

- Configurable automatic abort if the job appears stuck (hard to spot
these, so it's nice to have some automatic process in place here)

- The ability to build the job in phases or "steps" (this allows a
post-build step to run unconditionally, for example, even if some
previous step fails)

- The ability to manage disk space (clean up after itself)

- SSH access granted to any committer (useful when forensic evidence is
lacking or to look at a hung build)

- Can spin up slaves that satisfy Apache Impala's (incubating) disk and
memory requirements, and have CPU such that full builds+tests take 4-12
hours. Note the time-to-execute range depends on both the compiler
options chosen and also which tests are run.

- Can interact with Gerrit (https://gerrit.cloudera.org)
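
To make a few of the Technical items above concrete (captured stdout /
stderr, automatic abort of a stuck job, and phased builds with an
unconditional post-build step), here is a minimal Python sketch. It is
not how the Cloudera Jenkins jobs are implemented; the function names
and the step commands are hypothetical, and a real provider would
supply these features itself.

```python
import subprocess
import sys


def run_step(name, cmd, timeout_s):
    """Run one step; return (ok, output). Abort it if it exceeds timeout_s."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout: one log
            timeout=timeout_s,         # child is killed when this expires
            text=True,
        )
    except subprocess.TimeoutExpired:
        return False, f"{name}: aborted after {timeout_s}s (appears stuck)"
    return proc.returncode == 0, proc.stdout


def run_job(steps, post_build, timeout_s=60):
    """Run steps in order, stop at the first failure, always run post_build."""
    results = []
    try:
        for name, cmd in steps:
            ok, output = run_step(name, cmd, timeout_s)
            results.append((name, ok, output))
            if not ok:
                break
    finally:
        # Runs unconditionally, e.g. to collect artifacts or free disk space.
        ok, output = run_step("post-build", post_build, timeout_s)
        results.append(("post-build", ok, output))
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for real compile/test commands.
    steps = [
        ("compile", [sys.executable, "-c", "print('compiled')"]),
        ("test", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ]
    cleanup = [sys.executable, "-c", "print('cleaned up')"]
    for name, ok, output in run_job(steps, cleanup):
        print(name, "PASS" if ok else "FAIL", output.strip())
```

An assessor could use the same three questions against each provider:
is the log captured, is a hung step killed, and does cleanup still run
after a failure?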


III. Additional Build Environment Requirements and Considerations

In no particular order, here we list additional features that we're
not taking advantage of, but should. We also list requirements that take
into account the public nature of Apache Impala (incubating). Assessors
must consider these.

Soft / Administrative

- All committers should have equal access to the build environment infra

- Cloudera cannot expose internal services to the public

- Cloudera pays for Kudu's GCE public infra, but it's totally separate
from Cloudera

- Cloudstack is another ASF project using external build/test infra

- Not all of a project's build/test infra must be public. This is the
case with Kudu. Note that the Kudu pre-commit job is crafted in such a
way that it's a good gate for catching bugs.

- Potential hardware donations from Cloudera to ASF should be considered
for all of ASF and not exclusively for Apache Impala (incubating). ASF
frowns on donations for a specific project, and we should expect any
donations to go into a generic resource pool for use for any ASF
project.

- Separate external infra for Apache Impala (incubating) is borderline
with ASF, but probably fine. The key is ensuring that if Cloudera (or
any "main backer") were ever to pull funding, then the project shouldn't
be made homeless. This can be achieved via transparency on how the infra
is maintained so that someone else can come in and do it. In our case, I
think this can be satisfied by a combination of keeping our jobs in SCM
(see Technical just below) and providing documentation for any
surrounding administrivia (e.g., "Here's how to set up your SSH key to
update the jobs on the provider").

Technical

- Modular way to build and maintain jobs via SCM, e.g., Jenkins job DSL
or Jenkins Job Builder (see Notes below). Programmatically building our
jobs and maintaining them that way means we don't have the problems of
clone-edit proliferation, and it's simple to update a lot of jobs at
once.

- Jobs can be staged as "test jobs" and tested before being incorporated
into mainline.

- Jobs can easily be created for multiple branches, either feature
branches or maintenance release branches.

- Infra is upgradeable (and not stuck on a 6-year-old version)

- System requirements: It's possible some of the public offerings are
non-starters--or at least their free offerings are--because their
systems' specs are inferior to Apache Impala's (incubating) system
requirements. To that end, we need to get a reasonable ballpark of how
much disk and memory we tend to use in our build and tests, and if we
have less CPU than the EC2 instances available to those of us within
Cloudera, what the cost in additional build and test time is.

Note: Apache Impala (incubating) hardware requirements for CDH clusters
are aggressive compared to the so-called "minicluster" (see Notes
below).
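
As an illustration of the "jobs maintained via SCM" idea in the
Technical list above, here is a minimal Python sketch that generates
job definitions for several branches from one template, with an option
to stage them as test jobs. This is plain Python, not the Jenkins Job
DSL or Jenkins Job Builder format; the template fields, job names, and
the run-build.sh script are all hypothetical.

```python
import json

# Hypothetical job template. The field names are illustrative only and do
# not follow the schema of Jenkins Job Builder or the Job DSL plugin.
TEMPLATE = {
    "name": "impala-{branch}-{kind}",
    "branch": "{branch}",
    "timeout_hours": 12,
    "command": "./run-build.sh --kind {kind}",  # hypothetical script
}


def render(template, **params):
    """Fill the {placeholders} in every string field of the template."""
    return {
        key: value.format(**params) if isinstance(value, str) else value
        for key, value in template.items()
    }


def generate_jobs(branches, kinds, staging=False):
    """Produce one job definition per (branch, kind) pair."""
    jobs = []
    for branch in branches:
        for kind in kinds:
            job = render(TEMPLATE, branch=branch, kind=kind)
            if staging:
                # Staged copies get a distinct name so they can be tried
                # out without touching the mainline jobs.
                job["name"] = "test-" + job["name"]
            jobs.append(job)
    return jobs


if __name__ == "__main__":
    jobs = generate_jobs(["master", "feature-x"], ["precommit", "exhaustive"])
    print(json.dumps(jobs, indent=2))
```

Changing the one template and regenerating updates every job at once,
which is the property that avoids clone-edit proliferation.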


IV. Public Build / Test Infra Offerings

Things to Assess

- What are the system specs of their free offering?

- What are the restrictions of the free offering (job, build cap;
writable repo; etc.)?

- What is the cost of a paid offering, provided it has sufficient CPU,
disk, and memory specs? Please clarify the unit (e.g., dollars per hour
per build node).

- Do the free or paid offerings offer feature parity with features
in section II?

- Do the free or paid offerings make it possible to satisfy the
requirements and considerations in section III?
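
For comparing paid offerings on a common unit, the arithmetic could be
sketched as below. The 4-12 hour build range comes from section II; the
hourly rate and the number of builds per month are made-up placeholders,
and the sketch assumes one build occupies one node for its full
duration.

```python
def monthly_cost(rate_per_node_hour, hours_per_build, builds_per_month):
    """Cost in dollars, assuming one node is billed for the whole build."""
    return rate_per_node_hour * hours_per_build * builds_per_month


def cost_range(rate_per_node_hour, builds_per_month, min_hours=4, max_hours=12):
    """Low and high monthly estimates across the 4-12 hour build range."""
    return (
        monthly_cost(rate_per_node_hour, min_hours, builds_per_month),
        monthly_cost(rate_per_node_hour, max_hours, builds_per_month),
    )


if __name__ == "__main__":
    # Placeholder inputs: $0.50 per node-hour and 100 builds per month.
    low, high = cost_range(0.50, 100)
    print(f"${low:.2f} to ${high:.2f} per month")  # $200.00 to $600.00 per month
```

Reporting each provider's price as dollars per node-hour lets the same
calculation be reused once real build counts are known.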

Choices

This is not an exhaustive list. If people know or endorse others, speak
up. If you suggest another, your consent to be chosen to be its assessor
for this project is implied.

- ASF Jenkins: https://wiki.apache.org/general/Jenkins

- Travis: https://travis-ci.org/

- Cloudbees: https://www.cloudbees.com/

- something similar to Kudu's GCE Setup (this requires extra research)
http://104.196.14.100/

- Others?
https://en.wikipedia.org/wiki/Comparison_of_continuous_integration_software


V. Immediate Task List

For any task that says "the more the better", please reply with your
points.

For any task that says "anyone", please reply to say you're taking it
on.

- Quick audit of section II above to ensure I didn't miss anything
needed: the more the better

- Quick audit of section III above to ensure I didn't miss anything
needed: the more the better

- Share current/past experiences with any public build/infra service
providers listed or not listed in section IV
needed: the more the better

- Determine ballpark Apache Impala (incubating) build/test system
requirements (this somewhat blocks the below and should be chosen sooner
rather than later)
needed: anyone

- Assess ASF Jenkins
needed: anyone

- Assess Travis
needed: anyone

- Assess Cloudbees
needed: anyone

- Research Kudu GCE setup (contact me about things to ask Kudu)
needed: anyone

Thanks for reading.


References and Notes

https://issues.cloudera.org/browse/IMPALA-3228

http://jenkins.buildacloud.org/

https://wiki.jenkins-ci.org/display/JENKINS/Job+DSL+Plugin

http://docs.openstack.org/infra/jenkins-job-builder/

http://www.cloudera.com/documentation/enterprise/latest/topics/impala_prereqs1.html#prereqs_hardware
