Hi,

Arrow's current continuous integration setup utilizes multiple CI providers,
tools, and scripts:

 - Unit tests are running on Travis and Appveyor
 - Binary packaging builds are running on crossbow, an abstraction over
   multiple CI providers driven through a GitHub repository
 - For local tests and tasks, there is a docker-compose setup, or of course
   you can maintain your own environment

This setup has run into some limitations:
 - It’s slow: the CI parallelism of Travis has degraded over the last couple
   of months. Testing a PR takes more than an hour, which is a long time for
   both the maintainers and the contributors, and it has a negative effect
   on the development throughput.
 - Build configurations are not portable: they are tied to specific
   services. You can’t just take a Travis script and run it somewhere else.
 - Because they’re not portable, build configurations are duplicated in
   several places.
 - The Travis, Appveyor and crossbow builds are not reproducible locally, so
   developing them requires slow git push cycles.
 - Public CI has limited platform support; for example, ARM machines are not
   available.
 - Public CI also has limited hardware support: no GPUs are available.
Resolving all of the issues above is complicated, but is a must for the
long-term sustainability of Arrow.

For some time, we’ve been working on a tool called Ursabot[1], a library on
top of the CI framework Buildbot[2]. Buildbot is well maintained and widely
used for complex projects, including CPython, WebKit, LLVM, MariaDB, etc.
Buildbot is not another hosted CI service like Travis or Appveyor: it is an
extensible framework to implement various automations like continuous
integration tasks.

You’ve probably noticed additional “Ursabot” builds appearing on pull
requests, in addition to the Travis and Appveyor builds. We’ve been testing
the framework with a fully featured CI server at ci.ursalabs.org. This
service runs build configurations we can’t run on Travis, does it faster than
Travis, and has the GitHub comment bot integration for ad hoc build
triggering.

While we’re not prepared to propose moving all CI to a self-hosted setup,
our work has demonstrated the potential of using Buildbot to resolve Arrow’s
continuous integration challenges:
 - The docker-based builders reuse the docker images, which eliminates
   slow dependency installation steps. Some builds on this setup, run on
   Ursa Labs’s infrastructure, run 20 minutes faster than the comparable
   Travis-CI jobs.
 - It’s scalable. We can deploy Buildbot wherever and add more masters and
   workers, which we can’t do with public CI.
 - It’s platform and CI-provider independent. Builds can be run on arbitrary
   architectures, operating systems, and hardware: Python is the only
   requirement. Additionally, builds specified in buildbot/ursabot can be
   run anywhere: not only on custom Buildbot infrastructure but also on
   Travis, or even on your own machine.
 - It improves reproducibility and encourages consolidation of
   configuration. You can run the exact job locally that ran on Travis, and
   you can even get an interactive shell in the build so you can debug a
   test failure. And because you can run the same job anywhere, we wouldn’t
   need to keep duplicated Travis-specific or docker-compose build
   configurations stored separately.
 - It’s extensible. More exotic features like a comment bot, benchmark
   database, benchmark dashboard, artifact store, and integrations with
   other systems are easy to implement within the same system.

I’m proposing to donate the build configuration we’ve been iterating on in
Ursabot to the Arrow codebase. Here [3] is a patch that adds the
configuration. This will enable us to explore consolidating build
configuration using the Buildbot framework. A next step to explore after that
would be to port a Travis build to Ursabot and, in the Travis configuration,
execute the build with the shell command
`$ ursabot project build <builder-name>`. This is the same way we would be
able to execute the build locally, something we can’t currently do with the
Travis builds.
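
To illustrate the idea of a build that is defined once and then executed by
name from any environment, here is a minimal sketch in plain Python. It is
not Ursabot's or Buildbot's API: the `Builder` class, the `BUILDERS`
registry, the `build` function, and the builder names are all hypothetical,
and the steps are placeholder commands rather than Arrow's real build steps.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Builder:
    """Hypothetical provider-independent build: a name plus shell steps."""
    name: str
    steps: list = field(default_factory=list)  # each step is an argv list

    def run(self):
        """Run the steps in order; return the first non-zero exit code, else 0."""
        for argv in self.steps:
            print("+", " ".join(argv))
            result = subprocess.run(argv)
            if result.returncode != 0:
                return result.returncode
        return 0


# Hypothetical registry. The names and steps are placeholders for illustration.
BUILDERS = {
    "hello": Builder("hello", [
        [sys.executable, "-c", "print('configure')"],
        [sys.executable, "-c", "print('test')"],
    ]),
    "failing": Builder("failing", [
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        [sys.executable, "-c", "print('never reached')"],
    ]),
}


def build(name):
    """Look up a builder by name and run it, wherever this script is invoked."""
    return BUILDERS[name].run()
```

Under these assumptions, a hosted CI configuration and a developer's shell
would both call the same entry point, for example `build("hello")`, so the
build logic lives in one place instead of in each provider's own script.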

I am not proposing here that we stop using Travis-CI and Appveyor to run CI
for apache/arrow, though that may well be a direction we choose to go in the
future. Moving build configuration into something like Buildbot would be a
necessary first step to do that; that said, there are other immediate
benefits to be had by porting build configuration into Buildbot: local
reproducibility, consolidation of build logic, independence from a
particular CI provider, and ease of using and maintaining faster,
Docker-based jobs. Self-hosting CI brings a number of other challenges,
which we will concurrently continue to explore, but we believe that there
are benefits to adopting Buildbot build configuration regardless.

Regards, Krisztian

[1]: https://github.com/ursa-labs/ursabot
[2]: https://buildbot.net
     https://docs.buildbot.net
     https://github.com/buildbot/buildbot
[3]: https://github.com/apache/arrow/pull/5210
