Hi,

Arrow's current continuous integration setup utilizes multiple CI providers, tools, and scripts:

- Unit tests run on Travis and Appveyor.
- Binary packaging builds run on crossbow, an abstraction over multiple CI providers driven through a GitHub repository.
- For local tests and tasks, there is a docker-compose setup, or of course you can maintain your own environment.

This setup has run into some limitations:

- It’s slow: the CI parallelism of Travis has degraded over the last couple of months. Testing a PR takes more than an hour, which is a long time for both the maintainers and the contributors, and it hurts development throughput.
- Build configurations are not portable; they are tied to specific services. You can’t just take a Travis script and run it somewhere else.
- Because they’re not portable, build configurations are duplicated in several places.
- The Travis, Appveyor and crossbow builds are not reproducible locally, so developing them requires slow git push cycles.
- Public CI has limited platform support; for example, ARM machines are not available.
- Public CI also has limited hardware support: no GPUs are available.

Resolving all of the issues above is complicated, but it is a must for the long-term sustainability of Arrow.

For some time, we’ve been working on a tool called Ursabot[1], a library on top of the CI framework Buildbot[2]. Buildbot is well maintained and widely used for complex projects, including CPython, WebKit, LLVM, MariaDB, etc. Buildbot is not another hosted CI service like Travis or Appveyor: it is an extensible framework to implement various automations like continuous integration tasks.

You’ve probably noticed additional “Ursabot” builds appearing on pull requests, in addition to the Travis and Appveyor builds. We’ve been testing the framework with a fully featured CI server at ci.ursalabs.org. This service runs build configurations we can’t run on Travis, runs them faster than Travis, and has GitHub comment bot integration for ad hoc build triggering.

While we’re not prepared to propose moving all CI to a self-hosted setup, our work has demonstrated the potential of using buildbot to resolve Arrow’s continuous integration challenges:

- It’s faster. The docker-based builders reuse the docker images, which eliminates slow dependency installation steps. Some builds on this setup, run on Ursa Labs’s infrastructure, run 20 minutes faster than the comparable Travis-CI jobs.
- It’s scalable. We can deploy buildbot wherever we want and add more masters and workers, which we can’t do with public CI.
- It’s platform and CI-provider independent. Builds can be run on arbitrary architectures, operating systems, and hardware: Python is the only requirement. Additionally, builds specified in buildbot/ursabot can be run anywhere: not only on custom buildbot infrastructure but also on Travis, or even on your own machine.
- It improves reproducibility and encourages consolidation of configuration. You can run locally the exact job that ran on Travis, and you can even get an interactive shell in the build so you can debug a test failure. And because you can run the same job anywhere, we wouldn’t need to keep duplicated Travis-specific or docker-compose build configuration stored separately.
- It’s extensible. More exotic features like a comment bot, a benchmark database, a benchmark dashboard, an artifact store, and integrations with other systems are easy to implement within the same system.

I’m proposing to donate the build configuration we’ve been iterating on in Ursabot to the Arrow codebase. Here [3] is a patch that adds the configuration. This will enable us to explore consolidating build configuration using the buildbot framework. A next step to explore after that would be to port a Travis build to Ursabot and, in the Travis configuration, execute the build with the shell command `$ ursabot project build <builder-name>`. This is the same way we would be able to execute the build locally, something we can’t currently do with the Travis builds.

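To make the idea of a provider-independent build definition a bit more concrete, here is a minimal Python sketch. It is not Ursabot's or Buildbot's actual API: the `Step` and `Builder` classes, the `succeeded` helper, and the `demo` builder are hypothetical names invented for illustration only. The point is simply that a build described as plain data plus a small runner can be invoked the same way from a CI service or from your own machine.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    """One named command of a build (hypothetical, for illustration)."""
    name: str
    command: List[str]  # argv-style command, no shell involved


@dataclass
class Builder:
    """A build expressed as data: a name plus an ordered list of steps."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self) -> List[Tuple[str, int]]:
        """Run the steps in order and stop at the first failing one.

        Returns a list of (step name, exit code) for the steps that ran.
        """
        results = []
        for step in self.steps:
            proc = subprocess.run(step.command)
            results.append((step.name, proc.returncode))
            if proc.returncode != 0:
                break
        return results


def succeeded(results: List[Tuple[str, int]]) -> bool:
    """A build succeeded if every step that ran exited with code 0."""
    return all(code == 0 for _, code in results)


# A stand-in builder. Real steps would be things like configure, compile
# and test commands; here each step just runs a tiny Python one-liner so
# the sketch works on any machine that has Python.
demo = Builder(
    name="demo",
    steps=[
        Step("configure", [sys.executable, "-c", "print('configure')"]),
        Step("test", [sys.executable, "-c", "print('test')"]),
    ],
)

if __name__ == "__main__":
    outcome = demo.run()
    sys.exit(0 if succeeded(outcome) else 1)
```

Because the runner only needs Python, the same file could be called from a Travis script, from a buildbot worker, or from a local shell, which is the property the proposal is after.
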
I am not proposing here that we stop using Travis-CI and Appveyor to run CI for apache/arrow, though that may well be a direction we choose to go in the future. Moving build configuration into something like buildbot would be a necessary first step toward that. That said, there are other immediate benefits to be had by porting build configuration into buildbot: local reproducibility, consolidation of build logic, independence from a particular CI provider, and ease of using and maintaining faster, Docker-based jobs. Self-hosting CI brings a number of other challenges, which we will continue to explore concurrently, but we believe that there are benefits to adopting buildbot build configuration regardless.

Regards,
Krisztian

[1]: https://github.com/ursa-labs/ursabot
[2]: https://buildbot.net
     https://docs.buildbot.net
     https://github.com/buildbot/buildbot
[3]: https://github.com/apache/arrow/pull/5210