Hello Krisztián,

I like this proposal. CI coverage and response time are crucial for the health of the project. In general, I like the consolidation and local reproducibility of the builds. I have some questions to make sure I understand your proposal correctly (hopefully they can all be answered with a simple yes):

* Windows builds will stay in Appveyor for now?
* macOS builds will stay in Travis?
* All other builds will be removed from Travis?
* Machines are currently run and funded by UrsaLabs but others could also sponsor an instance that could be added to the setup?
* The build configuration is automatically updated on a merge to master?

And then a not so simple one: What will happen to our current docker-compose setup? From the PR it seems like we do similar things with ursabot but without using the central docker-compose.yml?

Cheers
Uwe

> On 29.08.2019 at 14:19, Krisztián Szűcs <[email protected]> wrote:
>
> Hi,
>
> Arrow's current continuous integration setup utilizes multiple CI providers,
> tools, and scripts:
>
> - Unit tests are running on Travis and Appveyor
> - Binary packaging builds are running on crossbow, an abstraction over
>   multiple CI providers driven through a GitHub repository
> - For local tests and tasks, there is a docker-compose setup, or of course
>   you can maintain your own environment
>
> This setup has run into some limitations:
> - It’s slow: the CI parallelism of Travis has degraded over the last couple
>   of months. Testing a PR takes more than an hour, which is a long time for
>   both the maintainers and the contributors, and it has a negative effect on
>   the development throughput.
> - Build configurations are not portable, they are tied to specific services.
>   You can’t just take a Travis script and run it somewhere else.
> - Because they’re not portable, build configurations are duplicated in
>   several places.
> - The Travis, Appveyor and crossbow builds are not reproducible locally, so
>   developing them requires the slow git push cycles.
> - Public CI has limited platform support; for example, ARM machines are not
>   available.
> - Public CI also has limited hardware support, no GPUs are available
>
> Resolving all of the issues above is complicated, but is a must for the long
> term sustainability of Arrow.
>
> For some time, we’ve been working on a tool called Ursabot[1], a library on
> top of the CI framework Buildbot[2]. Buildbot is well maintained and widely
> used for complex projects, including CPython, Webkit, LLVM, MariaDB, etc.
> Buildbot is not another hosted CI service like Travis or Appveyor: it is an
> extensible framework to implement various automations like continuous
> integration tasks.
>
> You’ve probably noticed additional “Ursabot” builds appearing on pull
> requests, in addition to the Travis and Appveyor builds. We’ve been testing
> the framework with a fully featured CI server at ci.ursalabs.org. This
> service runs build configurations we can’t run on Travis, does it faster
> than Travis, and has the GitHub comment bot integration for ad hoc build
> triggering.
>
> While we’re not prepared to propose moving all CI to a self-hosted setup,
> our work has demonstrated the potential of using buildbot to resolve Arrow’s
> continuous integration challenges:
> - The docker-based builders are reusing the docker images, which eliminates
>   slow dependency installation steps. Some builds on this setup, run on
>   Ursa Labs’s infrastructure, run 20 minutes faster than the comparable
>   Travis-CI jobs.
> - It’s scalable. We can deploy buildbot wherever and add more masters and
>   workers, which we can’t do with public CI.
> - It’s platform and CI-provider independent. Builds can be run on arbitrary
>   architectures, operating systems, and hardware: Python is the only
>   requirement. Additionally builds specified in buildbot/ursabot can be run
>   anywhere: not only on custom buildbot infrastructure but also on Travis,
>   or even on your own machine.
> - It improves reproducibility and encourages consolidation of configuration.
>   You can run the exact job locally that ran on Travis, and you can even get
>   an interactive shell in the build so you can debug a test failure.
>   And because you can run the same job anywhere, we wouldn’t need to have
>   duplicated, Travis-specific or the docker-compose build configuration
>   stored separately.
> - It’s extensible. More exotic features like a comment bot, benchmark
>   database, benchmark dashboard, artifact store, integrating other systems
>   are easily implementable within the same system.
>
> I’m proposing to donate the build configuration we’ve been iterating on in
> Ursabot to the Arrow codebase. Here [3] is a patch that adds the
> configuration. This will enable us to explore consolidating build
> configuration using the buildbot framework. A next step to explore after
> that would be to port a Travis build to Ursabot, and in the Travis
> configuration, execute the build with the shell command
> `$ ursabot project build <builder-name>`. This is the same way we would be
> able to execute the build locally--something we can’t currently do with the
> Travis builds.
>
> I am not proposing here that we stop using Travis-CI and Appveyor to run CI
> for apache/arrow, though that may well be a direction we choose to go in the
> future. Moving build configuration into something like buildbot would be a
> necessary first step to do that; that said, there are other immediate
> benefits to be had by porting build configuration into buildbot: local
> reproducibility, consolidation of build logic, independence from a
> particular CI provider, and ease of using and maintaining faster,
> Docker-based jobs. Self-hosting CI brings a number of other challenges,
> which we will concurrently continue to explore, but we believe that there
> are benefits to adopting buildbot build configuration regardless.
>
> Regards, Krisztian
>
> [1]: https://github.com/ursa-labs/ursabot
> [2]: https://buildbot.net
>      https://docs.buildbot.net
>      https://github.com/buildbot/buildbot
> [3]: https://github.com/apache/arrow/pull/5210
