Re: [PROPOSAL] Consolidate Arrow's CI configuration

Krisztián Szűcs Sun, 08 Sep 2019 00:47:41 -0700

On Sat, Sep 7, 2019 at 9:54 AM Sutou Kouhei <[email protected]> wrote:


> Hi,
>
> I may have Ursabot experience because I've tried to create a
> Ursabot configuration for GLib:
>
>   https://github.com/ursa-labs/ursabot/pull/172

Which is great, thanks for doing that!

>
>
> I like the proposal to consolidate CI configuration into
> Arrow repository. But I like the current docker-compose
> based approach to write how to run each CI job.
>
> I know that Krisztián pointed out docker-compose based
> approach has a problem in Docker image dependency
> resolution.
>
>
> https://lists.apache.org/thread.html/fd801fa85c3393edd0db415d70dbc4c3537a811ec8587a6fbcc842cd@%3Cdev.arrow.apache.org%3E
>
> > The "docker-compose setup"
> > --------------------------
> > ...
> > However docker-compose is not suitable for building and running
> > hierarchical images. This is why we have added Makefile [1] to execute
> > a "build" with a single make command instead of manually executing
> > multiple commands involving multiple images (which is error prone). It
> > can also leave a lot of garbage after both containers and images.
> > ...
> > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
>
> But I felt that I want to use well used approach than our
> specific Python based DSL while I created c_glib
> configuration for Ursabot. If we can use well used approach,
> we can use the approach in other projects. It means that we
> can decrease learning cost.
>
> I also felt that creating a class for each command for
> readable DSL is over-engineering. I know that I could use
> raw ShellCommand class but it'll break consistency.

I've added those command aliases mostly for convenience, and as a
reminder to customise them a bit. Each command can be customised to
parse, e.g., the number of failing/warning/succeeded test cases from
a step and create a summary, which can greatly improve the readability
of the build log. Commands can also behave differently for different
states, use locks across the whole CI, and do other dynamic things, like
triggering other schedulers.
These commands are not shell commands: we can represent more with
buildbot build steps than with shell scripts. The conversion would also
work from buildbot BuildSteps to bash scripts by mocking out the
non-ShellCommand steps. Thus the buildbot DSL can be executed as a shell
script, but a shell script cannot represent certain logic that would be
useful for the hosted build master.
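
As a rough illustration of the per-step customisation described above
(counting failing/warning/succeeded test cases and producing a summary),
here is a minimal, standalone sketch. It does not use the Buildbot or
Ursabot API; the log format, the RESULT_LINE pattern and the summarize()
helper are all hypothetical and only show the idea.

```python
import re
from collections import Counter

# Hypothetical log format: one "[STATUS] test-name" line per test case.
# Real test runners each print results differently and need their own pattern.
RESULT_LINE = re.compile(r"^\[(PASSED|FAILED|WARNING)\]\s+\S+")


def summarize(log: str) -> str:
    """Count result lines per status in a step's output and format a summary."""
    counts = Counter()
    for line in log.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            counts[match.group(1).lower()] += 1
    return ", ".join(
        f"{counts[status]} {status}" for status in ("passed", "failed", "warning")
    )


example_log = """\
[PASSED] array-test
[FAILED] table-test
[WARNING] ipc-test
[PASSED] compute-test
"""
print(summarize(example_log))  # 2 passed, 1 failed, 1 warning
```

A summary like this could be attached to a single step, so the build log
shows one short line per step instead of the full test output.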

>
> For example:
>
>   Creating Meson class to run meson command:
>
> https://github.com/ursa-labs/ursabot/pull/172/files#diff-663dab3e9eab42dfac85d2fdb69c7e95R313-R315
>
> How about just creating a wrapper script for docker-compose
> instead of creating DSL?
>
I've also tried to figure out a way to reuse the bits from the
docker-compose setup, but after some time I realised that it'd be easier
to generate bash scripts and docker-compose.yml from the buildbot DSL,
because it represents more abstractions.
Additionally, docker-compose was not convenient at first either: it took
a couple of iterations to reach the current state, which balances the
limitations of docker-compose against Arrow's requirements.
While docker-compose and the docker builders would work for Linux and
Windows builds, other platforms would not be covered.
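
To make the "generate bash scripts from a higher-level description" idea
more concrete, here is a small sketch. It is not Ursabot code: ShellStep,
MasterStep and to_bash() are hypothetical names, and the steps are plain
dataclasses rather than Buildbot BuildSteps. Steps without a shell
equivalent are mocked out as comments, which is the one-way nature of the
conversion.

```python
import shlex
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ShellStep:
    """A step that maps directly onto one shell command (hypothetical)."""
    name: str
    command: List[str]


@dataclass
class MasterStep:
    """A step with no shell equivalent, e.g. triggering a scheduler."""
    name: str


Step = Union[ShellStep, MasterStep]


def to_bash(steps: List[Step]) -> str:
    """Render the steps as a bash script, skipping non-shell steps."""
    lines = ["#!/usr/bin/env bash", "set -euo pipefail"]
    for step in steps:
        if isinstance(step, ShellStep):
            lines.append(f"# {step.name}")
            lines.append(shlex.join(step.command))  # quotes arguments safely
        else:
            lines.append(f"# skipped (no shell equivalent): {step.name}")
    return "\n".join(lines) + "\n"


steps = [
    ShellStep("configure", ["cmake", "-DARROW_BUILD_TESTS=ON", "../cpp"]),
    MasterStep("trigger benchmark scheduler"),
    ShellStep("build", ["make", "-j4"]),
]
print(to_bash(steps))
```

The reverse direction (recovering locks, triggers or per-step summaries
from a plain shell script) has nothing to work from, which is why the
richer description is the more convenient source of truth.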

>
> For example, we will be able to use labels [labels] to put
> metadata to each image:
>
> ----
> diff --git a/arrow-docker-compose b/arrow-docker-compose
> new file mode 100755
> index 000000000..fcb7f5e37
> --- /dev/null
> +++ b/arrow-docker-compose
> @@ -0,0 +1,13 @@
> +#!/usr/bin/env ruby
> +
> +require "yaml"
> +
> +if ARGV == ["build", "c_glib"]
> +  config = YAML.load(File.read("docker-compose.yml"))
> +  from = config["services"]["c_glib"]["build"]["labels"]["org.apache.arrow.from"]
> +  if from
> +    system("docker-compose", "build", from)
> +  end
> +end
> +system("docker-compose", *ARGV)
> diff --git a/docker-compose.yml b/docker-compose.yml
> index 4f3f4128a..acd649a19 100644
> --- a/docker-compose.yml
> +++ b/docker-compose.yml
> @@ -103,6 +103,8 @@ services:
>      build:
>        context: .
>        dockerfile: c_glib/Dockerfile
> +      labels:
> +        "org.apache.arrow.from": cpp
>      volumes: *ubuntu-volumes
>
>    cpp:
> ----
>
> "./arrow-docker-compose build c_glib" runs
> "docker-compose build cpp" then
> "docker-compose build c_glib".
>
> [labels] https://docs.docker.com/compose/compose-file/#labels
>
>
> If we just have convenient docker-compose wrapper, can we
> use raw Buildbot that just runs the docker-compose wrapper?
>
> I also know that Krisztián pointed out using docker-compose
> from Buildbot approach has some problems.
>
>
> https://lists.apache.org/thread.html/fd801fa85c3393edd0db415d70dbc4c3537a811ec8587a6fbcc842cd@%3Cdev.arrow.apache.org%3E
>
> > Use docker-compose from ursabot?
> > --------------------------------
> >
> > So assume that we should use docker-compose commands in the buildbot
> > builders.
> > Then:
> > - there would be a single build step for all builders [2] (which means a
> >   single chunk of unreadable log) - it also complicates working with
> >   esoteric builders like the on-demand crossbow trigger and the
> >   benchmark runner
> > - no possibility to customize the buildsteps (like aggregating the count
> >   of warnings)
> > - no time statistics for the steps which would make it harder to optimize
> >   the build times
> > - to properly clean up the container some custom solution would be
> >   required
> > - if we'd need to introduce additional parametrizations to the
> >   docker-compose.yaml (for example to add other architectures) then it
> >   might require full yaml duplication
> > - exchanging data between the docker-compose container and buildbot
> >   would be more complicated, for example the benchmark comment reporter
> >   reads the result from a file, in order to do the same (reading
> >   structured output on stdout and stderr from scripts is more error
> >   prone) mounted volumes are required, which brings the usual permission
> >   problems on linux.
> > - local reproducibility still requires manual intervention because the
> >   scripts within the docker containers are not pausable, they exit and
> >   the steps until the failed one must be re-executed* after ssh-ing into
> >   the running container.
> > ...
> > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
>
> We can use "tail -f /dev/null", "docker-compose up -d cpp"
> and "docker-compose exec cpp .." to run commands step by
> step in container. It'll solve "single build step" related
> problems:
>
Actually Buildbot's docker builder works similarly: it spins up an
image, starts a Buildbot worker inside, and instruments it from
outside.
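
For reference, the step-by-step approach quoted above (keep the container
alive with "tail -f /dev/null", then "up -d", "exec" per step, "down")
could be driven from Python roughly as follows. This is only a sketch
under the assumption that the service's command keeps the container
running; compose_session() is a hypothetical helper that builds the
command lines without executing them.

```python
import shlex
from typing import List


def compose_session(service: str, steps: List[str]) -> List[List[str]]:
    """Return the docker-compose invocations for running steps one by one."""
    # Start the long-running container in the background.
    commands = [["docker-compose", "up", "-d", service]]
    for step in steps:
        # Each step becomes its own `exec`, so it can be logged separately.
        commands.append(["docker-compose", "exec", service, "sh", "-c", step])
    # Stop and remove the container and network afterwards.
    commands.append(["docker-compose", "down"])
    return commands


session = compose_session(
    "cpp", ["echo hello > /tmp/hello.txt", "cat /tmp/hello.txt"]
)
for argv in session:
    print(shlex.join(argv))
```

Running each returned command separately (for example with
subprocess.run) would give one log and one duration per step, which is
what the single-step docker-compose invocation lacks.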

> ---
> diff --git a/docker-compose.yml b/docker-compose.yml
> index 4f3f4128a..6b3218f5e 100644
> --- a/docker-compose.yml
> +++ b/docker-compose.yml
> @@ -114,6 +114,7 @@ services:
>      build:
>        context: .
>        dockerfile: cpp/Dockerfile
> +    command: tail -f /dev/null
>      volumes: *ubuntu-volumes
>
>    cpp-system-deps:
> ----
>
> ----
> % docker-compose up -d cpp
> WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
> WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
> Creating network "arrowkou_default" with the default driver
> Creating arrowkou_cpp_1 ... done
> % docker-compose exec cpp sh -c 'echo hello > /tmp/hello.txt'
> WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
> WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
> % docker-compose exec cpp cat /tmp/hello.txt
> WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
> WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
> hello
> % docker-compose down
> WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
> WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
> Stopping arrowkou_cpp_1 ... done
> Removing arrowkou_cpp_1 ... done
> Removing network arrowkou_default
> ----
>
I'm not saying that we couldn't or shouldn't invest time in a
docker-compose wrapper or perhaps a docker-compose generator, but this
problem definitely has a couple of angles to view it from.

BTW I'm not sure how much time it took you to get familiar with the
buildbot DSL, but your PR is good as is and closely aligns with the
previous configs.

Thanks Kou!

>
>
> Thanks,
> --
> kou
>
> In <CAHM19a7ZBX5O8v6yZQkbWqg=HgGy4fTivXtg=k4zNra=fc1...@mail.gmail.com>
>   "[PROPOSAL] Consolidate Arrow's CI configuration" on Thu, 29 Aug 2019
> 14:19:16 +0200,
>   Krisztián Szűcs <[email protected]> wrote:
>
> > Hi,
> >
> > Arrow's current continuous integration setup utilizes multiple CI
> > providers, tools, and scripts:
> >
> >  - Unit tests are running on Travis and Appveyor
> >  - Binary packaging builds are running on crossbow, an abstraction over
> >    multiple CI providers driven through a GitHub repository
> >  - For local tests and tasks, there is a docker-compose setup, or of
> >    course you can maintain your own environment
> >
> > This setup has run into some limitations:
> >  - It’s slow: the CI parallelism of Travis has degraded over the last
> >    couple of months. Testing a PR takes more than an hour, which is a
> >    long time for both the maintainers and the contributors, and it has a
> >    negative effect on the development throughput.
> >  - Build configurations are not portable, they are tied to specific
> >    services. You can’t just take a Travis script and run it somewhere
> >    else.
> >  - Because they’re not portable, build configurations are duplicated in
> >    several places.
> >  - The Travis, Appveyor and crossbow builds are not reproducible locally,
> >    so developing them requires the slow git push cycles.
> >  - Public CI has limited platform support, just for example ARM machines
> >    are not available.
> >  - Public CI also has limited hardware support, no GPUs are available
> >
> > Resolving all of the issues above is complicated, but is a must for the
> > long term sustainability of Arrow.
> >
> > For some time, we’ve been working on a tool called Ursabot[1], a library
> > on top of the CI framework Buildbot[2]. Buildbot is well maintained and
> > widely used for complex projects, including CPython, Webkit, LLVM,
> > MariaDB, etc. Buildbot is not another hosted CI service like Travis or
> > Appveyor: it is an extensible framework to implement various automations
> > like continuous integration tasks.
> >
> > You’ve probably noticed additional “Ursabot” builds appearing on pull
> > requests, in addition to the Travis and Appveyor builds. We’ve been
> > testing the framework with a fully featured CI server at
> > ci.ursalabs.org. This service runs build configurations we can’t run on
> > Travis, does it faster than Travis, and has the GitHub comment bot
> > integration for ad hoc build triggering.
> >
> > While we’re not prepared to propose moving all CI to a self-hosted setup,
> > our work has demonstrated the potential of using buildbot to resolve
> > Arrow’s continuous integration challenges:
> >  - The docker-based builders are reusing the docker images, which
> >    eliminate slow dependency installation steps. Some builds on this
> >    setup, run on Ursa Labs’s infrastructure, run 20 minutes faster than
> >    the comparable Travis-CI jobs.
> >  - It’s scalable. We can deploy buildbot wherever and add more masters
> >    and workers, which we can’t do with public CI.
> >  - It’s platform and CI-provider independent. Builds can be run on
> >    arbitrary architectures, operating systems, and hardware: Python is
> >    the only requirement. Additionally builds specified in
> >    buildbot/ursabot can be run anywhere: not only on custom buildbot
> >    infrastructure but also on Travis, or even on your own machine.
> >  - It improves reproducibility and encourages consolidation of
> >    configuration. You can run the exact job locally that ran on Travis,
> >    and you can even get an interactive shell in the build so you can
> >    debug a test failure. And because you can run the same job anywhere,
> >    we wouldn’t need to have duplicated, Travis-specific or the
> >    docker-compose build configuration stored separately.
> >  - It’s extensible. More exotic features like a comment bot, benchmark
> >    database, benchmark dashboard, artifact store, integrating other
> >    systems are easily implementable within the same system.
> >
> > I’m proposing to donate the build configuration we’ve been iterating on
> > in Ursabot to the Arrow codebase. Here [3] is a patch that adds the
> > configuration. This will enable us to explore consolidating build
> > configuration using the buildbot framework. A next step to explore after
> > that would be to port a Travis build to Ursabot, and in the Travis
> > configuration, execute the build by the shell command
> > `$ ursabot project build <builder-name>`. This is the same way we would
> > be able to execute the build locally--something we can’t currently do
> > with the Travis builds.
> >
> > I am not proposing here that we stop using Travis-CI and Appveyor to run
> > CI for apache/arrow, though that may well be a direction we choose to go
> > in the future. Moving build configuration into something like buildbot
> > would be a necessary first step to do that; that said, there are other
> > immediate benefits to be had by porting build configuration into
> > buildbot: local reproducibility, consolidation of build logic,
> > independence from a particular CI provider, and ease of using and
> > maintaining faster, Docker-based jobs. Self-hosting CI brings a number
> > of other challenges, which we will concurrently continue to explore, but
> > we believe that there are benefits to adopting buildbot build
> > configuration regardless.
> >
> > Regards, Krisztian
> >
> > [1]: https://github.com/ursa-labs/ursabot
> > [2]: https://buildbot.net
> >      https://docs.buildbot.net
> >      https://github.com/buildbot/buildbot
> > [3]: https://github.com/apache/arrow/pull/5210
>
