Hi,

I have some Ursabot experience because I've tried to create an Ursabot
configuration for GLib:
https://github.com/ursa-labs/ursabot/pull/172

I like the proposal to consolidate the CI configuration into the Arrow
repository. But I like the current docker-compose based approach for
describing how to run each CI job.

I know that Krisztián pointed out that the docker-compose based approach
has a problem with Docker image dependency resolution:

https://lists.apache.org/thread.html/fd801fa85c3393edd0db415d70dbc4c3537a811ec8587a6fbcc842cd@%3Cdev.arrow.apache.org%3E

> The "docker-compose setup"
> --------------------------
> ...
> However docker-compose is not suitable for building and running hierarchical
> images. This is why we have added Makefile [1] to execute a "build" with a
> single make command instead of manually executing multiple commands involving
> multiple images (which is error prone). It can also leave a lot of garbage
> after both containers and images.
> ...
> [1]: https://github.com/apache/arrow/blob/master/Makefile.docker

But while I created the c_glib configuration for Ursabot, I felt that I
wanted to use a widely used approach rather than our own Python based DSL.
If we use a widely used approach, we can reuse it in other projects, which
decreases the learning cost. I also felt that creating a class for each
command just to get a readable DSL is over-engineering. I know that I could
use the raw ShellCommand class, but that would break consistency. For
example, I created a Meson class to run the meson command:

https://github.com/ursa-labs/ursabot/pull/172/files#diff-663dab3e9eab42dfac85d2fdb69c7e95R313-R315

How about just creating a wrapper script for docker-compose instead of
creating a DSL? For example, we can use labels [labels] to put metadata on
each image:

----
diff --git a/arrow-docker-compose b/arrow-docker-compose
new file mode 100755
index 000000000..fcb7f5e37
--- /dev/null
+++ b/arrow-docker-compose
@@ -0,0 +1,13 @@
+#!/usr/bin/env ruby
+
+require "yaml"
+
+if ARGV == ["build", "c_glib"]
+  config = YAML.load(File.read("docker-compose.yml"))
+  from = config["services"]["c_glib"]["build"]["labels"]["org.apache.arrow.from"]
+  if from
+    system("docker-compose", "build", from)
+  end
+end
+system("docker-compose", *ARGV)
diff --git a/docker-compose.yml b/docker-compose.yml
index 4f3f4128a..acd649a19 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -103,6 +103,8 @@ services:
     build:
       context: .
       dockerfile: c_glib/Dockerfile
+      labels:
+        "org.apache.arrow.from": cpp
     volumes: *ubuntu-volumes

   cpp:
----

"./arrow-docker-compose build c_glib" runs "docker-compose build cpp" and
then "docker-compose build c_glib".

[labels] https://docs.docker.com/compose/compose-file/#labels
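The wrapper in the diff above only handles the literal "build c_glib" case
and one level of dependency. Just as a sketch (not part of the patch) of how
the same "org.apache.arrow.from" labels could be resolved for any service
and for deeper image hierarchies, assuming the docker-compose.yml layout
above:

----
#!/usr/bin/env ruby
# Hypothetical generalization of the wrapper above: walk the
# "org.apache.arrow.from" labels recursively and build every ancestor
# image before building the requested service.

require "yaml"

def ancestors(config, service)
  from = config.dig("services", service, "build", "labels",
                    "org.apache.arrow.from")
  from ? ancestors(config, from) + [from] : []
end

if ARGV[0] == "build" and ARGV[1]
  config = YAML.load(File.read("docker-compose.yml"))
  ancestors(config, ARGV[1]).each do |dependency|
    system("docker-compose", "build", dependency) or exit(false)
  end
end
system("docker-compose", *ARGV)
----

With something like this, "./arrow-docker-compose build c_glib" would still
work if c_glib's parent image itself had an "org.apache.arrow.from" label.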
If we just have a convenient docker-compose wrapper, can we use raw Buildbot
that just runs the docker-compose wrapper?

I also know that Krisztián pointed out that the "use docker-compose from
Buildbot" approach has some problems:

https://lists.apache.org/thread.html/fd801fa85c3393edd0db415d70dbc4c3537a811ec8587a6fbcc842cd@%3Cdev.arrow.apache.org%3E

> Use docker-compose from ursabot?
> --------------------------------
>
> So assume that we should use docker-compose commands in the buildbot builders.
> Then:
> - there would be a single build step for all builders [2] (which means a
>   single chunk of unreadable log) - it also complicates working with esoteric
>   builders like the on-demand crossbow trigger and the benchmark runner
> - no possibility to customize the buildsteps (like aggregating the count of
>   warnings)
> - no time statistics for the steps which would make it harder to optimize the
>   build times
> - to properly clean up the container some custom solution would be required
> - if we'd need to introduce additional parametrizations to the
>   docker-compose.yaml (for example to add other architectures) then it might
>   require full yaml duplication
> - exchanging data between the docker-compose container and buildbot would be
>   more complicated, for example the benchmark comment reporter reads the
>   result from a file, in order to do the same (reading structured output on
>   stdout and stderr from scripts is more error prone) mounted volumes are
>   required, which brings the usual permission problems on linux.
> - local reproducibility still requires manual intervention because the scripts
>   within the docker containers are not pausable, they exit and the steps until
>   the failed one must be re-executed* after ssh-ing into the running
>   container.
> ...
> [2]: https://ci.ursalabs.org/#/builders/87/builds/929

We can use "tail -f /dev/null" as the container's command, then
"docker-compose up -d cpp" and "docker-compose exec cpp ..." to run commands
step by step in the container. That would solve the "single build step"
related problems:

----
diff --git a/docker-compose.yml b/docker-compose.yml
index 4f3f4128a..6b3218f5e 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -114,6 +114,7 @@ services:
     build:
       context: .
       dockerfile: cpp/Dockerfile
+    command: tail -f /dev/null
     volumes: *ubuntu-volumes

   cpp-system-deps:
----

----
% docker-compose up -d cpp
WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
Creating network "arrowkou_default" with the default driver
Creating arrowkou_cpp_1 ... done
% docker-compose exec cpp sh -c 'echo hello > /tmp/hello.txt'
WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
% docker-compose exec cpp cat /tmp/hello.txt
WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
hello
% docker-compose down
WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
Stopping arrowkou_cpp_1 ... done
Removing arrowkou_cpp_1 ... done
Removing network arrowkou_default
----
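For illustration only, here is a rough sketch (not an actual Buildbot
integration) of how a driver could run each CI step as its own
"docker-compose exec" call, with a separate exit status and duration per
step and guaranteed cleanup; the step commands are placeholders:

----
#!/usr/bin/env ruby
# Hypothetical step-by-step driver on top of the "tail -f /dev/null" trick:
# each step is a separate "docker-compose exec", so the caller (Buildbot or
# anything else) gets one log, one exit status and one duration per step.

STEPS = [
  ["sh", "-c", "echo configure"],  # placeholder step commands
  ["sh", "-c", "echo build"],
  ["sh", "-c", "echo test"],
]

def run(*command)
  started = Time.now
  succeeded = system(*command)
  elapsed = (Time.now - started).round(1)
  puts("#{command.join(' ')}: #{succeeded ? 'OK' : 'FAILED'} (#{elapsed}s)")
  succeeded
end

begin
  run("docker-compose", "up", "-d", "cpp") or exit(false)
  STEPS.each do |step|
    # "-T" disables pseudo-TTY allocation so this also works non-interactively.
    run("docker-compose", "exec", "-T", "cpp", *step) or exit(false)
  end
ensure
  # Clean up the container and network even when a step fails.
  run("docker-compose", "down")
end
----

Each step then gets its own log chunk, exit status and timing, and
"docker-compose down" always runs, which would address the "single chunk of
unreadable log", "no time statistics" and clean-up concerns quoted above.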
Thanks,
--
kou

In <CAHM19a7ZBX5O8v6yZQkbWqg=HgGy4fTivXtg=k4zNra=fc1...@mail.gmail.com>
  "[PROPOSAL] Consolidate Arrow's CI configuration" on Thu, 29 Aug 2019 14:19:16 +0200,
  Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:

> Hi,
>
> Arrow's current continuous integration setup utilizes multiple CI providers,
> tools, and scripts:
>
> - Unit tests are running on Travis and Appveyor
> - Binary packaging builds are running on crossbow, an abstraction over
>   multiple CI providers driven through a GitHub repository
> - For local tests and tasks, there is a docker-compose setup, or of course
>   you can maintain your own environment
>
> This setup has run into some limitations:
> - It’s slow: the CI parallelism of Travis has degraded over the last couple
>   of months. Testing a PR takes more than an hour, which is a long time for
>   both the maintainers and the contributors, and it has a negative effect on
>   the development throughput.
> - Build configurations are not portable, they are tied to specific services.
>   You can’t just take a Travis script and run it somewhere else.
> - Because they’re not portable, build configurations are duplicated in
>   several places.
> - The Travis, Appveyor and crossbow builds are not reproducible locally, so
>   developing them requires the slow git push cycles.
> - Public CI has limited platform support, just for example ARM machines are
>   not available.
> - Public CI also has limited hardware support, no GPUs are available
>
> Resolving all of the issues above is complicated, but is a must for the long
> term sustainability of Arrow.
>
> For some time, we’ve been working on a tool called Ursabot[1], a library on
> top of the CI framework Buildbot[2]. Buildbot is well maintained and widely
> used for complex projects, including CPython, Webkit, LLVM, MariaDB, etc.
> Buildbot is not another hosted CI service like Travis or Appveyor: it is an
> extensible framework to implement various automations like continuous
> integration tasks.
>
> You’ve probably noticed additional “Ursabot” builds appearing on pull
> requests, in addition to the Travis and Appveyor builds. We’ve been testing
> the framework with a fully featured CI server at ci.ursalabs.org. This
> service runs build configurations we can’t run on Travis, does it faster
> than Travis, and has the GitHub comment bot integration for ad hoc build
> triggering.
>
> While we’re not prepared to propose moving all CI to a self-hosted setup,
> our work has demonstrated the potential of using buildbot to resolve Arrow’s
> continuous integration challenges:
> - The docker-based builders are reusing the docker images, which eliminate
>   slow dependency installation steps. Some builds on this setup, run on
>   Ursa Labs’s infrastructure, run 20 minutes faster than the comparable
>   Travis-CI jobs.
> - It’s scalable. We can deploy buildbot wherever and add more masters and
>   workers, which we can’t do with public CI.
> - It’s platform and CI-provider independent. Builds can be run on arbitrary
>   architectures, operating systems, and hardware: Python is the only
>   requirement. Additionally builds specified in buildbot/ursabot can be run
>   anywhere: not only on custom buildbot infrastructure but also on Travis,
>   or even on your own machine.
> - It improves reproducibility and encourages consolidation of configuration.
>   You can run the exact job locally that ran on Travis, and you can even get
>   an interactive shell in the build so you can debug a test failure. And
>   because you can run the same job anywhere, we wouldn’t need to have
>   duplicated, Travis-specific or the docker-compose build configuration
>   stored separately.
> - It’s extensible. More exotic features like a comment bot, benchmark
>   database, benchmark dashboard, artifact store, integrating other systems
>   are easily implementable within the same system.
>
> I’m proposing to donate the build configuration we’ve been iterating on in
> Ursabot to the Arrow codebase. Here [3] is a patch that adds the
> configuration. This will enable us to explore consolidating build
> configuration using the buildbot framework. A next step to explore after
> that would be to port a Travis build to Ursabot, and in the Travis
> configuration, execute the build by the shell command
> `$ ursabot project build <builder-name>`. This is the same way we would be
> able to execute the build locally--something we can’t currently do with the
> Travis builds.
>
> I am not proposing here that we stop using Travis-CI and Appveyor to run CI
> for apache/arrow, though that may well be a direction we choose to go in the
> future. Moving build configuration into something like buildbot would be a
> necessary first step to do that; that said, there are other immediate
> benefits to be had by porting build configuration into buildbot: local
> reproducibility, consolidation of build logic, independence from a
> particular CI provider, and ease of using and maintaining faster,
> Docker-based jobs. Self-hosting CI brings a number of other challenges,
> which we will concurrently continue to explore, but we believe that there
> are benefits to adopting buildbot build configuration regardless.
>
> Regards, Krisztian
>
> [1]: https://github.com/ursa-labs/ursabot
> [2]: https://buildbot.net
>      https://docs.buildbot.net
>      https://github.com/buildbot/buildbot
> [3]: https://github.com/apache/arrow/pull/5210