Hi all,

Since there was no activity on this thread for a very long time, I
decided to go ahead and take a crack at this.

I have a good branch now that is ready for review. The MR is up here: 
https://github.com/apache/buildstream/pull/1453


I'm sending a detailed email because it's a large proposal and I would
like this to be visible, so that people can chime in in case we've
missed an important use case.

Cheers,
    -Tristan


Here are the design changes I've come up with.
=============================================


The offending junction configurations
-------------------------------------
The junction configurations:

  "cache-junction-elements"
  "ignore-junction-remotes"

are completely removed, eliminating the worrisome ambiguity about what
happens in which configuration.

This is replaced by the enhanced user configuration.


Authentication
--------------
The authentication-related properties `server-cert`, `client-cert`
and `client-key` have been split out into a subdictionary named
"auth" for any remote configuration.

This may allow better extensibility for alternative authentication
methods in the future; for now, it lets us document the "auth"
dictionary in one central place in the documentation.
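As an illustration of the shape change, here is a before/after sketch
of a single remote declaration (the URL is made up for the example):

```yaml
# Before: certificate properties inline on the remote
artifacts:
- url: https://cache.example.com:11001
  push: true
  server-cert: server.crt
  client-key: client.key
  client-cert: client.crt

# After: certificate properties grouped under "auth"
artifacts:
- url: https://cache.example.com:11001
  push: true
  auth:
    server-cert: server.crt
    client-key: client.key
    client-cert: client.crt
```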


Remote Execution Configuration
------------------------------
As described in this thread, this is now only configurable with user
configuration, and only one "remote-execution" block is ever
considered.

There is no longer any ambiguity here: "remote-execution" applies to an
entire session, and one build session cannot be built across multiple
different remote execution build clusters.


Artifact and Source cache configuration
---------------------------------------
Projects are still allowed to provide recommendations for artifact and
source cache servers.

User configuration now has the ability to override them, i.e. disregard
artifact and source cache servers declared in projects.

Also, it is no longer possible to declare an artifact/source cache
server as a dictionary, it MUST be a list.

This choice is simply because the dict-or-list tactic does not buy us
any convenience whatsoever here, and the clarity of it always being a
list of dictionaries is more worthwhile.

Consider:
~~~~~~~~~

  artifacts:
    url: https://pony.moose/zebra:4040
    push: true

Versus:
~~~~~~~

  artifacts:
  - url: https://pony.moose/zebra:4040
    push: true

It is exactly the same amount of typing, so there is no point in
supporting both here.
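The stricter rule amounts to a simple load-time check; sketched below
as a hypothetical `load_artifact_config` helper (not BuildStream's
actual loader code):

```python
def load_artifact_config(artifacts):
    """Validate an 'artifacts' value: it must be a list of remote
    dictionaries, never a bare dictionary (illustrative sketch only,
    not BuildStream's real implementation)."""
    if isinstance(artifacts, dict):
        raise ValueError(
            "'artifacts' must be a list of remotes, not a single "
            "dictionary; wrap the entry in a list ('- url: ...')"
        )
    if not isinstance(artifacts, list):
        raise ValueError("'artifacts' must be a list of remotes")
    return artifacts
```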


Project Configuration
~~~~~~~~~~~~~~~~~~~~~

  #
  # This is mostly unchanged, except for the `auth`
  #
  artifacts:
  - url: https://pony.com:9999
    type: both
    push: false
    instance-name: this-shard
    auth:
      server-cert: server.crt


User Configuration
~~~~~~~~~~~~~~~~~~
We can declare global artifact configuration, which either
overrides or augments project-recommended cache servers.

When "augmenting", the user configuration is still at a higher priority
than the project recommendations (as in: user configuration caches will
be consulted *first* when interacting with remotes).


  #
  # Global artifact configuration
  #
  artifacts:

    #
    # Here we decide whether user configuration overrides
    # project recommendations.
    #
    override-project-caches: true

    #
    # And we declare the global artifact configurations
    # under the new "servers" sub-dictionary instead
    #
    servers:
    - url: https://pony.com:9999
      type: both
      push: true
      instance-name: this-shard
      auth:
        server-cert: server.crt
        client-key: client.key
        client-cert: client.crt
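The augment-vs-override behavior described above can be sketched as
follows; `resolve_remotes` is a hypothetical helper for illustration,
not BuildStream's implementation:

```python
def resolve_remotes(user_config, project_remotes):
    """Combine user-configured cache servers with project
    recommendations (illustrative sketch only).

    User-configured servers always come first, so they are consulted
    before any project remotes; if the user sets
    'override-project-caches', project remotes are dropped entirely.
    """
    user_remotes = user_config.get("servers", [])
    if user_config.get("override-project-caches", False):
        return list(user_remotes)
    # Augment: user remotes take priority, project remotes follow
    return list(user_remotes) + list(project_remotes)
```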


We can still declare artifact configuration in the overrides, with
exactly the same new configuration:

  #
  # Artifact configuration for project "foo"
  #
  projects:
    foo:

      artifacts:
        #
        # Let's completely override the caches for project "foo" only
        #
        override-project-caches: true

        #
        # And declare the servers here
        #
        servers:
        - url: https://pony.com:9999
          type: both
          push: true
          instance-name: this-shard
          auth:
            server-cert: server.crt
            client-key: client.key
            client-cert: client.crt


Use case overview
=================
Here is an overview of the previously discussed desirable use cases.

Inline responses to my initial proposal:

[...]
> Use cases we want
> =================
> Here I will try to provide a birds eye view of what our use cases are,
> what does a BuildStream client application require from these
> services ?
> 
>   * The ability to store and retrieve artifacts on a remote artifact
>     server.

Of course.

>   * The ability to store and retrieve staged source packages, indexed
>     by source cache key, on remote source cache services.

Of course.

>   * The ability to farm out builds to a remote execution service

Depending only on your ability to set up a remote execution build
cluster, of course.

>   * The ability to make requirements of worker instances on a remote
>     execution service.
> 
>     - Possibly also the ability to bail out early if the remote
>       execution service knows that it cannot provide a worker which
>       with the properties which some of the project elements require.

Nothing changes here thus far; that is not to say we are perfect in
this regard yet, but this patch does not affect it.

>   * Ability to have redundancies in configuration of remote servers, in
>     case a service is down we usually allow configuration of services
>     in list format.

We still have this.

>   * Ability to carry artifacts forward from a third party artifact
>     cache which was recommended by project configuration across a
>     junction boundary.
> 
>     I.e. for better repeatability, it is often desirable to re-cache
>     the artifacts from an upstream project on your own infra in order
>     to ensure you have your own copy.
> 
>     NOTE: This is currently only available in project data and not
>           overridable by user configuration in the form of the
>           `cache-junction-elements`[2] configuration, which I already
>           pointed out was problematic in my original report[0].

With this patch, we can achieve this use case by configuring a global
cache server for pushing, and setting `override-project-caches` to
`false`.
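In user configuration, this use case might look like the following
sketch (the URL is illustrative, not a real server):

```yaml
artifacts:
  # Keep consulting project-recommended caches as a fallback
  override-project-caches: false
  servers:
  - url: https://cache.mycompany.example:11001
    push: true
    auth:
      server-cert: server.crt
      client-key: client.key
      client-cert: client.crt
```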

The result will be:

  * When pulling, we will:

    - First try to pull from the globally defined cache servers
    - Fall back on project defined cache servers

  * When pushing, we will:

    - First push to our globally defined cache servers
    - Not push to the project defined cache servers
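The resulting pull and push behavior can be sketched as a first-hit
loop over the ordered remote list; this is a hypothetical model where
a remote's `contents` dict stands in for a real artifact server, not
the actual BuildStream code:

```python
def pull_artifact(remotes, key):
    """Try each remote in priority order and stop at the first one
    that has the artifact (illustrative sketch: 'contents' is a dict
    standing in for a real artifact server)."""
    for remote in remotes:
        artifact = remote["contents"].get(key)
        if artifact is not None:
            return artifact
    return None  # cache miss on every configured remote

def push_artifact(remotes, key, artifact):
    """Push only to remotes configured with push: true."""
    for remote in remotes:
        if remote.get("push", False):
            remote["contents"][key] = artifact
```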

Why will we not push to the project defined cache servers ?

Well, it is currently only a consequence of the simple fact that
people normally do not configure "push" servers in a project.conf, as
that would likely imply publishing the private key needed to push to
their server along with their project, so that everyone and their dog
can push anything they like to the artifact server.

We could additionally police this in the code and completely disallow
such nonsense configurations, but for now I've just left a fat notice
in the project.conf documentation which points out that it is a very
bad idea to configure a "push" remote from your project.conf (following
the "let people shoot their own feet if they really want to" policy).

>   * Ability to avoid downloading artifacts found on third party
>     infrastructure.
> 
>     I.e. for better trustability, you may want to ensure that all of
>     the built artifacts you end up consuming were built on
>     infrastructure you control, rather than downloaded from an upstream
>     project's artifact server.
> 
>     NOTE: This is currently only available in project data and not
>           overridable by user configuration in the form of the 
>           `ignore-junction-remotes`[2] configuration, which I already
>           pointed out was problematic in my original report[0].

This is now possible by simply declaring `override-project-caches` to
`true` in the global configuration, regardless of whether or not you
have provided any remotes in your global configuration.

>   * Ability to farm out any local caching work a remote service, to
>     reduce uploads and downloads for builds when configured on an RE
>     service (by way of specifying the RE service's CAS here),
>     configuring multiple build machines which run without RE may also
>     be optimized by way of using the same remote CAS for this.
> 
>     NOTE: This has not yet landed and is a part of Jürg's ongoing
>     work[3].

This orthogonal feature has yet to land in master.

>   * Ability to clearly override the recommendation of any project.conf
>     in the loaded pipeline using user configuration, which should
>     always have the last word on any user configuration.

This can be achieved on a per-project basis by setting the new
`override-project-caches` attribute to `true` in the overrides section
of the user configuration.



