Hi all,
As I raised in my previous, still unresolved proposal earlier this
year[0], the configuration surface surrounding artifact caches (where,
when and what to pull or push) is highly confusing. In its current
state, it is hard to predict what will happen when certain
configurations are combined.
So I want to raise this again, but I think it is appropriate to widen
the scope of this proposal to cover the configuration data pertaining
to all remote services which can potentially be configured.
If I am not missing anything, these currently include:
* Artifact content services (CAS)
* Artifact indexing services (Remote Assets)
* Source cache content services (CAS)
* Source cache indexing services (Remote Assets)
* Remote execution services
In essence, we've added so many switches in various places, all for
various purposes, but the resulting combinatorial complexity of all of
these options is largely unjustified.
I think we need to take a step back and look at the purposes for each
of these options and switches, reevaluate what is actually useful and
why, and come up with a more coherent picture, such that it becomes
obvious both to the user and the developer what will happen as a result
of any configuration combination.
Service configuration is user configuration
===========================================
As I specified in my last email, I will reiterate that all of the
configuration pertaining to remote services (their endpoints, whether
they are trusted, whether to carry assets forward across projects) is
strictly user configuration.
We specifically extended the way user configuration can be augmented by
"Project Recommendations"[1] so that it was possible for a project to
"suggest" an artifact cache which is likely to contain artifacts for
the given project.
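As a reminder, such a recommendation in today's project.conf looks
roughly like the following (the URL and certificate file name are
placeholders, not real infrastructure):

```yaml
# project.conf (excerpt): a read-only artifact cache "suggestion",
# which users remain free to override in buildstream.conf.
artifacts:
- url: https://cache.example.com:11001
  server-cert: server.crt
```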
However, actual project data pertains to what and how something gets
built and will affect cache keys, and user configuration pertains to
things such as log file formatting, location of where to store
artifacts locally and remotely, and generally how the session runs, and
cannot ever affect cache keys.
User configuration constitutes the context in which BuildStream was
invoked, and BuildStream guarantees a deterministic build for the
project data regardless of context (ergo, the name of the Context
object where all invocation context is stored for the session).
Sorry for the extra emphasis here, I think it's important to stress
that these are separate configuration surfaces.
Use cases we want
=================
Here I will try to provide a bird's-eye view of our use cases: what
does a BuildStream client application require from these services?
* The ability to store and retrieve artifacts on a remote artifact
server.
* The ability to store and retrieve staged source packages, indexed
by source cache key, on remote source cache services.
* The ability to farm out builds to a remote execution service.
* The ability to make requirements of worker instances on a remote
execution service.
- Possibly also the ability to bail out early if the remote
execution service knows that it cannot provide a worker with the
properties which some of the project elements require.
* Ability to have redundancy in the configuration of remote servers;
in case a service is down, we usually allow services to be
configured in list format.
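For example, in the current format this redundancy is expressed simply
by listing several servers, tried in order (hostnames here are made
up):

```yaml
# buildstream.conf (excerpt): if the first cache is unreachable,
# the next one in the list can be consulted instead.
artifacts:
- url: https://cache-primary.example.com:11001
  server-cert: server.crt
- url: https://cache-fallback.example.com:11001
  server-cert: server.crt
```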
* Ability to carry artifacts forward from a third party artifact
cache which was recommended by project configuration across a
junction boundary.
I.e. for better repeatability, it is often desirable to re-cache
the artifacts from an upstream project on your own infra in order
to ensure you have your own copy.
NOTE: This is currently only available in project data and not
overridable by user configuration in the form of the
`cache-junction-elements`[2] configuration, which I already
pointed out was problematic in my original report[0].
* Ability to avoid downloading artifacts found on third party
infrastructure.
I.e. for better trustability, you may want to ensure that all of
the built artifacts you end up consuming were built on
infrastructure you control, rather than downloaded from an upstream
project's artifact server.
NOTE: This is currently only available in project data and not
overridable by user configuration in the form of the
`ignore-junction-remotes`[2] configuration, which I already
pointed out was problematic in my original report[0].
* Ability to farm out any local caching work to a remote service, to
reduce uploads and downloads for builds when configured on an RE
service (by way of specifying the RE service's CAS here).
Configuring multiple build machines which run without RE may also
be optimized by using the same remote CAS.
NOTE: This has not yet landed and is a part of Jürg's ongoing
work[3].
* Ability to clearly override the recommendation of any project.conf
in the loaded pipeline using user configuration, which should
always have the last word on any user configuration.
Use cases we might not want
===========================
* Multiple RE services in a single session.
Right now, we have the ability for a project itself to declare
which remote execution service it intends to build on[4].
This means that when loading a pipeline with multiple projects, the
user should expect BuildStream to cope with a scenario where
elements from a given project must be built on their own respective
RE service.
I don't think anyone ever requested that BuildStream should cope
with situations where the user instructs us to build some elements
on one RE service and then its reverse dependencies on other RE
services, and I don't think anyone wants this weird feature.
* Ability to "recommend" push remotes, or any remotes which would
require write access on a remote service, using a project.conf
recommendation.
While it makes sense for read-only access, write access to a
service will undoubtedly require a special private key/certificate
in order to access.
I think that if we are going to require the machine running
BuildStream to have credentials anyway, then it makes sense to also
require that the service be specified in a buildstream.conf file on
that machine.
Removing this configuration surface would result in:
- No more RE service recommendations in project.conf (an RE service
AFAICS is only available with write access, using RE causes side
effects on the remote and is a privileged operation).
- No option to configure "push" in artifact cache recommendations
in project.conf
- No option to configure "push" in source cache recommendations in
project.conf
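To illustrate why this belongs in user configuration: a push-capable
remote in today's format already requires client credentials which
live on the invoking machine (all file names and URLs below are
placeholders):

```yaml
# buildstream.conf (excerpt): pushing requires a client key and
# certificate, so the remote is declared where the credentials are.
projects:
  myproject:
    artifacts:
    - url: https://cache.example.com:11001
      push: true
      server-cert: server.crt
      client-cert: client.crt
      client-key: client.key
```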
* Per project granularity in the configuration of what is currently
called `cache-junction-elements` and `ignore-junction-remotes`[2].
I think that the decisions made by these configurations are only
pertinent to the toplevel project being built, not to intermediate
projects.
If for instance, I have elements in projects (A) (B) and (C),
configured as such:
  (A): Pull artifacts from https://project-a.com/cache

  (B): Pull and push artifacts at https://project-b.com/cache,
       and also cache elements from subproject (A) using the
       cache-junction-elements setting

  (C): Pull and push artifacts at https://project-c.com/cache,
       and also cache elements from subproject (B) using the
       cache-junction-elements setting
What should be happening when I build (C), is that *all* elements
from all subprojects should be re-cached at
https://project-c.com/cache.
What the current configuration structure dictates, is that when I
build (C), then artifacts from project (A) will be re-cached in the
artifact server belonging to project (B), but not in the artifact
server belonging to project (C) unless (C) also explicitly defines
a junction.
Essentially, this re-caching structure should simply be a blanket
statement that is applied to all subproject elements recursively,
and the configuration should be entirely ignored in subprojects.
The same essentially goes for `ignore-junction-remotes`: we only
really care about this in the context of the toplevel project we
are building, and how upstream projects deal with these things is
their own decision.
Draft Format Proposal
=====================
Largely reposted from the original proposal[0], I propose that we
remove junction level configuration[2] of artifact server behaviors
completely, and replace this with a project.conf recommended behavior
which can be overridden by buildstream.conf.
Furthermore, as per the above, I would propose that we drop the "push"
option completely from project.conf.
For the project and user configuration, I would suggest that we break
the format such that `artifacts` becomes a dictionary with a `servers`
list, adding the two extra configurations on that dictionary, e.g.:
#
# Artifacts
#
artifacts:

  #
  # Whether to try to pull artifacts from artifact
  # caches recommended by subprojects.
  #
  # XXX Arguably, I would like to remove this completely, even
  #
  pull-subproject-artifacts: True

  #
  # Whether to push subproject artifacts into
  # the following servers.
  #
  push-subproject-artifacts: True

  #
  # Same old server list as before, this is a per project hint
  # of where to pull artifacts from (or perhaps push to if you
  # have credentials).
  #
  servers:
  - url: https://artifacts.com/artifacts:11001
    server-cert: server.crt
When expressed in project configuration, I would clarify that the two
added settings are only considered as a default behavior when building
this project as a toplevel project, that they are completely ignored
when building the project as a subproject, and that they can be
overridden by user configuration.
Further, I would propose that we clarify in the user configuration
documentation that the per-project settings regarding artifact caches
are NOT taken into consideration separately when building a project
which junctions another project in your list.
For example:
#
# My buildstream.conf
#
projects:

  foo:
    artifacts:
      pull-subproject-artifacts: False

  bar:
    artifacts:
      pull-subproject-artifacts: True
If there is a junction between `foo` and `bar`, in either direction,
there is no conflict here to resolve or differing behavior on a per
project basis: Only the toplevel project is ever considered to
determine artifact caching behaviors here.
Further proposed changes
========================
In addition to actual format changes, I would propose the following:
* Remove the ability of the project.conf to recommend remote
execution services, as access to a remote execution service is
already privileged.
Furthermore, connection to a remote execution service is really
an attribute of a session and nothing to do with the project
itself, it should apply to the whole session equally and there
should be no doubt that only one RE service will be accessed in the
session.
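Under this proposal, the single session-wide RE service would be
declared only in buildstream.conf, roughly following the shape of the
existing user configuration (the URLs are placeholders and the exact
key names are only a sketch):

```yaml
# buildstream.conf (excerpt): one RE service for the whole session,
# regardless of how many projects are in the loaded pipeline.
remote-execution:
  execution-service:
    url: http://execution.example.com:50051
  storage-service:
    url: https://storage.example.com:11002
    server-cert: server.crt
```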
* Service lists should override each other, rather than extend each
other.
In the user configuration we have the following text[5]:
"Although project’s often specify a remote artifact cache in
their project.conf, you may also want to specify extra caches."
This text suggests that a list of artifact services recommended by
the project.conf of a given project continues to be observed, even
when additional caches are specified in user configuration.
I think this goes against the spirit of our configuration style,
and the recommendations of the loaded project.conf should be
overridable rather than merely extensible.
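With override semantics, a per-project entry such as the following
(using the `servers` dictionary from the draft proposal, with a
placeholder URL) would mean that only the listed server is consulted
for `foo`, and foo's own project.conf recommendation is disregarded
entirely:

```yaml
# buildstream.conf (excerpt): this list replaces, rather than
# extends, whatever foo's project.conf recommends.
projects:
  foo:
    artifacts:
      servers:
      - url: https://my-mirror.example.com:11001
        server-cert: server.crt
```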
Even with the currently existing note:
"Caches listed here will be considered a higher priority than
those specified by the project..."
This leaves us with no way to completely override the
recommendation of the project.
Personally, I am more comfortable with the certainty that I have
control to override these, than I am with the convenience of
falling back on project recommendations, although if there are
strong opposing feelings here, we may need to support both the
override and extend semantics separately.
* Global service lists should override projects instead of acting as a fallback.
Currently also specified in the user configuration docs, is:
"Caches declared here will be used by all BuildStream project’s
on the user’s machine and are considered a lower priority than
those specified in the project configuration."
I think instead, the global list should override anything specified
by any project.conf, except for services specified on a per project
basis in the user configuration.
NOTE: In the previous section I commented on
"pull-subproject-artifacts", this can completely be removed,
as the same behaviour can be achieved simply by providing an
empty service list for the global artifact services
configured in buildstream.conf.
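Concretely, in the proposed format that would amount to nothing more
than (sketch):

```yaml
# buildstream.conf (excerpt): an empty global server list overrides
# every project.conf recommendation, so no subproject remotes are
# consulted at all.
artifacts:
  servers: []
```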
I realize there is a lot to process here; it is a complete teardown
and rewrite of the user configuration surface for services. I would
like to hear your input and possibly alternative recommendations.
While it may not end up exactly as I described above, I am convinced
that some urgent house cleaning is necessary.
Cheers,
-Tristan
[0]: https://mail.gnome.org/archives/buildstream-list/2020-May/msg00018.html
[1]: https://docs.buildstream.build/master/arch_data_model.html#context
[2]: https://docs.buildstream.build/master/elements/junction.html
[3]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/2095
[4]: https://docs.buildstream.build/master/format_project.html#remote-execution
[5]: https://docs.buildstream.build/master/using_config.html#artifact-server