Hi all,
As I raised in my previous, still unresolved proposal earlier this
year[0], the configuration surface surrounding artifact caches (where,
when and what to pull or push) is highly confusing. In its current
state, it is hard to predict what will happen when certain
configurations are combined.
So I want to raise this again, but I think it is appropriate to widen
the scope of this proposal to cover the configuration data pertaining
to all remote services which can potentially be configured.
If I am not missing anything, these currently include:
* Artifact content services (CAS)
* Artifact indexing services (Remote Assets)
* Source cache content services (CAS)
* Source cache indexing services (Remote Assets)
* Remote execution services
In essence, we've added so many switches in various places, all for
various purposes, but the resulting combinatorial complexity of all of
these options is largely unjustified.
I think we need to take a step back and look at the purposes for each
of these options and switches, reevaluate what is actually useful and
why, and come up with a more coherent picture, such that it becomes
obvious both to the user and the developer what will happen as a result
of any configuration combination.
Service configuration is user configuration
===========================================
As I specified in my last email, I will reiterate that all of the
configuration pertaining to remote services (their endpoints, whether
they are trusted, whether to carry assets forward across projects) is
strictly user configuration.
We specifically extended the way user configuration can be augmented by
"Project Recommendations"[1] so that it was possible for a project to
"suggest" an artifact cache which is likely to contain artifacts for
the given project.
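As a reminder, such a recommendation in today's project.conf looks
roughly like the following (the URL and certificate file name are
placeholders, not real infrastructure):

```yaml
# project.conf (excerpt): a read-only artifact cache "suggestion",
# which users remain free to override in buildstream.conf.
artifacts:
- url: https://cache.example.com:11001
  server-cert: server.crt
```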
However, actual project data pertains to what and how something gets
built and will affect cache keys, and user configuration pertains to
things such as log file formatting, location of where to store
artifacts locally and remotely, and generally how the session runs, and
cannot ever affect cache keys.
User configuration constitutes the context in which BuildStream was
invoked, and BuildStream guarantees a deterministic build for the
project data regardless of context (ergo, the name of the Context
object where all invocation context is stored for the session).
Sorry for the extra emphasis here, I think it's important to stress
that these are separate configuration surfaces.
Use cases we want
=================
Here I will try to provide a bird's-eye view of our use cases: what
does a BuildStream client application require from these services?
* The ability to store and retrieve artifacts on a remote artifact
server.
* The ability to store and retrieve staged source packages, indexed
by source cache key, on remote source cache services.
* The ability to farm out builds to a remote execution service.
* The ability to make requirements of worker instances on a remote
execution service.
- Possibly also the ability to bail out early if the remote
execution service knows that it cannot provide a worker with the
properties which some of the project elements require.
* Ability to have redundancy in the configuration of remote servers;
in case a service is down, we usually allow services to be
configured in list format.
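For example, in the current format this redundancy is expressed simply
by listing several servers, tried in order (hostnames here are made
up):

```yaml
# buildstream.conf (excerpt): if the first cache is unreachable,
# the next one in the list can be consulted instead.
artifacts:
- url: https://cache-primary.example.com:11001
  server-cert: server.crt
- url: https://cache-fallback.example.com:11001
  server-cert: server.crt
```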
* Ability to carry artifacts forward from a third party artifact
cache which was recommended by project configuration across a
junction boundary.
I.e. for better repeatability, it is often desirable to re-cache
the artifacts from an upstream project on your own infra in order
to ensure you have your own copy.
NOTE: This is currently only available in project data and not
overridable by user configuration in the form of the
`cache-junction-elements`[2] configuration, which I already
pointed out was problematic in my original report[0].
* Ability to avoid downloading artifacts found on third party
infrastructure.
I.e. for better trustability, you may want to ensure that all of
the built artifacts you end up consuming were built on
infrastructure you control, rather than downloaded from an upstream
project's artifact server.
NOTE: This is currently only available in project data and not
overridable by user configuration in the form of the
`ignore-junction-remotes`[2] configuration, which I already
pointed out was problematic in my original report[0].
* Ability to farm out any local caching work to a remote service, to
reduce uploads and downloads for builds when configured on an RE
service (by way of specifying the RE service's CAS here).
Configuring multiple build machines which run without RE may also
be optimized by using the same remote CAS.
NOTE: This has not yet landed and is a part of Jürg's ongoing
work[3].
* Ability to clearly override the recommendation of any project.conf
in the loaded pipeline using user configuration, which should
always have the last word on any user configuration.
Use cases we might not want
===========================
* Multiple RE services in a single session.
Right now, we have the ability for a project itself to declare
which remote execution service it intends to build on[4].
This means that when loading a pipeline with multiple projects, the
user should expect BuildStream to cope with a scenario where
elements from a given project must be built on their own respective
RE service.
I don't think anyone ever requested that BuildStream should cope
with situations where the user instructs us to build some elements
on one RE service and then its reverse dependencies on other RE
services, and I don't think anyone wants this weird feature.
* Ability to "recommend" push remotes, or any remotes which would
require write access on a remote service, using a project.conf
recommendation.
While it makes sense for read-only access, write access to a
service will undoubtedly require a special private key/certificate
in order to access.
I think that if we are going to require the machine running
BuildStream to have credentials anyway, then it makes sense to also
require that the service be specified in a buildstream.conf file on
that machine.
Removing this configuration surface would result in:
- No more RE service recommendations in project.conf (an RE service
AFAICS is only available with write access, using RE causes side
effects on the remote and is a privileged operation).
- No option to configure "push" in artifact cache recommendations
in project.conf
- No option to configure "push" in source cache recommendations in
project.conf
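To illustrate why this belongs in user configuration: a push-capable
remote in today's format already requires client credentials which
live on the invoking machine (all file names and URLs below are
placeholders):

```yaml
# buildstream.conf (excerpt): pushing requires a client key and
# certificate, so the remote is declared where the credentials are.
projects:
  myproject:
    artifacts:
    - url: https://cache.example.com:11001
      push: true
      server-cert: server.crt
      client-cert: client.crt
      client-key: client.key
```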
* Per project granularity in the configuration of what is currently
called `cache-junction-elements` and `ignore-junction-remotes`[2].
I think that the decisions made by these configurations are only
pertinent to the toplevel project being built, not to intermediate
projects.
If for instance, I have elements in projects (A) (B) and (C),
configured as such:
  (A): Pull artifacts from https://project-a.com/cache

  (B): Pull and push artifacts at https://project-b.com/cache,
       and also cache elements from subproject (A) using the
       cache-junction-elements setting

  (C): Pull and push artifacts at https://project-c.com/cache,
       and also cache elements from subproject (B) using the
       cache-junction-elements setting
What should be happening when I build (C), is that *all* elements
from all subprojects should be re-cached at
https://project-c.com/cache.
What the current configuration structure dictates, is that when I
build (C), then artifacts from project (A) will be re-cached in the
artifact server belonging to project (B), but not in the artifact
server belonging to project (C) unless (C) also explicitly defines
a junction.
Essentially, this re-caching structure should simply be a blanket
statement that is applied to all subproject elements recursively,
and the configuration should be entirely ignored in subprojects.
The same essentially goes for `ignore-junction-remotes`: we only
really care about this in the context of the toplevel project we
are building, and how upstream projects deal with these things is
their own decision.
Draft Format Proposal
=====================
Largely reposted from the original proposal[0], I propose that we
remove junction level configuration[2] of artifact server behaviors
completely, and replace this with a project.conf recommended behavior
which can be overridden by buildstream.conf.
Furthermore, as per the above, I would propose that we drop the "push"
option completely from project.conf.
For the project and user configuration, I would suggest that we break
the format such that `artifacts` becomes a dictionary with a `servers`
list, adding the two extra configurations on that dictionary, e.g.:
#
# Artifacts
#
artifacts:

  #
  # Whether to try to pull artifacts from artifact
  # caches recommended by subprojects.
  #
  # XXX Arguably, I would like to remove this completely, even
  #
  pull-subproject-artifacts: True

  #
  # Whether to push subproject artifacts into
  # the following servers.
  #
  push-subproject-artifacts: True

  #
  # Same old server list as before, this is a per project hint
  # of where to pull artifacts from (or perhaps push to if you
  # have credentials).
  #
  servers:
  - url: https://artifacts.com/artifacts:11001
    server-cert: server.crt
When expressed in project configuration, I would clarify that the two
added settings are only considered as a default behavior when building
this project as a toplevel project, that they are completely ignored
when building the project as a subproject, and that they can be
overridden by user configuration.
Further, I would propose that we clarify in the user configuration
documentation that the per-project settings regarding artifact caches
are NOT taken into consideration separately when building a project
which junctions another project in your list.
For example:
#
# My buildstream.conf
#
projects:

  foo:
    artifacts:
      pull-subproject-artifacts: False

  bar:
    artifacts:
      pull-subproject-artifacts: True
If there is a junction between `foo` and `bar`, in either direction,
there is no conflict here to resolve or differing behavior on a per
project basis: Only the toplevel project is ever considered to
determine artifact caching behaviors here.
Further proposed changes
========================
In addition to actual format changes, I would propose the following:
* Remove the ability of the project.conf to recommend remote
execution services, as access to a remote execution service is
already privileged.
Furthermore, connection to a remote execution service is really
an attribute of a session and nothing to do with the project
itself, it should apply to the whole session equally and there
should be no doubt that only one RE service will be accessed in the
session.
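Under this proposal, the single session-wide RE service would be
declared only in buildstream.conf, roughly following the shape of the
existing user configuration (the URLs are placeholders and the exact
key names are only a sketch):

```yaml
# buildstream.conf (excerpt): one RE service for the whole session,
# regardless of how many projects are in the loaded pipeline.
remote-execution:
  execution-service:
    url: http://execution.example.com:50051
  storage-service:
    url: https://storage.example.com:11002
    server-cert: server.crt
```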
* Service lists should override each other, rather than extend each
other.
In the user configuration we have the following text[5]:
"Although project’s often specify a remote artifact cache in
their project.conf, you may also want to specify extra caches."
This text suggests that a list of artifact services recommended by
the project.conf of a given project continues to be observed, even
when additional caches are specified in user configuration.
I think this goes against the spirit of our configuration style,
and the recommendations of the loaded project.conf should be
overridable rather than merely extensible.
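With override semantics, a per-project entry such as the following
(using the `servers` dictionary from the draft proposal, with a
placeholder URL) would mean that only the listed server is consulted
for `foo`, and foo's own project.conf recommendation is disregarded
entirely:

```yaml
# buildstream.conf (excerpt): this list replaces, rather than
# extends, whatever foo's project.conf recommends.
projects:
  foo:
    artifacts:
      servers:
      - url: https://my-mirror.example.com:11001
        server-cert: server.crt
```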
Even with the currently existing note:
"Caches listed here will be considered a higher priority than
those specified by the project..."
This leaves us with no way to completely override the
recommendation of the project.
Personally, I am more comfortable with the certainty that I have
control to override these, than I am with the convenience of
falling back on project recommendations, although if there are
strong opposing feelings here, we may need to support both the
override and extend semantics separately.
* Global service lists should override projects instead of acting as a fallback.
Currently also specified in the user configuration docs, is:
"Caches declared here will be used by all BuildStream project’s
on the user’s machine and are considered a lower priority than
those specified in the project configuration."
I think instead, the global list should override anything specified
by any project.conf, except for services specified on a per project
basis in the user configuration.
NOTE: In the previous section I commented on
"pull-subproject-artifacts", this can completely be removed,
as the same behaviour can be achieved simply by providing an
empty service list for the global artifact services
configured in buildstream.conf.
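Concretely, in the proposed format that would amount to nothing more
than (sketch):

```yaml
# buildstream.conf (excerpt): an empty global server list overrides
# every project.conf recommendation, so no subproject remotes are
# consulted at all.
artifacts:
  servers: []
```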
I realize there is a lot to process here; it is a complete teardown
and rewrite of the user configuration surface for services. I would
like to hear your input and possibly alternative recommendations.
While it may not end up exactly as I described above, I am convinced
that some urgent house cleaning is necessary.
Cheers,
-Tristan
[0]: https://mail.gnome.org/archives/buildstream-list/2020-May/msg00018.html
[1]: https://docs.buildstream.build/master/arch_data_model.html#context
[2]: https://docs.buildstream.build/master/elements/junction.html
[3]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/2095
[4]: https://docs.buildstream.build/master/format_project.html#remote-execution
[5]: https://docs.buildstream.build/master/using_config.html#artifact-server