Hi,
I'm afraid this is going to be an earful, it seems that you've awoken
the beast and cracked open a can of worms here...
On Thu, 2020-07-30 at 17:34 +0200, Jürg Billeter wrote:
> Hi Tristan,
>
> On Tue, 2020-07-28 at 12:36 +0900, Tristan Van Berkom wrote:
> > On Tue, 2020-07-28 at 02:11 +0900, Tristan Van Berkom wrote:
> > > This is an interesting choice I'm not sure I have an answer to today,
> > > should we only allow parsing custom YAML on build dependencies, and
> > > call it an error to specify `config` on a dependency that is strictly a
> > > `runtime` dependency ?
> >
> > After sleeping on this I feel more strongly in favor of allowing this
> > on all dependencies regardless of `type`.
> >
> > The worst case scenario is that it happens to not be used with runtime-
> > only dependencies much.
>
> The cache key does not change when there is a change in direct runtime
> dependencies. Only build and indirect runtime dependencies affect the
> cache key. Unless we want to change that, we should clearly not pass on
> any information about direct runtime dependencies to that new method,
> as direct runtime dependencies are not allowed to affect the build of
> that element.
>
> I.e., I think the worst case scenario is that this _is_ used with
> runtime-only dependencies and it affects builds without affecting the
> cache key. Or am I missing something?
Interesting... here is my initial impression:
I think that if a build dependency is configured in such a way that
it affects build output, then the plugin must consider that in it's
`Plugin.get_unique_key()` implementation.
I don't see why this would not be the case if you had configured a
runtime dependency specifically, such that it might configure the
build with an expectation of how that runtime dependency will
behave later on.
That said, opinions might differ and this is certainly worth
investigating further...
Perhaps there are two ways to consider what runtime dependencies are...
A.) Runtime dependency *artifacts* are required at runtime of
the artifact being built, but not at build time.
This has always been true and should continue to be true.
B.) Runtime dependency *elements* are not allowed to affect the
cache key of the depending element in any way.
This is not currently true, and I think whether it becomes true
is up for debate, it may be a good thing to make this true.
One thing is clear: *artifacts* are not the only inputs provided by an
element, an element also provides public data, and in the future, an
element is likely to provide it's sources to reverse dependencies
(depending on the sources of a dependency has resurfaced multiple times
as useful for some constructs, like testing elements, crunching
manifests, or perhaps running static code analyzers).
The fact that (B) is simply "untrue", is evidenced by the
Element.dependencies() API allowing the Element to access runtime
dependencies using `Scope.ALL` or `Scope.RUNTIME` (or similar with
the current Element.search()).
My current expectation is be that the following construct is possible
and perfectly valid:
* An element reads public data from one or more runtime-only
dependencies
* The said element is built according to the public data found
on runtime-only dependencies.
This can mean that it is built with a pre-configured expectation
of how a given dependency will behave at runtime (the dependency
*artifact* however is still not required at build time).
* The plugin adjusts Plugin.get_unique_key() depending on any
variance that the Plugin is in control of, including this public
data if it affects the build.
I think that, if we have given the Element access to it's runtime
dependencies (even if their artifacts cannot be *staged*), then we
cannot logically say that these dependency elements are not allowed to
be input, or affect build output/cache key.
So up until here, I disagree...
Below I will open another adjacent can of worms, and try to present a
couple of possible courses of action...
Can of worms: Mutable public data
=================================
Since we recently discussed the problem of plugins generating data in
host python which is then used to write directly to the sandbox, I've
also been alluding to the probability that we should remove the ability
of Elements to mutate public data in code (I've raised it in passing a
couple of times but nothing formal).
This is originally because we cannot guarantee consistency of data
generated in host python like we can with parsed YAML. Dictionary
ordering can be a concern, even if we assume host python dictionary
stability, the input of those dictionaries composed by a plugin can be
obtained by some non-stable source; this can also be true for "lists"
where the plugin author might judge that list order is not significant.
But in light of the current topic, mutable public data becomes much
more problematic, considering the following logic:
* An element is granted access to all of it's dependencies, including
runtime-only dependencies.
* Public data is mutable, which means that the public data found
on an element is not guaranteed to be the same data before and
after it is built.
* When reading the public data of a runtime-only dependency, we
don't know if that dependency has been built yet or not, and
we might get different public data before or after that dependency
has been built.
Considering this, we currently have a clear case of non-determinism
in any case which an element reads public data from a runtime-only
dependency.
A fork in the road
==================
I can see this going in two general directions.
One direction is that we make public data immutable again, which is
something I've been wanting to do anyway, possibly allowing an
Element's default YAML to also augment the element public data,
allowing support for some basic constructs like the one Chandan
suggested in the adjacent sub-thread regarding plugin identities[0].
The other direction would be to remove runtime dependencies from an
Element's input, which is to say that we would remove the ability of
accessing them in `Element.dependencies()` or `Element.search()` (or
the replacement which this proposal would replace `Element.search()`
with).
Let's try to zoom in and take a closer look at what each of these would
look like...
Solution 1: Immutable public data
---------------------------------
* This approach would also patch up the vulnerability of allowing
python code to generate structural data which will be interpreted
by reverse dependencies, keeping control of this data entirely in
YAML and BuildStream Node composition.
* For this proposal, the `Element.configure_dependencies()` can still
be allowed to access runtime-only dependencies.
* Further, we can continue to grant visibility of runtime
dependencies to Element plugins via Element.dependencies(), since
public data would become safely stable and predictable.
Solution 2: Remove runtime dependencies from build input
--------------------------------------------------------
* This approach would be to ensure that Element plugins do not have
access to runtime dependencies.
* As such this proposal would not allow custom configuration of
runtime dependencies, because they would not be considered as
input anymore.
* Similarly, Element.dependencies() would no longer be allowed to
see runtime-only dependencies.
* The mutability of public data would no longer be a concern in
this regard, however I would still prefer to remove this
mutability in order to avoid plugins composing any complicated
data structures dynamically which might not be stable and might
affect reverse dependency builds in non-deterministic ways.
One thing to keep in mind with this approach, is that removing runtime
dependencies should not hinder later constructs such as test elements
possibly requiring sources from dependencies (like for "test" elements
or for running static code analyzers) - this would however force
projects to never communicate characteristics to reverse runtime-
dependencies via public data or explicit configuration.
I think this would not be a problem, any use cases I can think of
should be similarly solvable with variable settings and include files
in the project.
Therefore, I currently lean towards solution 2, and changing things
such that runtime dependencies become completely invisible to Element
implementations if that is possible.
Cheers,
-Tristan
[0]:
https://lists.apache.org/thread.html/rd9c23b849ead28f173fa3a1663de2788171a39cd8cdacb1f8c95c191%40%3Cdev.buildstream.apache.org%3E