Re: Proposal: Extending dependency attributes

Tristan Van Berkom Thu, 30 Jul 2020 02:46:27 -0700

Hi,

Thanks for participating :)

On Thu, 2020-07-30 at 00:07 +0200, Sander Striker wrote:
> Hi,
> 
> Took me a bit to get back to this, apologies.
> 
> TL;DR in general I find myself supportive, but I'm not convinced of the UI,
> which turns out quite verbose.  As a selfish motivation I'm also looking
> for ways to identify elements that do or do not match the same kind as the
> plugin.  If this is an implicit annotation that could work.

Regarding:

  "ways to identify elements that do or do not match the same kind as
   the plugin"

I think this pertains to the Bazel plugin issue you discuss below,
otherwise I'm not sure what you mean by this :)

> 
> More inline.
> 
> On Sun, Jul 26, 2020 at 9:14 AM Tristan Van Berkom <
> [email protected]> wrote:
> 
> > Hi all,
> > 
> >     This is a proposal for extending the `dependency` attributes on
> > element declarations such that elements can extend the dependency
> > attributes and participate in parsing them.
> > 
> > This is a proposal aimed at reconciling a few topics which we've been
> > looking into recently.
> > 
> >   * Plugins accessing private API[0]
> > 
> >     In Chandan's announcement we underlined the importance of being
> >     able to reduce redundancy in element declarations which stage
> >     elements into different sysroots.
> > 
> >     Any element which needs to parse an element name in it's
> >     configuration and later use Element.search() to find that element
> >     and treat it specially, is likely causing the user to enter
> >     redundant information by requiring the user to specify the element
> >     both as a dependency and also in the element configuration.
> > 
> 
> Indeed.  I'd like to be able to distinguish between types of dependencies,
> e.g. toolchain vs libraries, to be able to stage them in different
> locations.

Right, staging dependencies in custom locations is indeed one of the
possible applications of parsing custom YAML, and at this point the
most popular driving use case.

I could imagine plugins might want to conditionally load public data
from specific dependencies using custom YAML semantics, or maybe stage
partial artifacts by listing split domains here, or that future
extensions to the Element class might make custom YAML semantics
interesting for new and different reasons.

[...]
> >   * Replace Element.search() completely.
> > 
> >     The Element.search() API has been broken[2] for a very long time,
> >     as it does not support addressing elements across junction
> >     boundaries.
> > 
> >     I created this merge request[3] this weekend to address this, since
> >     we now have a clearer story about project relative element paths,
> >     however there are some problems with the patch.
> > 
> >     Namely, in order to make Element.search() performant and avoid
> >     redundantly reloading .bst files from the disk later on, we need to
> >     cache a lot of data which we usually discard from the loader,
> >     increasing our memory footprint by around 12%.
> > 
> >     Considering that we've been wanting to provide a solution that is
> >     more tightly coupled to the dependency lists, I think it is best if
> >     we just let Element.search() die completely, as any utility it
> >     provides can be replaced by this proposal.
> > 
> 
> Ideally there is no need for Element.search(), as it seems to mostly be
> used for supporting "depends"-like functionality outside of core depends.

Right, Element.search() is the current means by which an Element
implementation acquires a dependency Element by name/path.

It's hard for me to imagine any use case where the Element might want
to search for a dependency by it's element path unless it is explicitly
listed in the element declaration itself (e.g. "the .bst file").

Hypothetically an Element could search for elements listed on public
data scanned from other dependencies, but a need for this does not seem
likely.

> Proposed format changes
> > =======================
> > The Dependency dictionary[4] would gain a single new `config`
> > attribute, which would be a dictionary who's semantics would be defined
> > by the owning element.
> > 
> >   kind: foo
> > 
> >   build-depends:
> >   - filename: payload.bst
> >     config:
> >       #
> >       # The plugin loaded as the `foo` kind can
> >       # now be told anything specific about this
> >       # dependency `payload.bst`.
> >       #
> >       frob-me: True
> > 
> >   # Configuration that is parsed by the plugin
> >   # loaded as the `foo` kind in this project
> >   # is specified already on the toplevel `config`
> >   # dictionary.
> >   #
> >   config:
> >     some-config: "Yes please"
> > 
> 
> I'm not sure I follow.  Maybe this is more relatable with a less abstract
> example?

Right, I copied Valentin's old and more relatable example further down,
but would much rather focus on the abstract, in order to avoid getting
lost in specific use cases.

My aim with the abstract example is to drive the discussion in terms of
the abstract ability of parsing custom data, without driving the
implementation to serve any particular use case.

I think that presenting this in terms of:

   "An element can now parse custom semantics which define
    it's relationship to a dependency"

...is most practical and extensible.

This is also similar to UI builder formats like NextStep interface XML,
GtkBuilder XML (and probably QML as well which I'm less familiar with).

Each of these user interface description languanges need to have
semantics for a "Container" object to define properties which are
specific to it's relationship to a "Child Widget".

Our YAML format which creates an abstract dependency graph is very
similar to such user interface description languages already (and
admittedly more than a little inspired from those roots), as such I
feel like this abstract and generic approach to element <-> dependency
relationships would be a natural extension to our format.

> > Proposed plugin API changes
> > ===========================
> > The element would gain a new method for parsing dependency
> > configurations with a similar contract as Plugin.configure().
[...]
> >   * This new method would be optional to implement, however it will be
> >     mandatory if the user ever specifies a `config` dictionary for any
> >     of the given plugin instance's dependencies.
> > 
> >     This is to say that BuildStream core would take care of simply
> >     raising a nice LoadError informing the user that `config` is an
> >     invalid configuration for the said element, if the said element
> >     does not implement any special parsing of dependency
> >     configurations.
> > 
> 
> Why?  I see no reason for BuildStream core to raise concerns when users
> make anticipatory changes to an element definition to work with a (future)
> version of a plugin.  Or when a user switches an element from one kind to a
> similar kind that doesn't support these opaque configurations.  What am I
> missing?

Maybe I phrased this badly or failed to drive the point across, this is
intended to be a convenience (so that Element kinds do not *need* to
implement an additional method if they do not parse any custom YAML).

Everything in BuildStream YAML except for public data is strictly
validated, causing an early stage error if the user specifies an
unrecognized key, or specifies an invalid value for a recognized key,
anywhere.

If plugins fail to call Node.validate_keys() in Plugin.configure() for
any of the nodes it parses, this is a bug in the plugin, and one which
I think we can and definitely should also assert from the core.

This is a bit of an orthogonal conversation but I think we can and
should be storing the "validated" state of all loaded nodes, we should
be able to raise a useful error pointing out such Plugin bugs at the
time that Node._assert_fully_composited() is called, identifying faulty
plugins which failed to perform proper validation of all nodes.

Generally, I would much prefer investigating ways we could strengthen
validation of public data (possibly by introducing stricter "schemas"
defining public data, I don't know exactly how), than to consider
weakening validation anywhere else.

> > Example of Valentin's BuildElement proposal[1]
> > ==============================================
> > I thought it would be good to include a small example of what the
> > BuildElement sysroot example might look like in YAML, based on the
> > `readline.bst` example in his original email:
> > 
> >   depends:
> >   - filename: glibc.bst
> >     config:
> >       sysroot: "%{sysroot}"
> >   - filename: ncurses.bst
> >     config:
> >       sysroot: "%{sysroot}"
> >   - filename: system-image.bst
> >     type: build
> >   - filename: sysroot-compiler.bst
> >     type: build
> > 
> > In this example, the BuildElement class would implement
> > Element.configure_dependencies() in such a way as to collect a list of
> > all of the dependencies which are supposed to go in specific sysroots,
> > and stage those in those sysroots instead of `/`.
> > 
> 
> I find the example a bit confusing, as what does it mean to have a
> different sysroot config on a runtime dependency?  Does that only affect
> these dependencies in their capacity as build-depends?  What is the
> expected effect here?

There are no runtime dependencies in the above example, all of them are
required at build time (although `glibc.bst` and `ncurses.bst` are
*also* runtime dependencies).

Your question is still interesting, if we were to implement Valentin's
proposal for BuildElement then this should be well documented to only
be relevant for build dependencies.

Regarding runtime-only dependencies, there is one important thing to
underline: The only entity which makes any decision about where a
dependency gets staged, is the Element which is building something.

There is no way for BuildStream Elements to decide where they will be
staged in reverse dependency builds, other semantics can be used for
this (I think we should support simple file relocations in elements
like `filter` and `compose` for instance).

> Other than that, it's pretty verbose.  I was envisioning something like:
>   build-depends:
>   - liba.bst
>   - compiler.bst [toolchain]
>   - platform.bst [platform]
> 
> And have these elements essentially get a list of tags associated, so that
> one has a way to filter based on tags present/absent when iterating (or
> have a specialized function for it).
> 
> Now, maybe something similar can be achieved if we would allow something
> like the following, which avoids duplication of the repetitive verbose
> lines:
>   depends:
>   - filenames:
>     - glibc.bst
>     - ncurses.bst
>     config:
>       sysroot: "%{sysroot}"
>   - filenames:
>     - system-image.bst
>     - sysroot-compiler.bst
>     type: build

+1

Making dependency declarations less verbose is desirable and I favor
this general grouping approach for the purpose of declaring
dependencies which have common properties.

Implementation-wise, I think this can be done completely separately and
independently from the proposal at hand.

> I've sadly stalled on progress on a bazel plugin[5] that I started for
> research/discovery reasons as outlined in Chandan's summary message[0].
> However, even in its current very WIP form, it raises some questions on how
> to do certain things.  Like filter all dependencies of the same kind, to
> stage them differently.  I imagine that some of this work would end up
> being split between stage() and configure_dependencies() following your
> proposal?

Filtering elements based on the same "kind" is a misguided approach, as
the element "kind" is not indicative of the nature of the actual plugin
being used.

This has come up several times since, and has been pointed out on the
infamous issue regarding plugins accessing unstable plugin API:

    https://gitlab.com/BuildStream/bst-plugins-experimental/-/issues/2

I.e. you cannot assume which plugin implements a given "kind", only the
declaring project knows what it used for a given "kind".

Interestingly, Ben and I have discussed this fairly recently on IRC and
we do agree that it would be useful for plugins to have some kind of
identity encoded into them which could be used for this kind of
conditional treatment of dependencies. I imagine a kind of domain name
format might be suitable for unique plugin identities, e.g.:

   "org.buildstream.git" <-- indicates the "git" plugin maintained by 
BuildStream

That said, I imagine that for the Bazel use case you describe, an
identity filtering would be more user friendly, I'm not certain of all
the details but I think you really want to filter dependencies based on
a specific plugin identity rather than requiring the user to explicitly
decide which dependencies get treated a certain way.

If this is the case, and imagining that we already have something like
"Plugin.get_identity()", then we could achieve this kind of filtering
with a simple list comprehension:

    class Bazel(Element):

        ...

        # To be defined: probably a plugin identity would be encoded
        # directly into the plugin class data.
        #
        BST_PLUGIN_IDENTITY = "com.google.bazel"

        ...

        def configure(self, node):

            ...

            # Collect the bazel dependencies
            #
            self.bazel_dependencies = [
                element
                for element in self.dependencies(Scope.BUILD)
                if element.get_identity() == "com.google.bazel"
            ]

> Implementation plan
> > ===================
> > To implement this change I would go through 3 phases of implementation.
> > 
> > 
> >   * Implement the new format and method
> > 
> >     This would be done in the BuildStream core repository and
> >     would introduce bare bones plugins in the `tests/` directory
> >     for the purpose of testing the correct behaviors of the
> >     feature.
> > 
> >     At the same time, this would result in deprecation of
> >     Element.search().
> > 
> >   * Migrate existing plugins
> > 
> >     In this phase, we would update the ScriptElement, base
> >     class, the `script` convenience element, and any plugins
> >     in `bst-plugins-experimental` which use `Element.search()`
> >     to use the new `Element.configure_dependencies()` instead.
> > 
> 
> To support layout, right?

Yes correct.

I'm not sure how x86Image element is currently implemented (whether it
derives from ScriptElement or not), but maybe we would implement that
one without inheriting from `ScriptElement` this time around, as it
doesn't really need to be that complicated.

Cheers,
    -Tristan

Re: Proposal: Extending dependency attributes

Reply via email to