Re: Proposal: Remove %{element-name} and %{project-name} variables, hard code %{build-root} again

Tristan Van Berkom Mon, 17 Aug 2020 03:36:58 -0700

Hi Sander,

Thanks for the quick reply.

While writing this reply, another point occurred to me which
complicates matters:

  For a given build graph with junctions, an "%{element-name}"
  and "%{project-name}" is not guaranteed to be unique.

  This is because the same project can be loaded more than
  once for different purposes, possibly with different versions
  or project options (or possibly with the same options/versions).

This complicates things somewhat, putting me on the fence regarding how
strong my position should be in favor of removing these.

While I don't really feel strongly about this either way, I did have
some disagreement which I felt compelled to reply to inline, so I'll
leave those in place, and try to summarize at the end of this mail...

On Mon, 2020-08-17 at 00:27 +0200, Sander Striker wrote:
> Hi Tristan,
> 
> On Sat, Aug 15, 2020 at 4:22 PM Tristan Van Berkom <
> [email protected]> wrote:
> 
> > Hi all,
> > 
> > As a quick strawman proposal, I would like to propose that we remove
> > the automatically set "%{project-name}" and "%{element-name}"
> > variables, in the name of making progress towards resolving this issue
> > which is on our 2.0 blocker list:
> > 
> 
> TL;DR, I'm not a big fan of this proposal, as I think the %{element-name} is
> likely used beyond %{build-root}.  Furthermore, I think I've actually come to
> like the unique build root :).

Agree on "niceness" factor.

> >   Renaming of elements and projects affect cache keys
> >   https://gitlab.com/BuildStream/buildstream/-/issues/1034
> > 
> > At the same time, I would like to establish consensus that the
> > alternative to removing the "%{element-name}" and "%{project-name}"
> > automatically set variables, is to simply say that !1034 is not a bug,
> > and to resolve it with WONTFIX.
> > 
> 
> I think this is a matter of setting a non-default %{build-root} which would
> not
> contain %{project-name} nor %{element-name}.  If a project defines this as a
> fixed path then element-name and project-name do not affect the cache key.
> 
> In other words, I think I'm coming around to a Won't Fix.  Mostly because I
> think we made the decision for a path per element for a reason; and I don't
> think that reason no longer applies.

I don't agree here.

I don't think we had a strong reason to do it *this way* more than
doing it another way, I think we mostly thought this was a simple and
nice way of doing it.

I think the main thing we accomplished by doing it this way was:

  * Avoid having to write code which explicitly stages cached build
    directories harmoniously into a `bst shell` environment.

    This could be done even if it were not the original build directory
    where the source was built in `bst build`.

  * Avoid having to inform debuggers inside a `bst shell` environment
    of where to look for sources.

    This is because the build directory was indeed affecting the
    binary output in debug mode (by leaving a pointer to the build
    directory for a debugger to look for it's sources in).

The main point I think we disagree on is how "difficult" it might be to
stage sources into separate directories, even if their default in
normal circumstances is to build these into a hard coded
"/buildstream/build" directory.

> Also, selfishly, I'd like to stage source for multiple elements at the same 
> time*,
> for the bazel plugin research discovery[1].

And I think your link proves my point above:

    for element in source_elements:
        # NOTE: we assume that %build-root is different for each element
        element_dir = element.get_variable("build-root")
        element.stage_sources(sandbox, element_dir)

As the plugin author you have full control of where you stage things,
the above would read perfectly fine as:

    for element in source_elements:
        element_dir = os.path.join("/buildstream", element.normal_name)
        element.stage_sources(sandbox, element_dir)

In general: I refute that the presence of the "%{element-name}" and
"%{project-name}" variables are really providing much value, or ever
were.

[...offtopic...]

Elements are not allowed to depend on their dependency's sources being
present, this seems to be a feature increasingly in demand.

We are currently writing a license scanner tool for BuildStream
projects[0] which could be written as a plugin in BuildStream 2 instead
of a script which calls BuildStream, if we had the ability to depend on
sources of dependencies.

[.../offtopic...]

> > Here is some history and some facts to explain why.
> > 
> > 
> > History
> > ~~~~~~~
> > I have a vague recollection of a BeaverCon in manchester where we had a
> > focus on IDE integration and debugging, and one of the hot topics was
> > the ability to have your debuggers be able to trace back built binaries
> > to their sources without any hassle.
> > 
> > I.e. when compiling with debugging flags enabled, there is a high
> > likelyhood that the resulting binaries would end up having their build
> > directories hard coded into them.
> > 
> > Running debuggers is otherwise a bit of a hassle, as one needs to
> > inform the debugger of where the sources are for each library, if they
> > are not their in their original location.
> > 
> 
> I think we actually had PoC VS Code with debugging integration going at the
> time.
> This did involve workspace and non-workspace element builds.

Yes, and I think we may have ended up defaulting to `bst shell` having
cached build directories staged by default for any element which had a
workspace open on.

Coupling this with per-element build directories (without having bst
shell explicitly stage build directories into separate directories), we
have a `bst shell` that is hopefully convenient for working with build
directories.

There was yet another whole discussion about bringing in an extra set
of debugging elements to stage for a `bst shell` session, this proposal
was lost in the cracks and I think we can safely set that aside until
some post BuildStream 2.0 day that someone wants to pick it up again.

> > Action: What we ended up doing is adding the "%{element-name}" and the
> > "%{project-name}" variables which are automatically resolved by
> > BuildStream, and using these to resolve the "%{build-root}" variable
> > which is declared by the core project.conf and used by virtually every
> > element (certainly every BuildElement).
> > 
> 
> Right.  And the %{element-name} variable is not dropping the".bst"-suffix if
> present at the moment :).  I wanted to propose a change for that [2].
> <https://gitlab.com/BuildStream/buildstream/-/commit/5395789b01761022cc4653574ebe517c45c7e260>

I would prefer not, but believe this to be purely a matter of
preference.

An element name is addressed as "foo.bst", which is also a perfectly
fine directory name, not sure what is wrong with
"/buildstream/build/project/foo.bst" as a directory name, as it more
directly corresponds with the original element name.

> > Somewhat orthogonal but related to this, is we set out to implement
> > caching of build trees, which together would get us much closer to good
> > debugging environment (we would have the ability to have a `bst shell`
> > environment where multiple build trees are staged and we can run a
> > debugger, and everything "just works").
> > 
> > Great.
> > 
> 
> :).
> 
> Consequences
> > ~~~~~~~~~~~~
> > The fact that we have "%{build-root}" defined to something that is
> > unavoidably unique per element name/project name, means that the
> > resulting artifact cache key needs to be different.
> > 
> 
> Yes.
> 
> 
> > It might be that we have a bug where BuildElement *fails* to consider
> > the "%{build-root}" in the cache key, but I think this is an
> > unacceptable path to go down.
> > 
> > BuildStream strives to produce reproducible artifacts, as such the
> > cache key must capture anything which has a chance of affecting the
> > binary output in an artifact.
> > 
> > The path where sources were staged and the build was performed, has a
> > high chance of affecting the artifact output, and as such it should
> > certainly be considered in the artifact cache key.
> > 
> 
> Not sure about a high chance: it's very uncommon for software to always
> be built from the same base directory, ie. most software is downloaded,
> unpacked, then built from someone's home directory.  And those tend to
> be unique-ish.
> That said, I'll grant you that there is a chance.  Less-well behaved
> software may make assumptions about where it needs to be built from.  For such
> elements you would likely have to very deliberately and specifically set the
> build-root.

I recall that one issue was that compilers with debugging enabled like
to encode this directory into binary output for the sake of debuggers
finding the source directory at debugging time, this constitutes a
fairly high probability of binary variance I think.

In any case, the build directory is an input we are in control of, and
even if the variance is small, I think it's our duty to ensure we
capture it in the cache key.

[...]
> > I.e. it would not "just work", but if one were to actually sit down and
> > work on the IDE/debugging UX, I don't think having a single hard coded
> > build directory for all elements at build time would really be an
> > obstacle.
> > 
> 
> I think it will require [a lot] more integration work that way, based on that
> we already had someone go down that path in the past and ended up here.
> 
> Given that renames may happen but are likely not that common, I think
> optimizing for it is not that necessary.  Especially not, if the defaults
> can be overridden to get the desired behavior for this use case.  Which may
> in turn trade-off IDE integration.

Ok so let's try to summarize with everything in perspective:

  * The element-name/project-name together is not guaranteed to be
    unique for a given pipeline.

    It is only possible to stage duplicates with explicitly set
    separate directories anyway, and the chances that these end
    up colliding in, say, a `bst shell`, are highly unlikely.

  * The work to fix #1034 is non-negligible, here is my understanding:

    - The weak keys need to keep element names in them, in order to
      record the "logical shape" of an element's dependencies, which
      should cause an element to rebuild if that shape changes even
      in non-strict mode.

    - Currently "strong keys" are built on top of "weak keys", for no
      particular reason other than it makes sense and seems elegant.

      This part needs to change in order to address #1034

    - For some reason which is beyond me, cache keys are deeply
      entangled in this thing called an "ArtifactProto".

      My understanding is that changing the strong keys to not
      be based on the weak keys, implies a lightweight additional
      round-trip to the remote asset service.

      This needs to change to address #1034

I think that if we are going to keep the 'element-name' and
'project-name' in existence, I would be happy with:

  * Let's keep 'project-name' and 'element-name' as default components
    of the 'build-root'.

  * Let's make sure that 'build-root' is indeed correctly considered in
    the cache key.

  * I think at this point, the likelyhood of build avoidance on
    element or project renames becomes very low, so I think we could
    do without solving #1034 entirely and just mark it as WONTFIX.

How about this course of action ?

Cheers,
    -Tristan

[0]: 
https://lists.freedesktop.org/archives/freedesktop-sdk/2020-August/000054.html

Re: Proposal: Remove %{element-name} and %{project-name} variables, hard code %{build-root} again

Reply via email to