Hi,
On Mon, Aug 17, 2020 at 12:36 PM Tristan Van Berkom <
[email protected]> wrote:
> Hi Sander,
>
> Thanks for the quick reply.
>
> While writing this reply, another point occurred to me which
> complicates matters:
>
> For a given build graph with junctions, an "%{element-name}"
> and "%{project-name}" is not guaranteed to be unique.
>
> This is because the same project can be loaded more than
> once for different purposes, possibly with different versions
> or project options (or possibly with the same options/versions).
>
> This complicates things somewhat, putting me on the fence regarding how
> strong my position should be in favor of removing these.
>
:)
> While I don't really feel strongly about this either way, I did have
> some disagreement which I felt compelled to reply to inline, so I'll
> leave those in place, and try to summarize at the end of this mail...
>
I've copied that bit back up here for those that TL;DR:
> I think that if we are going to keep the 'element-name' and
> 'project-name' in existence, I would be happy with:
* Let's keep 'project-name' and 'element-name' as default components
> of the 'build-root'.
> * Let's make sure that 'build-root' is indeed correctly considered in
> the cache key.
> * I think at this point, the likelyhood of build avoidance on
> element or project renames becomes very low, so I think we could
> do without solving #1034 entirely and just mark it as WONTFIX.
> How about this course of action ?
+1.
I'll respond to your concerns in line.
On Mon, 2020-08-17 at 00:27 +0200, Sander Striker wrote:
> > Hi Tristan,
> >
> > On Sat, Aug 15, 2020 at 4:22 PM Tristan Van Berkom <
> > [email protected]> wrote:
>
[...]
> > > Renaming of elements and projects affect cache keys
> > > https://gitlab.com/BuildStream/buildstream/-/issues/1034
> > >
> > > At the same time, I would like to establish consensus that the
> > > alternative to removing the "%{element-name}" and "%{project-name}"
> > > automatically set variables, is to simply say that !1034 is not a bug,
> > > and to resolve it with WONTFIX.
>
[...]
> > In other words, I think I'm coming around to a Won't Fix. Mostly
> because I
> > think we made the decision for a path per element for a reason; and I
> don't
> > think that reason no longer applies.
>
> I don't agree here.
>
> I don't think we had a strong reason to do it *this way* more than
> doing it another way, I think we mostly thought this was a simple and
> nice way of doing it.
>
> I think the main thing we accomplished by doing it this way was:
>
> * Avoid having to write code which explicitly stages cached build
> directories harmoniously into a `bst shell` environment.
>
> This could be done even if it were not the original build directory
> where the source was built in `bst build`.
>
> * Avoid having to inform debuggers inside a `bst shell` environment
> of where to look for sources.
>
> This is because the build directory was indeed affecting the
> binary output in debug mode (by leaving a pointer to the build
> directory for a debugger to look for it's sources in).
>
> The main point I think we disagree on is how "difficult" it might be to
> stage sources into separate directories, even if their default in
> normal circumstances is to build these into a hard coded
> "/buildstream/build" directory.
>
And maybe I am missing something here. My thinking is mostly to provide
tools with the files where they expect them to be - which I understand you
think is less difficult than it seems.
> > Also, selfishly, I'd like to stage source for multiple elements at the
> same time*,
> > for the bazel plugin research discovery[1].
>
> And I think your link proves my point above:
>
> for element in source_elements:
> # NOTE: we assume that %build-root is different for each element
> element_dir = element.get_variable("build-root")
> element.stage_sources(sandbox, element_dir)
>
> As the plugin author you have full control of where you stage things,
> the above would read perfectly fine as:
>
> for element in source_elements:
> element_dir = os.path.join("/buildstream", element.normal_name)
> element.stage_sources(sandbox, element_dir)
>
True.
In general: I refute that the presence of the "%{element-name}" and
> "%{project-name}" variables are really providing much value, or ever
> were.
>
That is a fair point.
>
> [...offtopic...]
>
> Elements are not allowed to depend on their dependency's sources being
> present, this seems to be a feature increasingly in demand.
>
We should probably start with documenting the expectations in the plugin
code.
> We are currently writing a license scanner tool for BuildStream
> projects[0] which could be written as a plugin in BuildStream 2 instead
> of a script which calls BuildStream, if we had the ability to depend on
> sources of dependencies.
>
Indeed. In an informal conversation Benjamin offered that an alternate
path might be a generated "identity" element (just copy the source) by a
composite element... this would of course require the "composite element"
feature :)
[.../offtopic...]
>
[...]
> > Right. And the %{element-name} variable is not dropping
> the".bst"-suffix if
> > present at the moment :). I wanted to propose a change for that [2].
> > <
> https://gitlab.com/BuildStream/buildstream/-/commit/5395789b01761022cc4653574ebe517c45c7e260
> >
>
> I would prefer not, but believe this to be purely a matter of
> preference.
>
> An element name is addressed as "foo.bst", which is also a perfectly
> fine directory name, not sure what is wrong with
> "/buildstream/build/project/foo.bst" as a directory name, as it more
> directly corresponds with the original element name.
>
I'm totally with you from a correctness perspective. However, directory
names are usually more closely related to the basename. If I downloaded,
say, openssl, I would put it in ~/work/openssl/ and build it there, not in
~/work/openssl.bst/. Given that some build systems actually assign meaning
to the directory name, dropping the .bst is somewhat desirable.
I know that the .bst has no meaning, and that all elements could be named
not having the .bst. Then again, the .bst is helpful in file type
hinting. Also our examples tend to follow this convention.
Anyway, this is related but orthogonal to your proposed changes.
[...]
> > > The path where sources were staged and the build was performed, has a
> > > high chance of affecting the artifact output, and as such it should
> > > certainly be considered in the artifact cache key.
> > >
> >
> > Not sure about a high chance: it's very uncommon for software to always
> > be built from the same base directory, ie. most software is downloaded,
> > unpacked, then built from someone's home directory. And those tend to
> > be unique-ish.
> > That said, I'll grant you that there is a chance. Less-well behaved
> > software may make assumptions about where it needs to be built from.
> For such
> > elements you would likely have to very deliberately and specifically set
> the
> > build-root.
>
> I recall that one issue was that compilers with debugging enabled like
> to encode this directory into binary output for the sake of debuggers
> finding the source directory at debugging time, this constitutes a
> fairly high probability of binary variance I think.
>
Touché. Good point.
> In any case, the build directory is an input we are in control of, and
> even if the variance is small, I think it's our duty to ensure we
> capture it in the cache key.
>
+1.
[...]
> Ok so let's try to summarize with everything in perspective:
>
>
> * The element-name/project-name together is not guaranteed to be
> unique for a given pipeline.
>
> It is only possible to stage duplicates with explicitly set
> separate directories anyway, and the chances that these end
> up colliding in, say, a `bst shell`, are highly unlikely.
>
> * The work to fix #1034 is non-negligible, here is my understanding:
>
> - The weak keys need to keep element names in them, in order to
> record the "logical shape" of an element's dependencies, which
> should cause an element to rebuild if that shape changes even
> in non-strict mode.
>
> - Currently "strong keys" are built on top of "weak keys", for no
> particular reason other than it makes sense and seems elegant.
>
> This part needs to change in order to address #1034
>
> - For some reason which is beyond me, cache keys are deeply
> entangled in this thing called an "ArtifactProto".
>
Can you expand a bit on what you mean here?
> My understanding is that changing the strong keys to not
> be based on the weak keys, implies a lightweight additional
> round-trip to the remote asset service.
>
> This needs to change to address #1034
>
>
> I think that if we are going to keep the 'element-name' and
> 'project-name' in existence, I would be happy with:
>
> * Let's keep 'project-name' and 'element-name' as default components
> of the 'build-root'.
>
> * Let's make sure that 'build-root' is indeed correctly considered in
> the cache key.
>
> * I think at this point, the likelyhood of build avoidance on
> element or project renames becomes very low, so I think we could
> do without solving #1034 entirely and just mark it as WONTFIX.
>
> How about this course of action ?
>
Cheers,
Sander
> Cheers,
> -Tristan
>
> [0]:
> https://lists.freedesktop.org/archives/freedesktop-sdk/2020-August/000054.html
>