Hi Tristan,

On Sat, Nov 7, 2020 at 4:00 PM Tristan Van Berkom <
[email protected]> wrote:

> I wrote up a reply to this today but trashed it, needed a bit more time.
>

:-).


> But since you raise this I hope to stop the train before it leaves the
> station...
>

I wanted to be clever but I'm coming up blank with train analogies... :-)
That said, please take the below as serious questions, not as a troll.


> [...]
> > Maybe a more dynamic way is to have functions like in Element:
> >
> > def requires_artifact_metadata_for_cachekey_calculation(self):
> >     return []  # list of elements whose artifact metadata is required
> >
> > That could make the pipeline slightly more complicated, but it wouldn't
> > come at the expense of correctness or performance, I think.
>
> Cache keys are derived from project input _only_, we never require the
> output of an artifact in any way as input to a cache key.
>
> Cache keys can be computed immediately as soon as the project (and
> subprojects) are loaded, and before ever needing to download any sources or
> build anything.
>

I know that is how we have been doing things - but I don't think it's a bad
idea to reiterate the why every once in a while, including what the benefit
to the user is.


> Putting aside that I would really not want to change the above, I don’t
> even think we rationally can.
>
> We need to know cache keys early on of course in order to download cached
> artifacts, and we need the *target* cache keys first, in order to know if
> we even need to build build dependencies.
>

You are correct that we would then end up needing to "unconditionally" pull
down artifact _metadata_ for those elements whose data is used in cache key
calculation.  That metadata would already be present if you've built your
project once, or once you've done a build of an element that triggers a
download of the metadata.  After that, beyond loading the project data,
we'd also be loading some artifact metadata.
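
To make this concrete, here is a very rough sketch of what I have in mind -
the hook name and the pipeline helper below are made up for illustration,
none of this is existing BuildStream API:

    # Hypothetical plugin-side hook: an element declares which
    # dependencies' artifact *metadata* it needs before its own
    # cache key can be computed.
    class MyElement:
        def requires_artifact_metadata_for_cachekey_calculation(self):
            # An empty list means the cache key depends on project
            # input only, exactly as today.
            return []

    # Hypothetical pipeline-side use: pull only the metadata of the
    # declared dependencies, then feed it into the key calculation.
    def compute_cache_key(element, pull_metadata, calculate_key):
        needed = element.requires_artifact_metadata_for_cachekey_calculation()
        metadata = {dep: pull_metadata(dep) for dep in needed}
        return calculate_key(element, metadata)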

> I don’t think there is any way that we can have build results ever
> contribute to cache keys, and I would hope to steer away from changes to
> this.
>

I think I understand the desire, but I don't think I fully understand or
appreciate the user value.  If you could help me understand that, it would
be great.


> There is certainly more thought needed here if we need element/python
> level conditionals based on built artifacts, but for tonight my thoughts
> are:
>
> * I’m not convinced we do need such conditionals in the plugin (such
> conditionals can happen in the sandbox)
>

This is the point where I am still torn.  The part about "such conditionals
can happen in the sandbox", specifically.  It's just not as trivial as it
sounds - and if we want to keep things relatively cohesive for plugin
authors, it basically advocates for Python in the sandbox as well.  And
then it's still questionable how to get the required data into the sandbox
without introducing additional elements in the project graph.  I am
wondering if we are weighing a self-imposed limitation against user
complexity.
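
For contrast, here is roughly what "conditionals in the sandbox" look like
from a plugin author's point of view - the element, path and commands below
are made up for illustration, not taken from an actual plugin:

    # Hypothetical element: the conditional lives in the generated
    # shell commands, so it runs inside the sandbox and never
    # influences the cache key.
    def assemble_commands():
        return [
            # Adjacent string literals below form one shell command;
            # the branch happens at build time, based on what the
            # dependencies have staged into a well-known location.
            "if [ -f /buildstream/dpkg-data/control ]; then "
            "    ./configure --with-extra-metadata=/buildstream/dpkg-data; "
            "else "
            "    ./configure; "
            "fi",
            "make",
            "make install",
        ]

The branch itself is cheap; the awkward part is getting that well-known
location populated in the first place, which is exactly where the extra
elements in the graph (or Python in the sandbox) come in.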


> * Any such conditionals cannot affect the cache key
>

If such conditionals were done outside the sandbox, they would have to
affect the cache key, as per the start of the thread - otherwise two builds
whose dependency artifacts differ could end up sharing the same key.


> Cheers,
>     -Tristan


Cheers,

Sander


>
>
> >
> > Thoughts?
> >
> > Cheers,
> >
> > Sander
> >
> >
> >> While we lose the idea that the python code of an element can do things
> >>> differently based on previously built artifacts, we can still of course
> >>> have elements generate sandbox commands which will behave differently
> >>> depending on what has already been built (which is inevitably true
> >>> anyway, build scripts will generally behave differently depending on
> >>> what is already staged).
> >>>
> >>> This means that any interaction between elements and data which needs
> >>> to be "carried forward" so to speak, *must* absolutely be done within
> >>> the sandbox, which will certainly affect some elements, like dpkg_build
> >>> and dpkg_deploy for instance. These plugins will need to be changed
> >>> such that they no longer attempt to carry data over through mutable
> >>> public data, but instead they should place any data they need to carry
> >>> forward into a well known location in the resulting artifact, so that
> >>> reverse dependencies can work with that later on.
> >>>
> >>>
> >>> Any thoughts on this proposal ?
> >>>
> >>
> >> A little later than I desired, but hopefully clear enough to take into
> >> account.
> >>
> >>
> >>> Cheers,
> >>>    -Tristan
> >>>
> >>
> >> Cheers,
> >>
> >> Sander
> >>
> >>
> >>>
> >>>
> >>> [0]:
> >>>
> https://lists.apache.org/thread.html/re3055975198fa5115b7ce3f533e58518443976b6ae2b13c3be15c881%40%3Cdev.buildstream.apache.org%3E
> >>> [1]: https://gitlab.com/BuildStream/buildstream/-/issues/1293
> >>>
> >>>
> >>>
>
