Hi Tristan,

On Sat, Nov 7, 2020 at 4:00 PM Tristan Van Berkom <[email protected]> wrote:
> I wrote up a reply to this today but trashed it, needed a bit more time
> :-).
> But since you raise this I hope to stop the train before it leaves the
> station...

I wanted to be clever, but I'm coming up blank on train analogies... :-)

That said, please take the below as serious questions, not as a troll.

> [...]
> > Maybe a more dynamic way is to have functions like in Element:
> >
> > def requires_artifact_metadata_for_cachekey_calculation(self):
> >     return []  # list of elements the artifact metadata is required for
> >
> > That could make the pipeline slightly more complicated, but it wouldn't
> > come at the expense of correctness nor performance I think.
>
> Cache keys are derived from project input _only_, we never require the
> output of an artifact in any way as input to a cache key.
>
> Cache keys can be computed immediately as soon as the project (and
> subprojects) are loaded, and before ever needing to download any sources
> or build anything.

I know that is how we have been doing things - I don't think it's a bad
idea to reiterate the why every once in a while, including what the
benefit is to the user.

> Putting aside that I would really not want to change the above, I don’t
> even think we rationally can.
>
> We need to know cache keys early on of course in order to download cached
> artifacts, and we need the *target* cache keys first, in order to know if
> we even need to build build dependencies.

You are correct that we would then end up needing to "unconditionally" pull
down artifact _metadata_ for those elements whose data is used in cache key
calculation. And this metadata would already be present if you've built
your project once, or once you've built an element that would trigger a
download of the metadata. After that, beyond loading the project data,
we'll also be loading some artifact metadata. (I've put a rough sketch of
the hook I have in mind in a PS below.)

> I don’t think there is any way that we can have build results ever
> contribute to cache keys, and I would hope to steer away from changes to
> this.

I think I understand the desire, but I don't think I fully understand or
appreciate the user value. If you could help me understand, that would be
great.

> There is certainly more thought needed here if we need element/python
> level conditionals based on built artifacts, but for tonight my thoughts
> are:
>
> * I’m not convinced we do need such conditionals in the plugin (such
> conditionals can happen in the sandbox)

This is the point where I am still torn, specifically the part about "such
conditionals can happen in the sandbox". It's just not as trivial as it
sounds - and if we want to keep things relatively cohesive for plugin
authors, it basically advocates for python in the sandbox as well. And
then it's still questionable how to get the required data into the sandbox
without introducing additional elements into the project graph. (A sketch
of what I understand by a sandbox-side conditional is at the bottom of
this mail.)

I am wondering if we are weighing a self-imposed limitation against user
complexity.

> * Any such conditionals cannot affect the cache key

If such conditionals were done outside the sandbox, they would have to, as
per the start of the thread.

> Cheers,
> -Tristan

Cheers,

Sander
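PS: To make the hook idea a bit more concrete, here is a rough sketch of the
kind of declaration I have in mind. To be clear, none of this exists in
BuildStream today: the method name, the ExampleElement class and the
"base.bst" dependency are all made up for illustration, and I'm leaving out
the other methods a real plugin has to implement.

    # Hypothetical sketch only, nothing like this exists today. The idea is
    # that an element declares up front which of its dependencies' artifact
    # *metadata* (not the artifact contents) it needs before its own cache
    # key can be resolved, so the pipeline knows to pull that metadata early.
    from buildstream import Element

    class ExampleElement(Element):
        """Illustrative element, other required plugin methods omitted."""

        def requires_artifact_metadata_for_cachekey_calculation(self):
            # Build dependencies whose artifact metadata must be available
            # (pulled if not already cached) before this element's cache key
            # can be computed. An empty list keeps today's behaviour, where
            # the key is derived from project input only.
            return ["base.bst"]

The declaration itself would still be static project input - whether extra
metadata is needed is known at load time, only the metadata values come from
previously built artifacts.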
> > Thoughts?
> >
> > Cheers,
> >
> > Sander
> >
> > > While we lose the idea that the python code of an element can do things
> > > differently based on previously built artifacts, we can still of course
> > > have elements generate sandbox commands which will behave differently
> > > depending on what has already been built (which is inevitably true
> > > anyway, build scripts will generally behave differently depending on
> > > what is already staged).
> > >
> > > This means that any interaction between elements and data which needs
> > > to be "carried forward" so to speak, *must* absolutely be done within
> > > the sandbox, which will certainly affect some elements, like dpkg_build
> > > and dpkg_deploy for instance. These plugins will need to be changed
> > > such that they no longer attempt to carry data over through mutable
> > > public data, but instead they should place any data they need to carry
> > > forward into a well known location in the resulting artifact, so that
> > > reverse dependencies can work with that later on.
> > >
> > > Any thoughts on this proposal ?
> >
> > A little later than I desired, but hopefully clear enough to take into
> > account.
> >
> > > Cheers,
> > > -Tristan
> >
> > Cheers,
> >
> > Sander
> >
> > > [0]:
> > > https://lists.apache.org/thread.html/re3055975198fa5115b7ce3f533e58518443976b6ae2b13c3be15c881%40%3Cdev.buildstream.apache.org%3E
> > > [1]: https://gitlab.com/BuildStream/buildstream/-/issues/1293
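PS 2: And for completeness, here is what I understand by "such conditionals
can happen in the sandbox", again only a sketch with made-up names (the
ExampleDeployElement class, the _deploy_commands() helper, the
example-deploy command and the manifest path are all illustrative). The
python side never inspects a previously built artifact; it only emits a
command whose branch is taken inside the sandbox, based on what is actually
staged there.

    # Hypothetical sketch only. The conditional lives in the generated shell
    # command, so the element's python (and therefore its cache key) does not
    # depend on build results; only the behaviour inside the sandbox does.
    from buildstream import Element

    class ExampleDeployElement(Element):
        """Illustrative element, other required plugin methods omitted."""

        def _deploy_commands(self):
            # Branch at build time on whether a dependency placed a manifest
            # at a well known location in its artifact, rather than branching
            # here in python on that dependency's (mutable) public data.
            return [
                "if [ -f /usr/share/example/manifest.json ]; then "
                "example-deploy --manifest /usr/share/example/manifest.json; "
                "else "
                "example-deploy --scan /; "
                "fi"
            ]

This is roughly the dpkg_deploy shape from the original proposal: the data
gets carried forward through a well known location in the artifact, not
through python-visible state.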
