[GitHub] [buildstream] AdrianVovk commented on issue #1851: bst2: Tracking is broken

via GitHub Mon, 07 Aug 2023 21:48:20 -0700


AdrianVovk commented on issue #1851:
URL: https://github.com/apache/buildstream/issues/1851#issuecomment-1668901834

> Ok so at a glance, I can see some pretty bad problems with your plugins...
looking right now at
https://gitlab.com/carbonOS/build-meta/-/blob/35f91b6d/plugins/gen-cargo-lock.py

I'm sorry, but you looked at the wrong plugin!! `gen-cargo-lock` is a very
special case, as I'll explain below. My other plugins (`gnu` and `go-vendor`)
should be a lot more normally implemented

Also, I didn't write this one! I got it from here:
https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/223

This is part of my bst2 port. The bst1 version of the `gen-cargo-lock`
plugin that I wrote _did_ actually store `Cargo.lock` in `project.refs` for
exactly the reasons you detail. However, when trying to figure out how to port
my plugins to `bst2`, I came across the existing plugin I linked above and
realized that's a much cleaner solution.

> Your source does not properly implement load_ref(), get_ref(), set_ref(),
get_unique_key(), is_cached() or is_resolved()

This is because it doesn't need to. `gen-cargo-lock` is a source that is
always immediately followed by a `cargo` source. The `Cargo.lock` file this
plugin generates is used for exactly one purpose: by the `cargo` source while
tracking. All the data contained in the `Cargo.lock` file gets turned into
buildstream refs by the `cargo` source and put into `project.refs`, so there's
no need for the `gen-cargo-lock` plugin to have a unique ID. At build time,
`cargo` has all of its dependencies vendored and so it doesn't need to look at
`Cargo.lock`. See [here in my
`project.refs`](https://gitlab.com/carbonOS/build-meta/-/blob/6f57dc2e/project.refs#L732-1020)
for an example of this

Here's the various steps:
- During tracking, the `cargo` plugin needs a `Cargo.lock` file. Some
projects don't ship one. So, we stage a `gen-cargo-lock` element to generate
it. Then the `cargo` plugin executes, reads the `Cargo.lock` file we just
generated, and turns it into buildstream refs for all the dependencies. The
unique-id is calculated from the ref the `cargo` plugin generated, which will
be the real source of truth when it comes time to stage the sources for the
build
- During fetch, we already have all the version numbers and hashes of all
dependencies tracked under the `cargo` plugin's ref. So, we download those
dependencies. This is done by the `cargo` source. `Cargo.lock` is redundant here
- During stage, it's the same as fetch. We have all the data available, so
we set up a special vendoring directory and populate it with all the libraries.
Again, all done by the `cargo` source. `Cargo.lock` is redundant here

If it doesn't work for some reason (i.e. if the toolchain in the sandbox
really does need `Cargo.lock` around during the build) I'll serialize the file
into a ref like I did before and like you describe.

> but I think a fundamental understanding needs to be explained somewhere in
the documentation to make plugin authoring more intuitive, I'm not sure where
in the docs or how best to explain this.

I absolutely agree. `bst`'s plugin architecture definitely felt
confusing/overwhelming when I first started writing plugins. And I made lots of
the mistakes you detail here. I fixed them over time as people tried to build
my project in their own clean environment and failed, and that taught me how
the system works. But it'd be nice if it could be documented enough to get it
right the first time :)

A lot of what you wrote here, especially the "Overall reproducibility
expectations" section, would be useful in the docs. Otherwise the plugin
architecture feels a bit impenetrable. Some extra docs regarding URL
translation (when should it be done? how? what does it mean to mark a URL? why
does this need to be done?) would be nice too; that stuff still confuses me...

It can probably just be top-level documentation for the `Source` class
itself (i.e.
[here](https://docs.buildstream.build/2.0/buildstream.source.html)).
Worst-case, you can add to the Additional Writings section

> As a side note, every source which uses gen-cargo-lock in your project
shares the same Cargo.lock file, because you have put it directly at the root
of the mirror directory owned by the gen-cargo-lock source kind.

Hmm so this might be a problem if multiple tracks are happening in parallel.
Again, we only need to ferry the data from one source to the very next source,
and then that one will turn it into an actual ref. My concern here is that if
we have multiple cargo projects being tracked in parallel, they share the one
file and trample each-other. Somes like I just need to incorporate the project
and element name into the local cache name to avoid a race condition

> Consequently, your plugin risks introducing different data for the same
cache key, risking the following scenario:

This might be an issue if the build chain ends up looking at the
`Cargo.lock` file anyway, despite not having a need to since all dependencies
are already vendored.

It would be nice if I could tell, from `stage`, if I'm staging for a track
or for a build. If it's for a track I'd stage `Cargo.lock`, but if it's for a
build I'd stage nothing to ensure consistency between builds

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [buildstream] AdrianVovk commented on issue #1851: bst2: Tracking is broken

Reply via email to