AdrianVovk commented on issue #1851: URL: https://github.com/apache/buildstream/issues/1851#issuecomment-1668901834
> Ok so at a glance, I can see some pretty bad problems with your plugins... looking right now at https://gitlab.com/carbonOS/build-meta/-/blob/35f91b6d/plugins/gen-cargo-lock.py I'm sorry, but you looked at the wrong plugin!! `gen-cargo-lock` is a very special case, as I'll explain below. My other plugins (`gnu` and `go-vendor`) should be a lot more normally implemented Also, I didn't write this one! I got it from here: https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/223 This is part of my bst2 port. The bst1 version of the `gen-cargo-lock` plugin that I wrote _did_ actually store `Cargo.lock` in `project.refs` for exactly the reasons you detail. However, when trying to figure out how to port my plugins to `bst2`, I came across the existing plugin I linked above and realized that's a much cleaner solution. > Your source does not properly implement load_ref(), get_ref(), set_ref(), get_unique_key(), is_cached() or is_resolved() This is because it doesn't need to. `gen-cargo-lock` is a source that is always immediately followed by a `cargo` source. The `Cargo.lock` file this plugin generates is used for exactly one purpose: by the `cargo` source while tracking. All the data contained in the `Cargo.lock` file gets turned into buildstream refs by the `cargo` source and put into `project.refs`, so there's no need for the `gen-cargo-lock` plugin to have a unique ID. At build time, `cargo` has all of its dependencies vendored and so it doesn't need to look at `Cargo.lock`. See [here in my `project.refs`](https://gitlab.com/carbonOS/build-meta/-/blob/6f57dc2e/project.refs#L732-1020) for an example of this Here's the various steps: - During tracking, the `cargo` plugin needs a `Cargo.lock` file. Some projects don't ship one. So, we stage a `gen-cargo-lock` element to generate it. Then the `cargo` plugin executes, reads the `Cargo.lock` file we just generated, and turns it into buildstream refs for all the dependencies. The unique-id is calculated from the ref the `cargo` plugin generated, which will be the real source of truth when it comes time to stage the sources for the build - During fetch, we already have all the version numbers and hashes of all dependencies tracked under the `cargo` plugin's ref. So, we download those dependencies. This is done by the `cargo` source. `Cargo.lock` is redundant here - During stage, it's the same as fetch. We have all the data available, so we set up a special vendoring directory and populate it with all the libraries. Again, all done by the `cargo` source. `Cargo.lock` is redundant here If it doesn't work for some reason (i.e. if the toolchain in the sandbox really does need `Cargo.lock` around during the build) I'll serialize the file into a ref like I did before and like you describe. > but I think a fundamental understanding needs to be explained somewhere in the documentation to make plugin authoring more intuitive, I'm not sure where in the docs or how best to explain this. I absolutely agree. `bst`'s plugin architecture definitely felt confusing/overwhelming when I first started writing plugins. And I made lots of the mistakes you detail here. I fixed them over time as people tried to build my project in their own clean environment and failed, and that taught me how the system works. But it'd be nice if it could be documented enough to get it right the first time :) A lot of what you wrote here, especially the "Overall reproducibility expectations" section, would be useful in the docs. Otherwise the plugin architecture feels a bit impenetrable. Some extra docs regarding URL translation (when should it be done? how? what does it mean to mark a URL? why does this need to be done?) would be nice too; that stuff still confuses me... It can probably just be top-level documentation for the `Source` class itself (i.e. [here](https://docs.buildstream.build/2.0/buildstream.source.html)). Worst-case, you can add to the Additional Writings section > As a side note, every source which uses gen-cargo-lock in your project shares the same Cargo.lock file, because you have put it directly at the root of the mirror directory owned by the gen-cargo-lock source kind. Hmm so this might be a problem if multiple tracks are happening in parallel. Again, we only need to ferry the data from one source to the very next source, and then that one will turn it into an actual ref. My concern here is that if we have multiple cargo projects being tracked in parallel, they share the one file and trample each-other. Somes like I just need to incorporate the project and element name into the local cache name to avoid a race condition > Consequently, your plugin risks introducing different data for the same cache key, risking the following scenario: This might be an issue if the build chain ends up looking at the `Cargo.lock` file anyway, despite not having a need to since all dependencies are already vendored. It would be nice if I could tell, from `stage`, if I'm staging for a track or for a build. If it's for a track I'd stage `Cargo.lock`, but if it's for a build I'd stage nothing to ensure consistency between builds -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
