gtristan commented on issue #1851:
URL: https://github.com/apache/buildstream/issues/1851#issuecomment-1670587778

   Hi,
   
   > > Ok so at a glance, I can see some pretty bad problems with your 
plugins... looking right now at 
https://gitlab.com/carbonOS/build-meta/-/blob/35f91b6d/plugins/gen-cargo-lock.py
   > 
   > I'm sorry, but you looked at the wrong plugin!! `gen-cargo-lock` is a very 
special case, as I'll explain below. My other plugins (`gnu` and `go-vendor`) 
should be a lot more normally implemented
   > 
   > Also, I didn't write this one! I got it from here: 
https://gitlab.com/BuildStream/bst-plugins-experimental/-/merge_requests/223
   
   Just trying to be helpful, I think that once you have an understanding of 
the generally simple architecture, plugin writing becomes fairly intuitive - 
but I think our docs don't do a great job for new users to get into the flow.
   
   
   > > Your source does not properly implement load_ref(), get_ref(), 
set_ref(), get_unique_key(), is_cached() or is_resolved()
   >
   > This is because it doesn't need to. `gen-cargo-lock` is a source that is 
always immediately followed by a `cargo` source. The `Cargo.lock` file this 
plugin generates is used for exactly one purpose: by the `cargo` source while 
tracking. All the data contained in the `Cargo.lock` file gets turned into 
buildstream refs by the `cargo` source and put into `project.refs`, so there's 
no need for the `gen-cargo-lock` plugin to have a unique ID. At build time, 
`cargo` has all of its dependencies vendored and so it doesn't need to look at 
`Cargo.lock`. See [here in my 
`project.refs`](https://gitlab.com/carbonOS/build-meta/-/blob/6f57dc2e/project.refs#L732-1020)
 for an example of this
   
   Yeah, reading this I'm not really confident that this is going to work well.
   
   I understand your general logic but I do think this plugin steps out of 
bounds and will fall into scenarios that will break.
   
   There is *definitely* the issue of race conditions around the local caching 
of the `Cargo.lock` file in `Source.get_mirror_directory()` even if the 
intention is to treat this as a transiently needed file which is later ignored, 
parallel tracking of `gen-cargo-lock` sources in different elements will have 
them all writing to the same file.
   
   I'm not sure if the `cargo` plugin will generate the `Cargo.lock` file in 
the build directory which will likely be relevant at build time, and we could 
also be intentionally *overwriting* an existing `Cargo.lock` file in the 
external rust source tree that differs from the one we tracked.
   
   Finally, the `gen-cargo-lock` *is* responsible for staging the `Cargo.lock`, 
and in this setup the `Cargo.lock` will *only* exist on a machine where 
tracking previously occurred.
   
   Also I should point out that with BuildStream 2, the results of 
`Source.stage()` are cached in the CAS, which means that the unique key of the 
`gen-cargo-lock` source *by itself* is significant - subsequent runs of `bst 
source track` + `bst build` will likely end up re-staging an old staging result 
from previous runs due to the source cache key being unchanged and available in 
the CAS.
   
   I don't think there is any safe way to do this without either:
   
   * Serializing the `Cargo.lock` as you mentioned you had previously done, 
that sounds like the sanest approach
   * There is another alternative I considered for the sake of *"prettiness"* 
which is for example, if you have some additional infra
     * At `Source.track()` time, the resulting Cargo.lock could be uploaded to 
an artifactory instance as a file named after the `Cargo.lock` sha256sum
     * The sha256sum could be used as a ref
     * `Source.fetch()` would download the lock from artifactory
   
   This second approach is also totally viable, but I didn't want to suggest it 
because it involves additional infra / moving parts, so while it may be more 
*"pretty"* to look at the resulting bst files, it's not a win.
   
   [...]
   > > but I think a fundamental understanding needs to be explained somewhere 
in the documentation to make plugin authoring more intuitive, I'm not sure 
where in the docs or how best to explain this.
   > 
   > I absolutely agree. `bst`'s plugin architecture definitely felt 
confusing/overwhelming when I first started writing plugins. And I made lots of 
the mistakes you detail here. I fixed them over time as people tried to build 
my project in their own clean environment and failed, and that taught me how 
the system works. But it'd be nice if it could be documented enough to get it 
right the first time :)
   > 
   > A lot of what you wrote here, especially the "Overall reproducibility 
expectations" section, would be useful in the docs. Otherwise the plugin 
architecture feels a bit impenetrable. Some extra docs regarding URL 
translation (when should it be done? how? what does it mean to mark a URL? why 
does this need to be done?) would be nice too; that stuff still confuses me...
   > 
   > It can probably just be top-level documentation for the `Source` class 
itself (i.e. 
[here](https://docs.buildstream.build/2.0/buildstream.source.html)). 
Worst-case, you can add to the Additional Writings section
   
   Yeah I'm not sure where to put this kind of thing, I think probably we need 
a solid *"Introduction"* which should be required reading for any of the plugin 
authoring documentation (elements & sources alike), which should help the 
further reading be more intuitive.
   
   On a side note, I am really happy to hear that you have been leveraging the 
plugin system successfully :)
   
   > > As a side note, every source which uses gen-cargo-lock in your project 
shares the same Cargo.lock file, because you have put it directly at the root 
of the mirror directory owned by the gen-cargo-lock source kind.
   > 
   > Hmm so this might be a problem if multiple tracks are happening in 
parallel. Again, we only need to ferry the data from one source to the very 
next source, and then that one will turn it into an actual ref. My concern here 
is that if we have multiple cargo projects being tracked in parallel, they 
share the one file and trample each-other. Somes like I just need to 
incorporate the project and element name into the local cache name to avoid a 
race condition
   
   Right, I pre-followed up on just that above, sorry but I have to just scan 
through and reply ;-)
   
   Again, I don't think this approach is really viable.
   
   > > Consequently, your plugin risks introducing different data for the same 
cache key, risking the following scenario:
   > 
   > This might be an issue if the build chain ends up looking at the 
`Cargo.lock` file anyway, despite not having a need to since all dependencies 
are already vendored.
   > 
   > It would be nice if I could tell, from `stage`, if I'm staging for a track 
or for a build. If it's for a track I'd stage `Cargo.lock`, but if it's for a 
build I'd stage nothing to ensure consistency between builds
   
   Generally speaking, we don't want to give plugin methods more context than 
what is strictly necessary, as plugin methods are things we can later decide to 
call in different ways when refactoring the core and making core improvements - 
it is best to keep the APIs simple and strict.
   
   That said, there is no such context; sources are staged when the core asks 
the source to stage the data, when the *core* decides, this happens only if the 
`Source.is_cached()` condition is satisfied.
   
   I should point out that the `gen-cargo-lock` lies about `Source.is_cached()` 
and `Source.is_resolved()` because it pretends to be cached, and pretends to 
know what data will be cached respectively, when it in fact does not (i.e. when 
you call `bst build` *without* previously having called `bst source track`, it 
is inevitably busted).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to