gtristan commented on issue #1851:
URL: https://github.com/apache/buildstream/issues/1851#issuecomment-1668845828

   Ok so at a glance, I can see some pretty bad problems with your plugins... 
looking right now at 
https://gitlab.com/carbonOS/build-meta/-/blob/35f91b6d/plugins/gen-cargo-lock.py
   
   Your source does not properly implement `load_ref()`, `get_ref()`, 
`set_ref()`, `get_unique_key()`, `is_cached()` or `is_resolved()`. I hope that 
you have tried to read through 
https://docs.buildstream.build/2.0/buildstream.source.html as this should help, 
but I think a fundamental understanding needs to be explained somewhere in the 
documentation to make plugin authoring more intuitive, I'm not sure where in 
the docs or how best to explain this.
   
   ### Overall reproducibility expectations
   
   At a high level, I should say that it is important that your source plugin 
expects to be able to work without ever running `bst source track`, this means 
that either at `bst source fetch` time, the `ref` should be pointing at data 
which is possible to download from the internet reliably (such that the 
downloaded data is guaranteed to be bit-for-bit identical every time it is 
downloaded, like a git commit or a file checksum) **or** in a case like the one 
you are trying to implement, where this is simply not possible, then the data 
itself should be stored in the source *ref*.
   
   After a build pipeline has been *tracked* successfully, then it is 
considered *resolved* (the plugin can indicate this by returning `True` from 
`is_resolved()`)... in other words... the entire pipeline's cache keys have 
been *resolved* and we know *exactly* what the inputs are going to be for the 
build, at least as far as the source inputs go, they are now guaranteed to be 
bit-for-bit identical for a given project's resolved cache keys.
   
   At this time, it is expected that your project (possibly including your 
project.refs) represents something which can be *reproduced* (whether the 
output will be bit-for-bit identical depends on other factors, but the overall 
build is considered *repeatable* at least).
   
   In other words, *even* if you did not choose to revision your `project.refs` 
file in git, I should be able to clone your repository, and with the 
`project.refs` file which you emailed me out of band (or a `project.refs` file 
which I downloaded from a CI artifact or obtained as a part of an official 
release), I should be able to reproduce your build exactly on my own infra.
   
   ### Specific reasons why your plugin wont work
   
   Here are some reasons I can see that your plugin cannot possibly work and why
   
   * The data obtained in `bst source track` by way of running `cargo 
generate-lockfile` is not available to me when I build. It is instead stored in 
the local source cache `~/.cache/buildstream/sources/gen-cargo-lock/Cargo.lock` 
on the remote machine where you ran `bst source track`, as such it is only 
available on that build host where you ran `bst track` - dirty, but might work 
in a very limited sense.
     * As a side note, *every* source which uses `gen-cargo-lock` in your 
project shares the *same* `Cargo.lock` file, because you have put it directly 
at the root of the mirror directory owned by the `gen-cargo-lock` source *kind*.
     * We should ignore this for now, because I don't think you can implement 
this plugin by using the local filesystem source cache directories
   * The state of the lock file is not accounted for in the source unique key
     * This means that your plugin's contribution to the cache key is the same 
value regardless of the state of the `Cargo.lock`
     * Consequently, your plugin risks introducing different data for the same 
cache key, risking the following scenario:
       * Your first successfully build an artifact for that *cache key*
       * You now run `bst source track`, on that element, obtaining *new* 
dependencies in your `Cargo.lock`
       * You now run `bst build` for that element, but the element is already 
cached under the *same* cache key, and no build is needed
     * As a general rule, usually at least the *ref* is included as a part of 
the value returned by `get_unique_key()`.
   
   ### General recommendation
   
   Your edge case plugin has the property that the *tracked* data (usually 
intended to be a *reference* for something to download or *"fetch"*) cannot be 
stored separately from the project data, and cannot be *fetched*.
   
   In order to implement this reliably, I would recommend that you store the 
results of `cargo generate-lockfile` directly in the *ref*.
   
   As such, functionality would run like this:
   
   * `bst source track`
   
     At track time, after running `cargo generate-lockfile`, you read the 
generated lockfile and serialize it as a string to be stored as the *ref*, 
properly *returning* that string from your `Source.track()` implementation.
   
     You *must* then also implement `Source.set_ref()`, `Source.get_ref()` and 
`Source.load_ref()`, simply by treating the ref as a string, and just name it 
as `ref` for the purpose of implementing these; this will result in storing the 
*contents* of the discovered `Cargo.lock` file as a serialized string in the 
`project.refs` file.
   
   * `bst source fetch`
   
     At fetch time, your plugin does nothing.
   
   * `Source.get_unique_key()`
   
     Simply include the ref in the data returned by this function, ensuring 
that your source plugin reports a key that is unique for the data which it 
intends to introduce into the build.
   
   * `Source.stage()`
   
     Here we want to take the serialized `ref`, deserialize it and write the 
`Cargo.lock` file in the desired location.
   
   
   This approach should be fairly simple to implement and should work properly 
in all bst use cases.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to