gtristan commented on issue #1851: URL: https://github.com/apache/buildstream/issues/1851#issuecomment-1668845828
Ok so at a glance, I can see some pretty bad problems with your plugins... looking right now at https://gitlab.com/carbonOS/build-meta/-/blob/35f91b6d/plugins/gen-cargo-lock.py Your source does not properly implement `load_ref()`, `get_ref()`, `set_ref()`, `get_unique_key()`, `is_cached()` or `is_resolved()`. I hope that you have tried to read through https://docs.buildstream.build/2.0/buildstream.source.html as this should help, but I think a fundamental understanding needs to be explained somewhere in the documentation to make plugin authoring more intuitive, I'm not sure where in the docs or how best to explain this. ### Overall reproducibility expectations At a high level, I should say that it is important that your source plugin expects to be able to work without ever running `bst source track`, this means that either at `bst source fetch` time, the `ref` should be pointing at data which is possible to download from the internet reliably (such that the downloaded data is guaranteed to be bit-for-bit identical every time it is downloaded, like a git commit or a file checksum) **or** in a case like the one you are trying to implement, where this is simply not possible, then the data itself should be stored in the source *ref*. After a build pipeline has been *tracked* successfully, then it is considered *resolved* (the plugin can indicate this by returning `True` from `is_resolved()`)... in other words... the entire pipeline's cache keys have been *resolved* and we know *exactly* what the inputs are going to be for the build, at least as far as the source inputs go, they are now guaranteed to be bit-for-bit identical for a given project's resolved cache keys. At this time, it is expected that your project (possibly including your project.refs) represents something which can be *reproduced* (whether the output will be bit-for-bit identical depends on other factors, but the overall build is considered *repeatable* at least). In other words, *even* if you did not choose to revision your `project.refs` file in git, I should be able to clone your repository, and with the `project.refs` file which you emailed me out of band (or a `project.refs` file which I downloaded from a CI artifact or obtained as a part of an official release), I should be able to reproduce your build exactly on my own infra. ### Specific reasons why your plugin wont work Here are some reasons I can see that your plugin cannot possibly work and why * The data obtained in `bst source track` by way of running `cargo generate-lockfile` is not available to me when I build. It is instead stored in the local source cache `~/.cache/buildstream/sources/gen-cargo-lock/Cargo.lock` on the remote machine where you ran `bst source track`, as such it is only available on that build host where you ran `bst track` - dirty, but might work in a very limited sense. * As a side note, *every* source which uses `gen-cargo-lock` in your project shares the *same* `Cargo.lock` file, because you have put it directly at the root of the mirror directory owned by the `gen-cargo-lock` source *kind*. * We should ignore this for now, because I don't think you can implement this plugin by using the local filesystem source cache directories * The state of the lock file is not accounted for in the source unique key * This means that your plugin's contribution to the cache key is the same value regardless of the state of the `Cargo.lock` * Consequently, your plugin risks introducing different data for the same cache key, risking the following scenario: * Your first successfully build an artifact for that *cache key* * You now run `bst source track`, on that element, obtaining *new* dependencies in your `Cargo.lock` * You now run `bst build` for that element, but the element is already cached under the *same* cache key, and no build is needed * As a general rule, usually at least the *ref* is included as a part of the value returned by `get_unique_key()`. ### General recommendation Your edge case plugin has the property that the *tracked* data (usually intended to be a *reference* for something to download or *"fetch"*) cannot be stored separately from the project data, and cannot be *fetched*. In order to implement this reliably, I would recommend that you store the results of `cargo generate-lockfile` directly in the *ref*. As such, functionality would run like this: * `bst source track` At track time, after running `cargo generate-lockfile`, you read the generated lockfile and serialize it as a string to be stored as the *ref*, properly *returning* that string from your `Source.track()` implementation. You *must* then also implement `Source.set_ref()`, `Source.get_ref()` and `Source.load_ref()`, simply by treating the ref as a string, and just name it as `ref` for the purpose of implementing these; this will result in storing the *contents* of the discovered `Cargo.lock` file as a serialized string in the `project.refs` file. * `bst source fetch` At fetch time, your plugin does nothing. * `Source.get_unique_key()` Simply include the ref in the data returned by this function, ensuring that your source plugin reports a key that is unique for the data which it intends to introduce into the build. * `Source.stage()` Here we want to take the serialized `ref`, deserialize it and write the `Cargo.lock` file in the desired location. This approach should be fairly simple to implement and should work properly in all bst use cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
