Hi all,
Last week, some of us met via teleconferencing to discuss BuildStream
2.0, current blockers and active topics on the mailing list. Those who
participated in this call include Benjamin Schubert, Jürg Billeter,
Sander Striker, Tristan van Berkom and myself. I am writing this
message to report on the results of our discussions.
Issues discussed
================
We primarily focused on discussing the currently active topic of
protecting against modification of artifacts by plugins [1].
To summarize the issue again - while some plugins currently do this,
and it is technically part of BuildStream’s public API by mistake, it
is NOT considered fine for plugins to import files from the host
directly, or mutate the contents of the sandbox using host tools. Note
that host tools in this context include running Python code on the
host and using modules like `tarfile` to write files inside the
sandbox.
The proposal is to remove any public APIs that allow this to happen.
We realize that there may be a gap in our API, which may mean that it’s
not currently possible to rewrite all plugins that use host tools
directly. In this message, we outline a plan for getting there.
Problematic API
---------------
The most common way the virtual directory API is abused in this way is
via the `Directory.open_file()` method [2], which allows plugins to
directly write a new file inside the sandbox. There are also a few
other methods that don’t have to be part of the public API moving
forward, as outlined in #1294 [3]. Thankfully, we haven’t noticed many
plugins using these other methods so far.
Offending plugins
-----------------
As mentioned in the original thread, the offending plugins include the
collect_manifest and oci plugins, but the problem is not limited to
those two. In fact, the following known plugins use the virtual
directory API to write files into the sandbox directly:
* bst_plugins_experimental/elements/bazelize.py
* bst_plugins_experimental/elements/collect_integration.py
* bst_plugins_experimental/elements/collect_manifest.py
* bst_plugins_experimental/elements/dpkg_build.py
* bst_plugins_experimental/elements/dpkg_deploy.py
* bst_plugins_experimental/elements/flatpak_image.py
* bst_plugins_experimental/elements/oci.py
* bst_plugins_experimental/elements/tar_element.py
* bst_plugins_containers/elements/docker_container.py
Looking at these plugins, some patterns start to emerge. Plugins like
tar, oci, docker_image etc. can be grouped together as “packaging”
elements. These elements generally need tools like `tar` and
`sha256sum` to be available inside the sandbox, and can do their
processing inside the sandbox.
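To make "processing inside the sandbox" a bit more concrete, here is a
minimal sketch of the kind of command list a packaging element could
hand to the sandbox instead of calling `tarfile` on the host. This is
only an illustration: `packaging_commands`, `/content` and
`/output/image.tar` are hypothetical names, not part of any existing
plugin or of the BuildStream API.

```python
import shlex


def packaging_commands(content_root, output_path):
    """Build shell commands that archive and checksum staged content.

    Both arguments are paths *inside* the sandbox (hypothetical layout).
    The commands rely on the `tar` and `sha256sum` binaries staged in
    the sandbox, not on anything from the host.
    """
    root = shlex.quote(content_root)
    archive = shlex.quote(output_path)
    checksum = shlex.quote(output_path + ".sha256")
    return [
        # Archive everything staged under content_root.
        f"tar -C {root} -cf {archive} .",
        # Write the archive's checksum next to it.
        f"sha256sum {archive} > {checksum}",
    ]


if __name__ == "__main__":
    for command in packaging_commands("/content", "/output/image.tar"):
        print(command)
```

The element would then run these commands like any other build
commands, so the resulting archive only depends on what was staged.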
Next, we have elements like bazelize that need to write some sort of
build manifest inside the sandbox. In the case of a hypothetical bazel
element, this would be a bazel WORKSPACE file. Such a manifest would
need to know where the dependencies are staged, where the sources are
etc. This is where our API may have some gaps, and we need to ensure
that BuildStream provides sufficient public API that doing something
like this is feasible without tampering with the sandbox directly from
the host.
Lastly, we have collect_manifest, which is a different beast really.
This long-standing issue [4] highlights the problems with that plugin.
It may need some redesign even after the above issues are resolved.
There have been some discussions about this on IRC lately [5] that
show a path forward.
What’s next
===========
Now that we have identified what needs to be fixed, here is a plan for
moving forward.
BuildStream Core
----------------
On the BuildStream side, we need to carefully review the public
Directory API and make private any methods that don’t need to be
exposed to plugins directly. Certain plugins, like `compose`, may need
to access some of the API that will be made private, but since they
will live in Core, it’s easier to keep them in check.
This is considered a blocker for 2.0 as we don’t want to release 2.0
with some public API that’s going to be removed later on.
Offending plugins will need to take action to stop using such API.
Listed below are some guidelines on how to achieve that.
Packaging plugins
-----------------
All the “packaging” plugins need to stop relying on host tools for
creating archives and manifests, and instead do this processing inside
the sandbox.
For the most part, these plugins are using such API so that they can
avoid mixing their tool dependencies (tar, sha256sum etc.) with their
content dependencies (i.e. the artifacts that go inside the
“package”). For example, for the docker_image plugin, the list of
build dependencies implies the set of artifacts that should be
included in the resulting docker image.
However, the same result could be achieved by staging the tools at `/`
and the content dependencies somewhere else. These plugins would also
need a way to differentiate between tool and content dependencies.
This is already possible in the current Element API, but may lead to
duplication like so:
    kind: docker
    build-depends:
    - tools.bst
    - content1.bst
    - content2.bst
    config:
      content-dependencies:
      - content1.bst
      - content2.bst
With something like the above, the docker plugin could read the
`content-dependencies` config option and stage those dependencies in a
separate location, and then run `tar` etc inside the sandbox, like any
other build element. We can however introduce some API to reduce this
duplication. In essence, we need some mechanism to annotate and
distinguish between tool and content dependencies.
So, the aim would be to avoid the repetition shown in the above
example. We don’t have a proposal for this yet, but conceivably this
could be a new kind of dependency, or a new plugin. Suggestions
welcome!
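For reference, the `content-dependencies` workaround described above
amounts to partitioning the build dependencies by name. Below is a
rough sketch of that partitioning in plain Python. It deliberately uses
no BuildStream API, and `split_dependencies` is a hypothetical helper
name; a real plugin would get these lists from its own configuration
and dependency handling.

```python
def split_dependencies(build_depends, content_dependencies):
    """Partition build dependencies into (tools, content).

    `build_depends` mirrors the element's `build-depends` list and
    `content_dependencies` mirrors the `content-dependencies` config
    option from the example above. Order is preserved.
    """
    content_set = set(content_dependencies)
    unknown = content_set - set(build_depends)
    if unknown:
        # Content must also be declared as a build dependency.
        raise ValueError(
            "content-dependencies not in build-depends: "
            + ", ".join(sorted(unknown))
        )
    tools = [dep for dep in build_depends if dep not in content_set]
    content = [dep for dep in build_depends if dep in content_set]
    return tools, content


if __name__ == "__main__":
    tools, content = split_dependencies(
        ["tools.bst", "content1.bst", "content2.bst"],
        ["content1.bst", "content2.bst"],
    )
    print("stage at /:", tools)
    print("stage elsewhere:", content)
```

The tools list would be staged at `/` and the content list somewhere
else, as described earlier in this section.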
Bazel-like plugins
------------------
These are the kind of plugins that push the boundaries of our plugin
API, and we want to make sure that there’s a path forward for such
plugins without sacrificing BuildStream’s core principles of
repeatability and reproducibility, while at the same time creating the
best user experience for the interactive developer workflow.
As such, it is considered a high priority to try to rewrite a plugin
like bazel so that it does not exercise any host tools. This may need some
additional API from BuildStream, but let’s not add that preemptively.
Having such a plugin will give us confidence that our APIs are
sufficiently powerful to allow plugin authors to write complex plugins
without having to resort to hacks.
Documentation
-------------
We are also lacking documentation for plugin authors, other than the
API reference. We can enhance that by adding a “Plugin Authors’
Guide”, covering how to do common plugin operations in the right way,
do’s and don’ts, etc.
Thanks,
Chandan
[1]: https://mail.gnome.org/archives/buildstream-list/2020-June/msg00017.html
[2]: https://docs.buildstream.build/master/buildstream.storage.directory.html#buildstream.storage.directory.Directory.open_file
[3]: https://gitlab.com/BuildStream/buildstream/-/issues/1294
[4]: https://gitlab.com/BuildStream/bst-plugins-experimental/-/issues/2
[5]: https://irclogs.baserock.org/buildstream/%23buildstream.2020-07-02.log.html#t2020-07-02T08:33:47