Hi all,
This proposal seeks to find a practical solution the problem I raised
earlier this year[0], where Element authors have mistakenly been given
permission to write directly into the sandbox.
There are a few tradeoffs here and the changes proposed here are
significant, but the changes overall should result in increased
reliability, while at the same time granting a compromise which allows
constructs which write into the sandbox.
I should also note that this is probably one of the last major API
changes we need to get through in order to unblock the final stages of
2.0, in my opinion at least (there is other ongoing work and APIs which
need review, proposals which need to be written but not necessarily
implemented, which are on the major blocker list).
This proposal consists of a handful of changes, so I will avoid
drilling too deeply into the finer details, but instead try to explain
what these changes are, how they are intertwined, and how they benefit
us.
The proposal
------------
The main changes are:
* Make public data immutable, removing Element.set_public_data() and
APIs which allow public data to be mutated.
* Make batching of commands issued in the sandbox mandatory.
* Force computation of the sandbox commands (if any) to be performed
early on, in advance of cache key calculation.
* Expose a special utility function which allows plugin provided
content to be placed into the sandbox at a specified path.
Like sandbox commands, the content must be computed early on.
A few words on each of these:
Immutable public data
~~~~~~~~~~~~~~~~~~~~~
Since public data can be modified by a build, and since the element's
build phase is the only point at which we can be certain that all
dependencies have been built: it is currently an invalid operation to
read public data from dependencies before the element's build phase.
This is in itself problematic as it is not entirely clear in the docs,
and there are no assertions in place to clearly enforce this invariant.
It is however desirable to be able to read public data off of
dependency elements in order to construct sandbox commands, or to
construct data when needs to be written into the sandbox by an Element
plugin.
Mandatory sandbox command batching
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This idea has already been in discussion with regards to improving
performance in remote execution scenarios[1], we should maximize on
batching wherever possible.
The reason why batching was never mandatory is that an element could
theoretically need to observe the results of a command in order to
decide what commands to run next, however we have not observed this
kind of construct in action before, and we think it is reasonable to
enforce this.
Early assembly of sandbox commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This will require an API change such that commands should be
constructed early on, probably directly at Plugin.configure() time.
Element will gain some API for command list to be created early on, and
these commands will be used automatically by the Element base class to
feed into the cache key.
Given immutability of public data, public data is still a suitable
candidate for constructing commands (as we do for integration commands
for example).
The key difference here is that:
* Plugins can use host python to construct commands in whatever way
they please, they should of course be careful about constructing
commands deterministically.
* The result of command construction feeds directly into the cache
key, which means that if a plugin ever fails to generate it's
commands deterministically, then the worst that can happen is
an undesirable rebuild - new cache keys are always generated
for any changes in the actual input.
Utility for writing into the sandbox
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Element will provide a function for writing files into the sandbox,
and the plugin can generate that content in whatever way it pleases,
however, like sandbox commands, that content must be computed early on
before cache key calculation, and the content itself will be
automatically taken into account in the Element's cache key.
This allows for a similar compromise as we do with sandbox commands
above, the plugin is given freedom to generate content dynamically, but
if and when the plugin fails to do this deterministically, then it will
result in the generation of new cache keys.
I.e. if for whatever reason you could not generate content
deterministically, you will suffer rebuilts; but you will always have a
unique cache key for each content blob you decided to write into the
sandbox.
Side effects
------------
So far, there is one major side effect I can see to this whole plan,
and while it may change the way we think about how elements interact,
so far I think it is a good "course correction".
This side effect is that generally, as far as an Element is concerned,
an Element cannot do anything at all differently based on the results
of a previously built artifact.
While we lose the idea that the python code of an element can do things
differently based on previously built artifacts, we can still of course
have elements generate sandbox commands which will behave differently
depending on what has already been built (which is inevitably true
anyway, build scripts will generally behave differently depending on
what is already staged).
This means that any interaction between elements and data which needs
to be "carried forward" so to speak, *must* absolutely be done within
the sandbox, which will certainly affect some elements, like dpkg_build
and dpkg_deploy for instance. These plugins will need to be changed
such that they no longer attempt to carry data over through mutable
public data, but instead they should place any data they need to carry
forward into a well known location in the resulting artifact, so that
reverse dependencies can work with that later on.
Any thoughts on this proposal ?
Cheers,
-Tristan
[0]:
https://lists.apache.org/thread.html/re3055975198fa5115b7ce3f533e58518443976b6ae2b13c3be15c881%40%3Cdev.buildstream.apache.org%3E
[1]: https://gitlab.com/BuildStream/buildstream/-/issues/1293