This is a wishlist bug, but it is important for architectures where we
are currently short on build power, and where this issue can stall builds
and waste an arbitrary amount of build power.

Cuirass should sort builds and only offload derivations for which all
inputs are available.

In my current understanding, cuirass offloads arbitrary derivations, and
the machine to which they are offloaded then starts building recursively
all inputs. If this is true, then it is possible that at some point in time,
all build slots are taken by the same package built as many times as there
are machines; I have seen something like this when working on core-updates,
where several machines were building the main gcc compiler at the same time.
At worse, if cuirass asks every machine to build a leaf package, this may
result in a simultaneous full bootstrap on all of them.

The situation becomes worse when the package in question fails. Then as
I understand it, each machine may receive a request to build something
depending on the failing package and try the failing build and thus waste
build power that will not be available to build other packages successfully.

Solving this problem may also make reports of build failures more accurate
and legible. For instance, doxygen currently fails to build on aarch64:
   https://ci.guix.gnu.org/build/969427/details
and is reported as "Failed", and not as "Failed (dependency)".
However, looking at the build log
   https://ci.guix.gnu.org/build/969427/log/raw
shows this:
...
building path(s) `/gnu/store/p5vqrwywz053r1vkiyw54dp9gj7vw9xd-ninja-1.11.1'
...
builder for `/gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv' 
failed with exit code 1
@ build-failed /gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv - 1 
builder for `/gnu/store/0zf7fqndzf2k595r4s6wblmpccdwr3nx-ninja-1.11.1.drv' 
failed with exit code 1
cannot build derivation 
`/gnu/store/hlscqram59id51hxg0fj15041v52h1kw-meson-1.1.0.drv': 1 dependencies 
couldn't be built
cannot build derivation 
`/gnu/store/w8qxkrwpffd9qs5w1jggy1yi27ycm0xr-jsoncpp-1.9.5.drv': 2 dependencies 
couldn't be built
cannot build derivation 
`/gnu/store/mss4yv015cil1vnjnglq506m83b7n3dy-cmake-bootstrap-3.24.2.drv': 1 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/w0irp6xn30nlmpizhcbjnvhqmsba41jn-cmake-minimal-3.24.2.drv': 2 
dependencies couldn't be built
cannot build derivation 
`/gnu/store/rqk2rbnpjpcnqswz8hqari1rnw6r8v1m-doxygen-1.9.5.drv': 1 dependencies 
couldn't be built

So it is indeed a different package that fails (and the last few lines
give a list of dependencies between ninja and doxygen, each of which may
or may not fail once ninja is fixed).

Notice that this could be solved without a topological sorting of the
dependency graph: It would be enough to keep an array deriv in which
deriv[i] contains a list of derivations requiring i more inputs to be built,
together with the list of inputs; elements in deriv[0] are ready to be sent
to a build machine, and upon completion of a build, all derivations
depending on it should be moved from deriv[i] to deriv[i-1] if the input
has been built successfully, or marked as "Failed (dependency)" if the
input has failed. (But this could be expensive, and may require appropriate
data structures.)

Alternatively, build jobs could be sorted topologically and then be kept
in a list; then before sending out a job, all its inputs have been tried
to be built; the job should then be sent if all inputs are available, or
be marked as "Failed (dependency)" if any of them has failed.

Andreas




Reply via email to