Hi Sander,

On Mon, 20 Oct 2025 at 18:09, Sander Striker <[email protected]> wrote:
>
> Hi,
>
> I’ve been working on a proposal that I’m really excited to share.  It could
> significantly improve BuildStream’s performance on large dependency graphs,
> especially for C/C++ codebases, without changing correctness or requiring
> new semantics from build tools.  This builds on top of the current state of
> the art, using both Remote Execution for parallelism and using recc to
> cache and remote execute compilation at the translation unit level.
>
> I've put the full proposal in issue #2083
> <https://github.com/apache/buildstream/issues/2083>, as that renders the
> included diagrams nicely.  I'll not rehash it here as it is quite long.  A
> quick summary:
> When we build an element, sub-actions are spawned, e.g. compile through
> recc.  We can record these subactions, and in a next build speculatively
> execute those same sub-actions, with updated files, at the start of the
> build.  We can do this aggressively for all actions regardless of what
> element they correspond to in the dependency graph.  When elements are
> built, the sub-actions within will receive cache hits, dramatically
> speeding up the overall end-to-end build.
>
> I am quite happy with the limited amount of additional bookkeeping that is
> required to implement the proposal.  I do think it is a step towards
> Bazel's promise of "{ Fast, Correct } — Choose two", while not requiring
> all projects/elements to adopt a different build system.
>
> I’d love to hear thoughts on
>
>    - the proposal in general

I've talked with a few colleagues about this proposal and overall
people are confused. I think it would be useful to state your premise
before diving into the technical details.

Let me try to state my understanding, and please let me know if it's
correct. We assume that:
* The project is already using RECC (or similar) in BuildStream
* While this provides a welcome improvement, we want even faster builds
* The user is using remote execution (and not just building locally
using the remote execution API), and has access to a large "build grid"
that is potentially sitting idle
* One thing we could do is try to predict actions that RECC will
execute and schedule them early, so that RECC will find them in the
(remote) action cache when needed

Given this, I think it might be too early to start working on this. We
should look at how much improvement RECC-in-BuildStream brings by
itself, before trying to optimize further. I have a draft merge
request for freedesktop-sdk [1], and I hope to start working on
integrating it in gnome-build-meta [2] once that MR lands.

Another point is that I'm not sure the complexity of this proposal is
proportionate to the performance improvement we hope to gain. Happy to
be proven wrong on this aspect, though :-)

>    - the best way to integrate speculative action scheduling with
>    BuildStream’s element execution model

I think the best way to implement this would be as an additional
(optional) stage between source fetch and build. I'm not very familiar
with the inner workings of the scheduler, but I think it is possible
to have the job run when all the build dependencies of an element have
either their sources or artifacts ready. We could potentially require
that some elements have their artifacts, and not accept only sources
(I'm thinking of stuff like compilers here). At this point, we can
start scheduling the speculative actions. I feel that we could
schedule more speculative actions as more build dependencies have
their artifacts ready, but this means potentially scheduling the
"cache priming" job more than once for any given element.

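To make the readiness condition above a bit more concrete, here is a
rough sketch in Python. This is not BuildStream scheduler code: the
Element class, the State enum, the needs_artifact flag and can_prime()
are all names I made up for illustration, and I'm assuming the
element's own sources are already fetched because the stage runs after
source fetch.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    """Hypothetical per-element progress, not BuildStream's real states."""
    PENDING = auto()
    SOURCES_READY = auto()
    ARTIFACT_READY = auto()


@dataclass
class Element:
    """Hypothetical stand-in for an element in the dependency graph."""
    name: str
    state: State = State.PENDING
    # True for elements whose built artifact is required before priming,
    # e.g. a compiler that the speculative actions need to run.
    needs_artifact: bool = False
    build_deps: list["Element"] = field(default_factory=list)


def dep_ready(dep: Element) -> bool:
    """A build dependency is ready if it has sources or an artifact,
    unless it is flagged as requiring the artifact."""
    if dep.needs_artifact:
        return dep.state is State.ARTIFACT_READY
    return dep.state in (State.SOURCES_READY, State.ARTIFACT_READY)


def can_prime(element: Element) -> bool:
    """True when the "cache priming" job for this element may be scheduled."""
    return all(dep_ready(dep) for dep in element.build_deps)
```

With this, an element depending on a compiler (needs_artifact=True) and
a library would only be primed once the compiler's artifact exists,
even if the library only has its sources fetched. Re-running
can_prime() as more artifacts appear is where the "more than once"
scheduling question comes from.
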
One thing I notice is that in your proposal, generating speculative
actions happens after the element is built. My thinking above points
more towards making it part of the second build. Either way, there is
something important to consider: we need to store information between
the two builds. Your proposal seems to suggest putting it in the
artifact, but I am not sure how to retrieve the old artifact from the
new build: the weak cache key will change if the sources of the
elements change (which is the most important use case IMO).
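
A toy illustration of the lookup problem, in Python. This is
deliberately not BuildStream's real cache key computation:
toy_weak_key() and the refs below are made up, and the only assumption
is that the key is a hash that covers the element's own source refs.

```python
import hashlib
import json


def toy_weak_key(element_name, source_refs, build_dep_names):
    """Toy key: hashes the element's own sources plus the *names* of its
    build dependencies. A simplification, not the real algorithm."""
    payload = json.dumps(
        {
            "name": element_name,
            "sources": list(source_refs),
            "deps": sorted(build_dep_names),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# First build: recorded sub-actions are stored with the artifact,
# which is addressed by the key computed from the old sources.
old_key = toy_weak_key("glib", ["ref-old"], ["meson"])
artifact_store = {old_key: ["recorded sub-actions from the first build"]}

# Second build: the sources changed, so the key changed too, and the
# new build has no way to address the old artifact by key.
new_key = toy_weak_key("glib", ["ref-new"], ["meson"])
recorded = artifact_store.get(new_key)  # None: the old record is unreachable
```

So whatever we store between the two builds needs to be reachable by
something other than the key of the element being built.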

>    - any projects you'd like to try this on

gnome-build-meta seems like an ideal target. We try to build with the
latest main/master of every GNOME project at least once per day, and
every time we upgrade glib, everything needs to be rebuilt, including
two instances of WebKitGtk. However, the project doesn't have access to
a "potentially idle large build grid", so I'm not sure if we can
actually implement this there.


Anyway, these are some quick thoughts (as quick as the complexity of
the proposal allows). I'll keep thinking about it, and I may have more
ideas.

Cheers,

Abderrahim


[1] https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/22859
[2] https://gitlab.gnome.org/GNOME/gnome-build-meta/
