Hi all,
Thanks, Abderrahim, for such an in-depth reply; this cannot have been
easy.
I also separately read the proposal when it came out, and then read it
numerous times on Friday, which will probably contribute to some hair
loss... I also noticed on Friday that the speculative actions are
retrieved from artifacts with the same weak key.
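To make the weak key concern concrete (Abderrahim raises it below as
well), here is a toy Python model. It is not BuildStream's actual key
computation: the weak_key function, its fields and the element names
are all made up for illustration. It only assumes that a weak key
covers an element's own sources plus the names of its dependencies.

```python
import hashlib
import json


def weak_key(element_name, source_refs, dependency_names):
    """Toy weak key (an assumption, not BuildStream's real algorithm).

    Hashes the element's own sources plus only the *names* of its
    dependencies, so rebuilding a dependency does not change the key
    but changing the element's sources does.
    """
    payload = json.dumps(
        {
            "element": element_name,
            "sources": sorted(source_refs),
            "deps": sorted(dependency_names),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical element and refs, for illustration only.
old_key = weak_key("glib.bst", ["commit-aaa"], ["base.bst"])
same_key = weak_key("glib.bst", ["commit-aaa"], ["base.bst"])
new_key = weak_key("glib.bst", ["commit-bbb"], ["base.bst"])

print(old_key == same_key)  # unchanged sources: the key is stable
print(old_key == new_key)   # changed sources: the key differs
```

If that assumption holds, an artifact carrying the recorded
speculative actions can no longer be found by weak key once the
element's own sources have changed.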
I largely agree with Abderrahim's points here, and will try to outline
my thoughts at a high level:
* I agree that we should first be measuring what gains we can achieve
simply by using tools like RECC.
* With the advances we have made towards safely using RECC, I think
we are on a path now which may allow us to use Bazel safely
and correctly.
It seems to me that we should not be re-inventing this kind of
parallelism and fine-grained caching, but rather using tools which
were written for that purpose - and so I would also be interested
to see how Bazel projects perform within BuildStream, before
resorting to these tactics.
* Is the REAPI client (BuildStream) really the right place for this?
I wonder if something similar could be achieved in a more generic
way, if it were to be considered at the remote execution service
level.
If, for example, the concept of "Nested Actions" were introduced,
with the understanding that some coarse-grained actions cause the
action runner itself to become an REAPI client and create nested
actions - maybe these could be remembered and used to prime the cache
in a more generic way.
I realize this is a bit vague, and would probably require some
changes in BuildStream, such as providing all of the actions
up front in such a way that the execution service can work out
the toplevel action dependencies.
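To make the "Nested Actions" idea slightly less vague, here is a
minimal Python sketch of the bookkeeping an execution service might
keep. Nothing here is part of REAPI or of any existing service: the
NestedActionIndex class, its methods and the identifiers are
hypothetical, and it assumes the service has some parent identifier
that stays stable across builds.

```python
from collections import defaultdict


class NestedActionIndex:
    """Hypothetical service-side record of nested actions.

    Maps a coarse-grained (parent) action to the nested actions its
    runner submitted while acting as an REAPI client.
    """

    def __init__(self):
        self._nested = defaultdict(list)

    def record(self, parent_id, nested_digest):
        # Remember each nested action once, in first-seen order.
        if nested_digest not in self._nested[parent_id]:
            self._nested[parent_id].append(nested_digest)

    def to_prime(self, parent_id):
        # Nested actions seen last time, i.e. candidates to run early
        # in order to prime the action cache.
        return list(self._nested.get(parent_id, []))


# Illustrative identifiers only.
index = NestedActionIndex()
index.record("element:glib", "compile:gmain.c")
index.record("element:glib", "compile:gmain.c")  # duplicate, ignored
index.record("element:glib", "compile:garray.c")

primed = index.to_prime("element:glib")
unknown = index.to_prime("element:unknown")
print(primed)
print(unknown)
```

The open question is what parent_id would be in practice, since it
has to survive source changes for the remembered actions to be
useful.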
Cheers,
-Tristan
On Sun, 2025-10-26 at 19:31 +0100, Abderrahim Kitouni wrote:
> Hi Sander,
>
> On Mon, 20 Oct 2025 at 18:09, Sander Striker <[email protected]> wrote:
> >
> > Hi,
> >
> > I’ve been working on a proposal that I’m really excited to share. It could
> > significantly improve BuildStream’s performance on large dependency graphs,
> > especially for C/C++ codebases, without changing correctness or requiring
> > new semantics from build tools. This builds on top of the current state of
> > the art, using both Remote Execution for parallelism and using recc to
> > cache and remote execute compilation at the translation unit level.
> >
> > I've put the full proposal in issue #2083
> > <https://github.com/apache/buildstream/issues/2083>, as that renders the
> > included diagrams nicely. I'll not rehash it here as it is quite long. A
> > quick summary:
> > When we build an element, sub-actions are spawned, e.g. compile through
> > recc. We can record these sub-actions, and in a next build speculatively
> > execute those same sub-actions, with updated files, at the start of the
> > build. We can do this aggressively for all actions regardless of what
> > element they correspond to in the dependency graph. When elements are
> > built, the sub-actions within will receive cache hits, dramatically
> > speeding up the overall end-to-end build.
> >
> > I am quite happy with the limited amount of additional bookkeeping that is
> > required to implement the proposal. I do think it is a step towards
> > Bazel's promise of "{ Fast, Correct } — Choose two", while not requiring
> > all projects/elements to adopt a different build system.
> >
> > I’d love to hear thoughts on
> >
> > - the proposal in general
>
> I've talked with a few colleagues about this proposal and overall
> people are confused. I think it would be useful to state your premise
> before diving into the technical details.
>
> Let me try to state my understanding, and please let me know if it's
> correct. We assume that:
> * The project is already using RECC (or similar) in BuildStream
> * While this provides a welcome improvement, we want even faster builds
> * The user is using remote execution (and not just building locally
> using the remote execution API), and has access to a large "build grid"
> that is potentially sitting idle
> * One thing we could do is try to predict actions that RECC will
> execute and schedule them early, so that RECC will find them in the
> (remote) action cache when needed
>
> Given this, I think it might be too early to start working on this. We
> should look at how much improvement RECC-in-BuildStream brings by
> itself, before trying to optimize further. I have a draft merge
> request for freedesktop-sdk [1], and I hope to start working on
> integrating it in gnome-build-meta [2] once that MR lands.
>
> Another point is that I'm not sure the complexity of this proposal is
> proportionate to the performance improvement we hope to gain. Happy to
> be proven wrong on this aspect, though :-)
>
> > - the best way to integrate speculative action scheduling with
> > BuildStream’s element execution model
>
> I think the best way to implement this would be as an additional
> (optional) stage between source fetch and build. I'm not very familiar
> with the inner workings of the scheduler, but I think it is possible
> to have the job run when all the build dependencies of an element have
> either their sources or artifacts ready. We could potentially require
> that some elements have their artifacts, and not accept only sources
> (I'm thinking of stuff like compilers here). At this point, we can
> start scheduling the speculative actions. I feel that we could
> schedule more speculative actions as more build dependencies have
> their artifacts ready, but this means potentially scheduling the
> "cache priming" job more than once for any given element.
>
> One thing I notice is that in your proposal, generating speculative
> actions happens after the element is built. My thinking above points
> more towards having it part of the second build. Either way, there is
> something important to consider: we need to store information between
> the two builds. Your proposal seems to suggest putting it in the
> artifact, but I am not sure how to retrieve the old artifact from the
> new build: the weak cache key will change if the sources of the
> elements change (which is the most important use case IMO).
>
> > - any projects you'd like to try this on
>
> gnome-build-meta seems like an ideal target. We try to build with the
> latest main/master of every GNOME project at least once per day, and
> every time we upgrade glib everything needs to be rebuilt including
> two instances of WebKitGtk. However it doesn't have access to a
> "potentially idle large build grid", so I'm not sure if we can
> actually implement this.
>
>
> Anyway, these are some quick thoughts (as quick as the complexity of
> the proposal allows). I'll keep thinking about it, I may have more
> ideas.
>
> Cheers,
>
> Abderrahim
>
>
> [1] https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/22859
> [2] https://gitlab.gnome.org/GNOME/gnome-build-meta/