[I] Speculative Actions: Predictive Cache Priming for BuildStream [buildstream]

via GitHub Mon, 20 Oct 2025 10:05:53 -0700


sstriker opened a new issue, #2083:
URL: https://github.com/apache/buildstream/issues/2083


   # Speculative Actions: Predictive Cache Priming for BuildStream
   
   ## Table of Contents
   
   1. [Executive Summary](#1-executive-summary)
   2. [Problem Statement](#2-problem-statement)
   3. [Solution Overview](#3-solution-overview)
   4. [Data Model](#4-data-model)
      - [4.1 Protocol Buffer Extensions](#41-protocol-buffer-extensions)
   5. [Concrete Example: libA → libB → 
appC](#5-concrete-example-liba--libb--appc)
      - [5.1 Example: libA Compilation](#51-example-liba-compilation)
      - [5.2 Example: libB Compilation](#52-example-libb-compilation)
      - [5.3 Example: appC Linking](#53-example-appc-linking)
      - [5.4 Overlay Type Summary](#54-overlay-type-summary)
   6. [System Flow](#6-system-flow)
      - [6.1 High-Level Process Flow](#61-high-level-process-flow)
      - [6.2 Data Flow Diagram](#62-data-flow-diagram)
   7. [Detailed Algorithms](#7-detailed-algorithms)
      - [7.1 Overlay Generation (after 
build)](#71-overlay-generation-after-build)
      - [7.2 Overlay Instantiation (before 
build)](#72-overlay-instantiation-before-build)
   8. [Implementation Plan](#8-implementation-plan)
   9. [Benefits](#9-benefits)
   10. [Key Design Decisions](#10-key-design-decisions)
   11. [Open Questions and Key 
Uncertainties](#11-open-questions-and-key-uncertainties)
       - [11.1 Phased Rollout Strategy](#1-phased-rollout-strategy)
       - [11.2 Overlay Resolution Success 
Rate](#2-overlay-resolution-success-rate)
       - [11.3 Non-Determinism Handling](#3-non-determinism-handling)
       - [11.4 Digest Resolution Performance 
Impact](#4-digest-resolution-performance-impact)
       - [11.5 Speculative Action Scale and Remote Execution 
Capacity](#5-speculative-action-scale-and-remote-execution-capacity)
       - [11.6 Storage and Network Overhead](#6-storage-and-network-overhead)
       - [11.7 Value of ACTION Overlays Without 
`rear`](#7-value-of-action-overlays-without-rear)
   12. [Future Enhancement: Local Execution with 
`rear`](#12-future-enhancement-local-execution-with-rear)
   13. [References](#references)
   
   ---
   
   ## 1. Executive Summary
   
   Cache Priming restores build parallelism in BuildStream by speculatively 
executing compiler invocations ahead of time, warming the Remote Execution 
ActionCache before actual element builds run. This dramatically reduces 
end-to-end build latency without compromising correctness.
   
   **Key Insight**: After building an element once, we know all the subactions 
(e.g. `recc` compiler invocations) that occurred. We can store these with 
overlay instructions, then reuse them in future builds by adapting inputs for 
source/dependency changes.
   
   **BuildStream and Remote Execution**: BuildStream is built on Remote 
Execution API (REAPI) paradigms, using a local buildbox-casd as a CAS proxy 
that connects to remote Execution, ActionCache, and CAS services. Cache Priming 
extends this architecture by adding speculative action submission to warm the 
ActionCache before element builds execute.
   
   ## 2. Problem Statement
   
   BuildStream orchestrates repository-level builds across many elements, 
allowing each element to use its native builds system, but not achieving the 
fine-grained parallelism available within individual compilation units. C/C++ 
projects are particularly affected; translation units could compile in parallel 
but instead wait sequentially for their element's turn in the build graph.
   
   ### Why Cache Priming is Particularly Effective
   
   **Code churn patterns favor cache reuse**: Research consistently shows that 
code changes are concentrated in a small subset of files, with most code 
remaining stable between builds. This stability pattern makes speculative 
execution with cached results highly effective.
   
   **Key Research Findings:**
   
   1. **Adams et al. (2016)** conducted a comprehensive case study on GLib, 
PostgreSQL, Qt, and Ruby systems analyzing C/C++ header file "hotspots"—headers 
that change frequently and trigger expensive rebuild processes. They 
empirically demonstrated that most header files are relatively stable compared 
to implementation files, with only a small subset accounting for the majority 
of build-time impact.
   
   2. **Misu et al. (2024)** analyzed 383 GitHub projects using Bazel and found 
that incremental builds with caching achieve median speedups of 4.22x (build 
system tool-independent cache) to 4.71x (build system tool-specific cache) for 
long-build duration projects. This demonstrates the substantial real-world 
value of build caching when most inputs haven't changed.
   
   3. **Matev (2020)** studied large C++ projects using ccache for distributed 
compilation, demonstrating that with a hot cache, compilation can be 4-5 times 
faster than with cache misses. The study showed that "incremental builds often 
do not significantly speed up compilation because even just a change of the 
modification time of a widely used header leads to many compiler and linker 
invocations"—highlighting both the cost of header changes and the benefit when 
they remain stable.
   
   4. **Chowdhury et al. (2024)** analyzed 774,051 source code methods from 49 
prominent open-source Java projects and found that "approximately 80% of 
changes are concentrated in just 20% of the methods, demonstrating the Pareto 
80/20 principle. Moreover, this subset of methods is responsible for the 
majority of the identified bugs in these projects." This concentration of 
changes means the vast majority of code remains unchanged between builds.
   
   **Supporting Evidence on Change Concentration:**
   
   - **Shin et al. (2011)**: In Mozilla Firefox and Red Hat Enterprise Linux, 
70.8% of known vulnerabilities were found in only 10.9% of files, demonstrating 
extreme concentration of churn in a small subset of the codebase.
   
   - **Nagappan and Ball (2007)**: In Windows Server 2003, software 
dependencies and churn measures were efficient predictors of post-release 
failures, further confirming that changes cluster in predictable locations.
   
   - **Munson and Elbaum (1998)**: Established that code churn, the amount of 
code change over time, is measurable and correlates with fault-proneness, 
confirming that change patterns are non-uniform and predictable.
   
   In C/C++ projects specifically:
   - **Implementation files (.cpp) churn more frequently** than header files 
(.h)
   - **Header files are more stable** and change less often (Adams et al., 2016)
   - **Header file changes are expensive**: Even just changing the modification 
time of a widely-used header triggers many recompilations (Matev, 2020)
   - **Build caching is highly effective**: Studies show 4-5x speedups with 
warm caches (Matev, 2020; Misu et al., 2024)
   - Changes tend to cluster in specific modules while the broader codebase 
remains stable
   
   **Example**: In a typical incremental build of a C++ project with 1000 
compilation units:
   - ~200 files changed (20% following Pareto)
   - ~800 files unchanged (80%)
   - With effective build caching, the 800 unchanged compilations can be served 
from cache
   - Only the 200 changed files need actual compilation
   - **Result**: Build time dominated by the 200 changed files, achieving 4-5x 
speedup
   
   This is why Cache Priming is particularly powerful for C/C++ codebases: the 
stability of header files and the concentration of changes in a minority of 
implementation files means that speculative compilation actions can frequently 
reuse cached results from previous builds.
   
   ---
   
   ## 3. Solution Overview
   
   **High-Level Flow:**
   
   ```mermaid
   flowchart LR
       Build["Build #N"] -->|instantiates| SpecActions["Speculative 
Actions<br/>(generated by Build #N-1)"]
       SpecActions -->|warm| ActionCache[(ActionCache)]
       ActionCache -->|serves| ElementBuilds[Element Builds]
       ElementBuilds -->|produce| SubActions["(Sub)Actions"]
       SubActions -->|transformed into| SpecActions2[Speculative Actions]
       SpecActions2 -->|stored in| ArtifactCache[(Artifact Cache)]
       SpecActions2 -.-> SpecActions
   
       style ActionCache fill:#fff4e1
       style ArtifactCache fill:#fff4e1
       style ElementBuilds fill:#e1f0ff
       style SubActions fill:#e1f0ff
   ```
   
   **Process:**
   
   1. **Priming Phase** (before build): Instantiate speculative actions by 
applying overlays, submit for execution
   2. **Build Phase**: Element builds hit warmed ActionCache, reducing latency
   3. **Generation Phase** (after build): Record subactions with overlay 
instructions describing how to adapt inputs
   
   ---
   
   ## 4. Data Model
   
   ### 4.1 Protocol Buffer Extensions
   
   **Artifact.speculative_actions** (new field)
   ```protobuf
   message Artifact {
     Digest speculative_actions = 19;  // Points to SpeculativeActions in CAS
   }
   
   message SpeculativeActions {
     repeated SpeculativeAction actions = 1;
     
     message ReferencedSpeculativeAction {
       string element = 1;             // Element that speculative actions are 
referenced from
       Digest speculative_actions = 2; // The speculative actions of the 
element at the time of artifact creation
     }
     
     repeated ReferencedSpeculativeAction referenced_speculative_actions = 2; 
// All speculative actions from all elements referenced in ACTION overlays
   }
   
   message SpeculativeAction {
     Digest base_action_digest = 1;    // Original Action from previous build
     repeated Overlay overlays = 2;    // How to adapt inputs
     
     message Overlay {
       enum OverlayType {
         SOURCE = 0;                   // From element's source tree
         ARTIFACT = 1;                 // From dependency element's artifact 
output
         ACTION = 2;                   // From another speculative action output
       }
       OverlayType type = 1;
       string source_element = 2;      // Element reference
       Digest source_action = 3;       // For ACTION type: which action
       string source_path = 4;         // Path within source
       Digest target_digest = 5;       // Digest to replace in input tree
     }
   }
   ```
   
   **ActionResult.subactions** (new field in REAPI)
   ```protobuf
   message ActionResult {
     repeated Digest subactions = 10;  // Digests of spawned Actions (e.g., 
recc invocations)
   }
   ```
   
   ---
   
   ## 5. Concrete Example: libA → libB → appC
   
   ### Dependency Structure
   ```
   libA (base library)
     └─→ libB (depends on libA)
          └─→ appC (depends on libB)
   ```
   
   ### 5.1 Example: libA Compilation
   
   **libA Source Structure**
   ```
   libA/
   ├── src/
   │   ├── math_utils.h         # Header file
   │   └── math_utils.cpp       # Implementation
   └── libA.bst
   ```
   
   **libA Artifact Structure** (Build Output)
   ```
   /
   ├── usr/
   │   ├── include/
   │   │   └── math_utils.h     # Installed from src/ (same digest: 
"h_aaa111...")
   │   └── lib/
   │       └── libmath_utils.a
   ```
   
   **Visual Flow:**
   
   ```mermaid
   graph TB
       subgraph Step1["① libA Source"]
           S1["src/math_utils.cpp<br/><b>digest: cpp_aaa111...</b>"]
           S2["src/math_utils.h<br/><b>digest: h_aaa111...</b>"]
       end
       
       subgraph Step2["② (Sub)Action Execution"]
           A1["<b>recc g++ -c</b><br/>Action digest: aaa111..."]
       end
       
       S1 -->|input| Step2
       S2 -->|input| Step2
       
       subgraph Step3["③ (Sub)Action Result"]
           O1["math_utils.o"]
       end
       
       Step2 -->|produces| O1
       
       subgraph Step4["④ libA Artifact (/usr/)"]
           AR1["/usr/include/math_utils.h<br/><b>digest: h_aaa111...</b>"]
           AR2["/usr/lib/libmath_utils.a"]
       end
       
       O1 -.->|archived into| AR2
       S2 -.->|installed as| AR1
       
       subgraph Step5["⑤ Generated SpeculativeAction"]
           SA["<b>base_action_digest: aaa111...</b>"]
           OV1["<b>Overlay 1:</b><br/>type: SOURCE<br/>element: libA<br/>path: 
src/math_utils.cpp<br/>target_digest: cpp_aaa111..."]
           OV2["<b>Overlay 2:</b><br/>type: SOURCE<br/>element: libA<br/>path: 
src/math_utils.h<br/>target_digest: h_aaa111..."]
       end
       
       S1 -.-|digest match| OV1
       S2 -.-|digest match| OV2
       OV1 --> SA
       OV2 --> SA
       SA --> A1
       
       style Step1 fill:#e6f3ff
       style Step4 fill:#ffe6cc
       style Step5 fill:#e6ffe6
       style Step2 fill:#fff4e1
       style Step3 fill:#f0f0f0
   ```
   
   **Recorded Subaction** (compile math_utils.cpp)
   ```protobuf
   Action {
     command_digest: "cmd_aaa111..."
     input_root_digest: "dir_aaa111..."
   }
   
   Command {
     arguments: ["g++", "-c", "-fPIC", "src/math_utils.cpp", "-o", 
"math_utils.o"]
   }
   
   Directory (dir_aaa111...) {
     files: [
       File { name: "src/math_utils.cpp", digest: "cpp_aaa111..." },
       File { name: "src/math_utils.h", digest: "h_aaa111..." }
     ]
   }
   ```
   
   **Generated SpeculativeAction**
   ```protobuf
   SpeculativeAction {
     base_action_digest: "aaa111..."
     overlays: [
       Overlay {
         type: SOURCE
         source_element: "libA"
         source_path: "src/math_utils.cpp"
         target_digest: "cpp_aaa111..."    # Replace this digest in input tree
       },
       Overlay {
         type: SOURCE
         source_element: "libA"
         source_path: "src/math_utils.h"
         target_digest: "h_aaa111..."      # Replace this digest in input tree
       }
     ]
   }
   ```
   
   ### 5.2 Example: libB Compilation
   
   **libB Source Structure**
   ```
   libB/
   ├── include/
   │   └── vector_math.h
   ├── src/
   │   └── vector_math.cpp      # Includes math_utils.h from libA
   └── libB.bst
   ```
   
   **libB Artifact Structure** (Build Output)
   ```
   /
   ├── usr/
   │   ├── include/
   │   │   └── vector_math.h
   │   └── lib/
   │       └── libvector_math.a
   ```
   
   **Visual Flow:**
   
   ```mermaid
   graph TB
       subgraph Step1A["① libB Source"]
           SB1["src/vector_math.cpp<br/><b>digest: cpp_bbb222...</b>"]
       end
       
       subgraph Step1B["① libA Artifact (dependency)"]
           AA1["/usr/include/math_utils.h<br/><b>digest: 
h_aaa111...</b><br/>(from libA artifact)"]
       end
       
       subgraph Step1C["① libA Source"]
           SA2["src/math_utils.h<br/><b>digest: h_aaa111...</b>"]
       end
       
       subgraph Step2["② (Sub)Action Execution"]
           A1["<b>recc g++ -c -I/usr/include</b><br/>Action digest: bbb222..."]
       end
       
       SB1 -->|input| Step2
       AA1 -.->|used by compile| Step2
       
       subgraph Step3["③ (Sub)Action Result"]
           O1["vector_math.o"]
       end
       
       Step2 -->|produces| O1
       
       subgraph Step4["④ Generated SpeculativeAction"]
           SA["<b>base_action_digest: bbb222...</b>"]
           OV1["<b>Overlay 1:</b><br/>type: SOURCE<br/>element: libB<br/>path: 
src/vector_math.cpp<br/>target_digest: cpp_bbb222..."]
           OV2["<b>Overlay 2:</b><br/>type: SOURCE<br/>element: libA<br/>path: 
src/math_utils.h<br/>target_digest: h_aaa111..."]
       end
       
       SB1 -.-|digest match| OV1
       SA2 -.-|"digest match<br/>(libA Source not Artifact!)"| OV2
       OV1 --> SA
       OV2 --> SA
       SA --> A1
       
       style Step1A fill:#e6f3ff
       style Step1C fill:#e6f3ff
       style Step1B fill:#ffe6cc
       style Step4 fill:#e6ffe6
       style Step2 fill:#fff4e1
       style Step3 fill:#f0f0f0
   ```
   
   **Recorded Subaction** (compile vector_math.cpp)
   ```protobuf
   Action {
     command_digest: "cmd_bbb222..."
     input_root_digest: "dir_bbb222..."
   }
   
   Command {
     arguments: [
       "g++", "-c", "-fPIC",
       "-I/usr/include",              # libA headers from dependency artifact
       "src/vector_math.cpp",
       "-o", "vector_math.o"
     ]
   }
   
   Directory (dir_bbb222...) {
     files: [
       File { name: "src/vector_math.cpp", digest: "cpp_bbb222..." },
       File { name: "/usr/include/math_utils.h", digest: "h_aaa111..." }
     ]
   }
   ```
   
   **Generated SpeculativeAction**
   ```protobuf
   SpeculativeAction {
     base_action_digest: "bbb222..."
     overlays: [
       Overlay {
         type: SOURCE
         source_element: "libB"
         source_path: "src/vector_math.cpp"
         target_digest: "cpp_bbb222..."    # Replace this digest
       },
       Overlay {
         type: SOURCE                       # Prefers SOURCE over ARTIFACT!
         source_element: "libA"             # Found in libA's source tree
         source_path: "src/math_utils.h"    # Original source path
         target_digest: "h_aaa111..."       # Same digest as in /usr/include/
       }
     ]
   }
   ```
   
   **Key Insight**: The overlay generator finds that `math_utils.h` (digest 
`h_aaa111...`) appears in libA's ARTIFACT output at 
`/usr/include/math_utils.h`, but also discovers it exists in libA's SOURCE at 
`src/math_utils.h` with the same digest. Following the SOURCE > ACTION > 
ARTIFACT priority, it records a SOURCE overlay pointing to libA's source tree. 
This creates a tighter dependency on the original source rather than the built 
artifact.
   
   ### 5.3 Example: appC Linking
   
   **appC Source Structure**
   ```
   appC/
   ├── src/main.cpp
   └── appC.bst
   ```
   
   **appC Artifact Structure** (Build Output)
   ```
   /
   ├── usr/
   │   └── bin/
   │       └── appC             # Executable
   ```
   
   **Visual Flow:**
   
   ```mermaid
   graph TB
       subgraph Step1A["① Previous Speculative Action Output"]
           PA1["main.o<br/><b>digest: obj_ccc333...</b><br/>(from compile 
action ccc333...)"]
       end
       
       subgraph Step1B["① libB Artifact"]
           AB1["/usr/lib/libvector_math.a<br/><b>digest: lib_bbb222...</b>"]
       end
       
       subgraph Step1C["① libA Artifact"]
           AA1["/usr/lib/libmath_utils.a<br/><b>digest: lib_aaa111...</b>"]
       end
       
       subgraph Step2["② (Sub)Action Execution"]
           A1["<b>g++ link</b><br/>Action digest: ccc444..."]
       end
       
       PA1 -->|input| Step2
       AB1 -->|input| Step2
       AA1 -->|input| Step2
       
       subgraph Step3["③ (Sub)Action Result"]
           O1["appC executable"]
       end
       
       Step2 -->|produces| O1
       
       subgraph Step4["④ Generated SpeculativeAction"]
           SA["<b>base_action_digest: ccc444...</b>"]
           OV1["<b>Overlay 1:</b><br/>type: ACTION<br/>element: 
appC<br/>source_action: ccc333...<br/>path: main.o<br/>target_digest: 
obj_ccc333..."]
           OV2["<b>Overlay 2:</b><br/>type: ARTIFACT<br/>element: 
libB<br/>path: /usr/lib/libvector_math.a<br/>target_digest: lib_bbb222..."]
           OV3["<b>Overlay 3:</b><br/>type: ARTIFACT<br/>element: 
libA<br/>path: /usr/lib/libmath_utils.a<br/>target_digest: lib_aaa111..."]
       end
       
       OV1 --> SA
       OV2 --> SA
       OV3 --> SA
       PA1 -.-|digest match| OV1
       AB1 -.-|digest match| OV2
       AA1 -.-|digest match| OV3
       SA --> A1
       
       style Step1A fill:#e6ffe6
       style Step1B fill:#ffe6cc
       style Step1C fill:#ffe6cc
       style Step4 fill:#e6ffe6
       style Step2 fill:#fff4e1
       style Step3 fill:#f0f0f0
   ```
   
   **Generated SpeculativeAction** (link step)
   ```protobuf
   SpeculativeAction {
     base_action_digest: "ccc444..."
     overlays: [
       Overlay {
         type: ACTION                      # From previous speculative action
         source_element: "appC"
         source_action: "ccc333..."        # The compile action
         source_path: "main.o"
         target_digest: "obj_ccc333..."
       },
       Overlay {
         type: ARTIFACT                    # From libB artifact output
         source_element: "libB"
         source_path: "/usr/lib/libvector_math.a"
         target_digest: "lib_bbb222..."
       },
       Overlay {
         type: ARTIFACT                    # From libA artifact output
         source_element: "libA"
         source_path: "/usr/lib/libmath_utils.a"
         target_digest: "lib_aaa111..."
       }
     ]
   }
   ```
   
   ### 5.4 Overlay Type Summary
   
   **Visual Overview:**
   
   ```mermaid
   graph TD
       Start[Input File Digest] --> Q1{Found in current or<br/>dependency 
element's<br/>source tree?}
       Q1 -->|YES| S[①<br/>SOURCE overlay]
       Q1 -->|NO| Q2{Found in current or<br/>dependency 
element's<br/>sub-action result?}
       Q2 -->|YES| A[②<br/>ACTION overlay]
       Q2 -->|NO| Q3{Found in dependency<br/>element's artifact?}
       Q3 -->|YES| AR[③<br/>ARTIFACT overlay]
       
       style S fill:#e6f3ff
       style A fill:#e6ffe6
       style AR fill:#ffe6cc
   ```
   
   **Overlay Type Summary**
   
   | Type | Source | Example | Use Case |
   |------|--------|---------|----------|
   | **SOURCE** | Element's source tree | `.cpp` files, `.h` headers | Tracks 
source code changes; preferred over ARTIFACT when digest matches |
   | **ARTIFACT** | Dependency element's artifact output | `.a` libraries, 
built artifacts | Tracks dependency artifacts that don't exist in source |
   | **ACTION** | Previous speculative action output | `.o` object files | 
Links compilation steps within an element |
   
   **Priority**: When the same digest appears in multiple locations, overlays 
prefer SOURCE > ACTION > ARTIFACT. This creates tighter dependencies on 
original sources rather than intermediate artifacts.
   
   ### How This Exploits Code Stability
   
   The overlay mechanism directly exploits the research findings from Section 2:
   
   1. **SOURCE overlays for .cpp files**: Will require updates when 
implementation files change (~20% of files per build)
   2. **SOURCE overlays for .h files**: Often remain valid across multiple 
builds due to header stability (Adams et al., 2016)
   3. **ARTIFACT overlays for dependency artifacts**: Very stable, as 
dependency libraries rarely change between builds
   
   Given that research shows 80% of files typically remain unchanged, we expect:
   - ~80% of SOURCE overlays to resolve successfully with unchanged digests
   - Cache hits from previous speculative executions provide the 4-5x speedup 
observed in caching studies (Matev, 2020; Misu et al., 2024)
   
   ---
   
   ## 6. System Flow
   
   ### 6.1 High-Level Process Flow
   
   ```mermaid
   flowchart TD
       Start([Build Session Start]) --> Identify[Identify Elements to Build]
       Identify --> Retrieve[Retrieve Artifacts &<br/>SpeculativeActions]
       Retrieve --> Aggregate[Aggregate & Deduplicate<br/>SpeculativeActions]
       Aggregate --> TopoSort[Topological Sort]
       
       TopoSort --> Instantiate{For Each<br/>SpeculativeAction}
       Instantiate --> Resolve{All Overlays<br/>Resolvable?}
       
       Resolve -->|No| Skip[Skip Action]
       Resolve -->|Yes| Apply[Apply Overlays]
       
       Apply --> Store[Store in CAS]
       Store --> Submit[Submit to Remote Execution]
       
       Skip --> More{More?}
       Submit --> More
       More -->|Yes| Instantiate
       More -->|No| Parallel[Speculative Execution<br/>Warms ActionCache]
       
       Parallel --> Build[Element Builds<br/>Hit Cache]
       Build --> Complete([Build Complete])
       
       style Submit fill:#fff4e1
       style Parallel fill:#fff4e1
       style Build fill:#e1f0ff
   ```
   
   ### 6.2 Data Flow Diagram
   
   ```mermaid
   flowchart TB
       subgraph BuildN["Build #N"]
           Start[Build #N Starts]
           
           subgraph Priming["Cache Priming Phase"]
               Start --> Sched[BuildStream Scheduler]
               Sched -->|query via Asset API| AC1[(Artifact Cache)]
               AC1 -->|return SpeculativeActions<br/>from Build #N-1| 
SAList[SpeculativeActions]
               
               SAList --> Inst[Instantiator]
               
               Inst --> Over{Overlay<br/>Type?}
               Over -->|SOURCE| Src[Source Tree]
               Over -->|ARTIFACT| Art[Artifact Output]
               Over -->|ACTION| Act[Action Output]
               
               Src --> New[New Action]
               Art --> New
               Act --> New
               
               CASD1[buildbox-casd<br/>local CAS proxy]
               New -->|store via CAS API| CASD1
               CASD1 -->|proxy to| CAS1[(BuildGrid CAS)]
               
               New -->|Execute API| BuildGrid1[BuildGrid<br/>Execution Service]
               BuildGrid1 -->|check| ACache[ActionCache]
               ACache -->|miss| BuildGrid1
               BuildGrid1 -->|dispatch via Bots API| Workers1[buildbox-worker]
               Workers1 -->|fetch via| CASD_W1[buildbox-casd<br/>on worker]
               CASD_W1 -->|proxy to| CAS1
               Workers1 -->|execute| Runner1[buildbox-run]
               Runner1 -->|result| Workers1
               Workers1 -->|store result| ACache
           end
           
           subgraph Execution["Element Builds"]
               Sched -->|trigger| BSElement[BuildStream<br/>Element Build]
               BSElement -->|create Action| CASD2[buildbox-casd<br/>local CAS 
proxy]
               CASD2 -->|store via CAS API| CAS1
               BSElement -->|Execute API| BuildGrid2[BuildGrid<br/>Execution 
Service]
               BuildGrid2 -->|check| ACache
               ACache -.->|cache hit for sub-actions!| BuildGrid2
               BuildGrid2 -->|dispatch via Bots API| Workers2[buildbox-worker]
               Workers2 -->|fetch via| CASD_W2[buildbox-casd<br/>on worker]
               CASD_W2 -->|proxy to| CAS1
               Workers2 -->|execute| Runner2[buildbox-run]
               Runner2 -->|spawns| SubActions[Sub-Actions:<br/>recc invocations]
               SubActions -.->|benefit from warmed ActionCache| ACache
               SubActions -->|record digests| Runner2
               Runner2 -->|result + subactions| Workers2
               Workers2 -->|ActionResult| BuildGrid2
               BuildGrid2 -->|result| BSElement
               BSElement -->|store via| CASD2
           end
           
           subgraph Generation["Overlay Generation (Async)"]
               Gen[Overlay Generator<br/>ASYNC]
               BSElement -.->|triggers async| Gen
               Gen -->|fetch Actions via CAS API| CASD2
               Gen -->|create| SA[SpeculativeActions]
               SA -->|attach to| Art1[Artifact]
               Art1 -->|store via Asset API| AC2[(Artifact Cache)]
           end
       end
       
       AC2 -.->|↓ Data for Build #N+1 ↓| Boundary
       
       
Boundary[/"═══════════════════════════════════════════════════════════════════"\]
       
       subgraph BuildN1["Build #N+1 (Uses SpeculativeActions from Build #N)"]
           NextBuild[Build #N+1 will use<br/>SpeculativeActions<br/>generated 
above]
       end
       
       style Start fill:#e6f3ff
       style SAList fill:#ffeb99
       style Gen fill:#ffe6cc
       style SA fill:#ffe6cc
       style Art1 fill:#ffe6cc
       style AC2 fill:#ffe6cc
       style Workers1 fill:#fff4e1
       style Workers2 fill:#fff4e1
       style BSElement fill:#e1f0ff
       style ACache fill:#fff9e1
       style Boundary 
fill:#f0f0f0,stroke:#333,stroke-width:3px,stroke-dasharray: 5 5
       style BuildN1 fill:#f8f8ff
       style NextBuild fill:#ffeb99
       style CASD1 fill:#e8f4f8
       style CASD2 fill:#e8f4f8
       style CASD_W1 fill:#e8f4f8
       style CASD_W2 fill:#e8f4f8
   ```
   
   **Reading the Diagram:**
   
   This is an example setup with BuildGrid for Remote Execution services.
   
   *Note: this diagram could do with some additional cleanup and scrutiny*
   
   **REAPI Architecture**:
   - BuildStream runs with local `buildbox-casd` as CAS proxy
   - `buildbox-casd` proxies to BuildGrid's CAS, ActionCache, and Execution 
services
   - Workers also run `buildbox-casd` locally for efficient blob access
   - All communication uses REAPI (Execute, CAS, Asset, Bots APIs)
   
   **Cache Priming Flow**:
   1. **Priming Phase**: Retrieves SpeculativeActions → instantiates → submits 
via Execute API → BuildGrid dispatches to workers → results stored in 
ActionCache
   2. **Element Builds**: BuildStream creates Actions → Execute API → BuildGrid 
checks ActionCache → dispatches to workers → `buildbox-run` spawns sub-actions 
(recc) → sub-actions benefit from warmed ActionCache
   3. **Overlay Generation**: Asynchronously fetches sub-action digests and 
generates SpeculativeActions for Build #N+1
   
   **Key Flow**: Build #N uses speculative actions from Build #N-1 → builds → 
generates speculative actions for Build #N+1
   
   ---
   
   ## 7. Detailed Algorithms
   
   **Note**: The following pseudocode is illustrative and non-optimal. 
Production implementations should optimize for performance, especially around 
CAS operations and dependency resolution.
   
   ### 7.1 Overlay Generation (after build)
   
   **Note**: This process runs asynchronously and does not block the element 
build completion or downstream element builds.
   
   ```python
   # Triggered asynchronously after element build completes
   async def generate_overlays(element, element_subactions):
       speculative_actions = []
   
       for subaction_digest in element_subactions:
           subaction = fetch_cas(subaction_digest)
           overlays = []
           
           for input_file in subaction.input_root.files:
               # Match digest to source: SOURCE > ACTION > ARTIFACT priority
               # This means if digest "h_aaa111..." appears in:
               #   - libA's source at "src/math_utils.h" (SOURCE)
               #   - libA's artifact at "/usr/include/math_utils.h" (ARTIFACT)
               # We prefer the SOURCE match
               
               src_element, src_path, src_type = 
find_source_element(input_file.digest)
               
               overlay = Overlay(
                   type=src_type,
                   source_element=src_element,
                   source_path=src_path,
                   target_digest=input_file.digest  # The digest we'll replace 
during instantiation
               )
               overlays.append(overlay)
           
           speculative_actions.append(SpeculativeAction(
               base_action_digest=subaction_digest,
               overlays=overlays
           ))
   
       # Attach to Artifact (update artifact in cache)
       artifact.speculative_actions = 
store_cas(SpeculativeActions(actions=speculative_actions))
       update_artifact_cache(element, artifact)
       
       # Element's downstream dependencies can already be building
       # This overlay data will be available for the NEXT build session
   ```
   
   **Note on find_source_element**: This function searches through all elements 
in the dependency graph that were built before the current element. It looks 
for files matching the target digest in:
   1. SOURCE trees (element source files)
   2. ACTION outputs (from other speculative actions, including those from 
earlier elements in the dependency graph)
   3. ARTIFACT outputs (built artifacts from dependencies)
   
   When the same digest appears in multiple locations, it returns the SOURCE 
match, creating a direct dependency on the original source rather than 
intermediate artifacts.
   
   **Use Cases for ACTION Overlays Beyond Current Element**:
   ACTION overlays are particularly valuable for intermediate build artifacts 
that are generated deterministically:
   - **Code generation**: protoc, flex/bison output files can be referenced 
across elements
   - **Resource compilation**: glib-compile-resources, Qt rcc generated code
   - **Archive creation**: `.o` → `.a` via `rear` (see Section 12)
   - **Intermediate transformations**: any deterministic build step producing 
reusable artifacts
   
   ### 7.2 Overlay Instantiation (before build)
   
   ```python
   speculative_action_map = {}  # base_digest → speculative_action_digest
   
   for action_template in topological_sort(all_speculative_actions):
       base_action = fetch_cas(action_template.base_action_digest)
       action = copy.deepcopy(base_action)
       
       # Apply each overlay
       all_resolved = True
       for overlay in action_template.overlays:
           if overlay.type == SOURCE:
               src_digest = resolve_source_digest(overlay.source_element, 
overlay.source_path)
           elif overlay.type == ARTIFACT:
               artifact_output = 
wait_for_artifact_output(overlay.source_element)
               src_digest = digest_at_path(artifact_output, overlay.source_path)
           elif overlay.type == ACTION:
               speculative_action = 
speculative_action_map[overlay.source_action]
               wait_for_action(speculative_action)
               output = fetch_action_output(speculative_action)
               src_digest = digest_at_path(output, overlay.source_path)
           
           if src_digest is None:
               all_resolved = False
               break
           
           replace_input_digest(action.input_root, overlay.target_digest, 
src_digest)
       
       if all_resolved:
           speculative_action_digest = store_cas(action)
           speculative_action_map[action_template.base_action_digest] = 
speculative_action_digest
           submit_speculative_action(speculative_action_digest)
   ```
   
   ---
   
   ## 8. Implementation Plan
   
   ### Phase 1: REAPI Extension
   1. Add `ActionResult.subactions` field to REAPI
   2. Instrument buildbox-worker to record subaction digests
   3. Update buildbox-casd to store subactions with ActionResult
   
   ### Phase 2: BuildStream Integration - SOURCE Overlays Only
   1. Implement overlay generator in BuildStream (SOURCE type only)
   2. Add SpeculativeActions to Artifact protobuf
   3. Store/retrieve SpeculativeActions via Artifact Cache
   4. Implement speculative action instantiation for SOURCE overlays
   5. Add speculative action submission to scheduler
   6. **Make overlay generation asynchronous**: Run as background task, don't 
block element builds or downstream dependencies
   7. **Measure**: Overlay resolution success rate, storage overhead, 
performance impact, overlay generation time
   
   ### Phase 3: ARTIFACT Overlays
   1. Extend overlay generator to support ARTIFACT type
   2. Update instantiation logic to resolve ARTIFACT overlays
   3. **Measure**: Additional performance gains, complexity vs. benefit
   
   ### Phase 4: ACTION Overlays (potentially combined with `rear` and other re- 
wrappers)
   1. Implement ACTION overlay support in generator and instantiator
   2. Evaluate whether ACTION overlays provide value for code generation 
(protoc, flex/bison) and other deterministic intermediate steps
   3. If beneficial for archive creation, implement `rear` (remote execution 
ar) to complete the speculative action chain (see Section 12 for details)
   4. Consider other re-wrapper candidates: ranlib, strip, resource compilers 
(see Section 12)
   5. **Measure**: End-to-end performance improvement
   
   ### Phase 5: Optimization
   1. Optimize digest resolution using Apache Arrow for columnar digest storage 
and vectorized lookups
   2. Skip priming for elements with strong cache key hits and their transitive 
dependencies up the dependency graph (unchanged subtrees)
   3. Optimize topological sorting and deduplication
   4. Implement adaptive throttling for speculative action submission
   5. Add comprehensive monitoring and metrics
   
   ---
   
   ## 9. Benefits
   
   - **Restores Parallelism**: Compilation units execute in parallel across the 
entire build graph
   - **Transparent**: No changes to element definitions or build scripts
   - **Safe**: Incorrect/stale overlays simply become redundant; correctness 
never affected
   - **Efficient**: Minimal storage overhead (overlays are small metadata)
   - **Incremental**: Works with partial cache hits and source changes
   - **Extensible**: Can be enhanced with local execution optimization (`rear`) 
and other re-wrapped commands to further improve performance
   
   ---
   
   ## 10. Key Design Decisions
   
   1. **Store speculative actions with artifacts**: Keeps priming data 
co-located with build results
   2. **Three speculative action overlay types**: SOURCE/ARTIFACT/ACTION cover 
all input sources with appropriate granularity
   3. **Topological sorting**: Ensures ACTION overlays can resolve dependencies
   4. **Skip on unresolved**: Gracefully handles missing/changed inputs without 
blocking
   5. **Early submission**: Submit speculative actions as soon as dependencies 
resolve, regardless of element depth in graph
   6. **Asynchronous overlay generation**: Overlay generation runs in the 
background and does not block element build completion or downstream element 
builds. SpeculativeActions become available for future builds without impacting 
the current build's critical path.
   
   ---
   
   ## 11. Open Questions and Key Uncertainties
   
   ### Implementation Questions
   1. **Resource limits**: How many concurrent speculative actions should we 
allow?
   2. **Metrics**: What telemetry do we need to measure priming effectiveness?
   
   ### Critical Success Factors
   
   #### 1. Phased Rollout Strategy
   **Question**: Should we implement all overlay types at once, or phase them?
   
   **Proposal**: Start with SOURCE overlays only (Phase 2), measure impact, 
then add ARTIFACT (Phase 3), then ACTION (Phase 4).
   
   **Rationale**:
   - SOURCE overlays are simplest and likely provide the most benefit
   - ARTIFACT overlays add cross-element dependencies but may have lower 
resolution rates
   - ACTION overlays require careful dependency management and may only be 
valuable combined with `rear`
   
   **Required**: Define success criteria for each phase before proceeding to 
the next.
   
   #### 2. Overlay Resolution Success Rate
   **Question**: How often will overlays successfully resolve in practice?
   
   **Context**: Speculative actions are skipped if any overlay cannot be 
resolved. Success rate depends on:
   - How frequently source files change between builds
   - Whether the same digests appear in both SOURCE and ARTIFACT locations
   - For ACTION overlays: whether dependency chains remain intact
   
   **Research Evidence**: Studies of C/C++ build caching and code churn 
patterns provide encouraging data:
   
   - **Adams et al. (2016)**: Most C/C++ header files are relatively stable 
compared to implementation files
   - **Misu et al. (2024)**: Incremental builds with caching achieve 4.22x to 
4.71x speedups, demonstrating high cache hit rates in practice
   - **Matev (2020)**: Hot ccache provides 4-5x speedup over cold cache in 
large C++ projects
   - **Chowdhury et al. (2024)**: Approximately 80% of code changes concentrate 
in 20% of methods (Pareto principle)
   - **Shin et al. (2011)**: In Mozilla and Red Hat Linux, 89.1% of files were 
in the low-churn, stable category
   - **Macho et al. (2021)**: Analysis of 144 Maven projects and build changes 
found that while dependency version updates are among the top-10 most frequent 
build changes, structural changes to build configuration (adding/removing 
dependencies, changing build order) occur much less frequently. Build 
maintenance is primarily driven by accommodating changes to production code 
rather than restructuring the build itself.
   - **McIntosh et al. (2012)**: Study of Java build systems showed that build 
complexity evolves slowly over time, with build structure remaining relatively 
stable between major refactorings.
   
   **Implication for Build Structure Stability**: Research shows that while 
source code changes frequently, build structure (which libraries are linked, 
link order, build rules) changes much less often:
   - ~80% of SOURCE overlays for .cpp files to resolve successfully (unchanged 
sources)
   - >90% of SOURCE overlays for .h files to resolve (headers more stable per 
Adams)
   - >95% of ARTIFACT overlays to resolve (dependency artifacts rarely change)
   - ~85-90% of ACTION overlays to resolve (link lines and build structure 
change infrequently per Macho/McIntosh)
   
   **Real-world cache effectiveness**: The Misu and Matev studies showing 4-5x 
speedups from caching suggest that cache hit rates of 75-80% are achievable in 
practice, directly supporting the viability of Cache Priming.
   
   **Required**:
   - Instrument prototype to measure actual resolution rates on real projects
   - Identify common failure patterns
   - Validate against freedesktop-sdk and GNOME builds
   - Determine if partial resolution strategies would help
   
   **Risk**: If <50% of speculative actions resolve successfully, the system 
may not provide sufficient benefit to justify the complexity. However, research 
suggests success rates should be substantially higher (70-80%+).
   
   #### 3. Non-Determinism Handling
   **Question**: How do non-deterministic builds affect speculative action 
correctness?
   
   **Context**: If a build produces different outputs for the same inputs, the 
recorded `base_action_digest` may not be reusable.
   
   **Current Assumption**: Non-deterministic actions will still return cached 
results when the ActionCache entry hasn't been evicted. If a new execution 
produces different outputs, it generates a new subaction, and 
SpeculativeActions are updated for the next build.
   
   **Required**: Validate this assumption and document expected behavior for 
non-deterministic builds.
   
   #### 4. Digest Resolution Performance Impact
   **Question**: Can we make `find_source_element` fast enough to avoid 
becoming a bottleneck?
   
   **Context**: During overlay generation, every input file of every subaction 
needs its digest resolved against all source trees, artifact outputs, and prior 
actions in the dependency graph. For large builds, this could be thousands of 
files × thousands of potential sources.
   
   **Critical Insight**: Since overlay generation runs asynchronously and 
doesn't block element builds or speculative action execution, performance 
requirements are relaxed:
   - Overlay generation for element N can run while elements N+1, N+2, etc. are 
building
   - Generated SpeculativeActions are used in the *next* build session, not the 
current one
   - Slow overlay generation only delays availability of speculative actions 
for future builds
   
   **Optimization Strategies**:
   - **Apache Arrow for digest lookups**: Use Apache Arrow's columnar format to 
store digest tables with built-in vectorized operations
     - Arrow provides O(1) random access with cache-efficient columnar layout
     - Built-in SIMD optimizations for hash lookups and comparisons
     - Cross-language support (C++, Python, Rust) for BuildStream integration
     - Set lookup kernels with hash table support already implemented
     - No need to implement custom indices, bloom filters, or vectorization
   - **Columnar digest storage**: Store digests in Arrow tables: `[(digest: 
FixedSizeBinary(32), element: string, path: string, type: int8)]`
   - **Arrow compute kernels**: Use `pc.is_in()` for individual digest checks 
and `pc.join(..., join_type="semi")` for bulk digest matching against large sets
   - **Memory efficiency**: Arrow's columnar format provides 64-byte alignment 
for optimal SIMD performance
   
   **Why Apache Arrow**:
   - Arrow's columnar layout enables vectorization using SIMD operations, with 
64-byte alignment matching AVX-512 register width
   - Arrow provides built-in set lookup kernels with hash table support for 
exactly this use case
   - Arrow's pipeline and SIMD algorithms deliver 10-100x performance 
improvements through cache locality and parallel operations
   - Arrow enables efficient data processing without serialization overhead, 
making cross-module data sharing trivial
   - Proven at scale: PySpark achieved 10-100x performance improvements using 
Arrow
   
   **Required**: 
   - Profile and optimize on real-world projects (freedesktop-sdk, GNOME)
   - Measure: time to generate overlays vs. element build time
   - Implement Apache Arrow-based digest lookup tables as primary strategy
   - Leverage Arrow's built-in set lookup kernels and hash table support
   
   **Acceptable Performance**: Overlay generation should complete within 1-2x 
the element's build time. Since it runs asynchronously, it won't impact the 
current build's critical path.
   
   **Risk**: If overlay generation is extremely slow (>10x element build time), 
it might not complete before the next build session starts, reducing cache 
priming effectiveness.
   
   #### 5. Speculative Action Scale and Remote Execution Capacity
   
   **Question**: Won't submitting thousands of speculative actions overwhelm 
the remote execution cluster?
   
   **Context**: A large C/C++ BuildStream project might generate thousands of 
speculative actions (one per compilation unit). For example, a project with 100 
elements averaging 100 compilation units each would produce 10,000 speculative 
actions.
   
   **Comparison to Existing Systems**: This scale is actually modest compared 
to build systems already using Remote Execution:
   
   - Bazel builds of TensorFlow generate over 8,191 actions executing across 
hundreds of remote workers
   - Google runs millions of builds executing millions of test cases every day, 
with builds depending on tens of thousands of targets
   - A large TypeScript project with 10M lines of code generates 1,100 Bazel 
targets, with benchmarks using 100 remote executors
   - Bazel builds often include thousands or tens of thousands of small actions 
such as compiling, linking, or running tests
   
   **Key Insight**: Remote Execution systems like BuildGrid are designed to 
handle tens of thousands of concurrent actions. Cache Priming's speculative 
actions are within the normal operating range of these systems.
   
   **Mitigation Strategies**:
   - **ActionCache hits**: Many speculative actions will hit immediately, 
reducing actual execution load
   - **Priority scheduling**: Actual element builds can be prioritized over 
speculative actions
   - **Adaptive throttling**: Submit speculative actions at a controlled rate 
based on cluster capacity
   - **Gradual rollout**: Start with Phase 2 (SOURCE overlays only) which 
generates fewer actions
   
   **Required**:
   - Monitor remote execution cluster utilization during prototype testing
   - Implement priority queuing: actual builds > speculative actions
   - Consider time-windowing: only prime for elements building soon
   
   **Acceptable**: Speculative actions should consume <30% of remote execution 
capacity, leaving headroom for actual builds.
   
   #### 6. Storage and Network Overhead
   **Question**: What is the actual storage overhead of SpeculativeActions?
   
   **Context**: 
   - Each artifact now includes SpeculativeActions metadata
   - For compilation-heavy elements (1000+ compilation units), this could be 
significant
   - ReferencedSpeculativeAction stores only element name + digest (minimal 
duplication)
   
   **Required**:
   - Measure actual storage increase on real projects
   - Consider compression strategies if overhead is significant
   - Provide opt-out mechanism for elements that don't benefit
   
   #### 7. Value of ACTION Overlays Without rear
   **Question**: Do ACTION overlays provide benefit on their own, or only when 
combined with `rear`?
   
   **Context**: ACTION overlays create dependencies between speculative actions 
within an element. Without `rear` (see Section 12), the primary use case would 
be linking, which requires archives that may not exist as separate subactions.
   
   **Proposal**: Evaluate ACTION overlays and `rear` together in Phase 4, as 
they may be interdependent.
   
   **Required**: Identify other potential uses for ACTION overlays beyond 
archive creation.
   
   ### Success Metrics
   
   To validate the proposal, we need to measure:
   
   1. **Performance**: End-to-end build time reduction (target: >20% for C/C++ 
heavy projects)
   2. **Resolution Rate**: Percentage of speculative actions successfully 
instantiated (target: >70%)
   3. **Storage Overhead**: Size increase of artifact cache (acceptable: <20%)
   4. **Overlay Generation Time**: Time to generate overlays per element 
(acceptable: <2x element build time, since it's async)
   5. **Worker Utilization**: Speculative actions should not starve actual 
builds
   
   ### Validation Plan
   
   1. **Implement Phase 1-2**: SOURCE overlays only on a prototype
   2. **Test on freedesktop-sdk**: Large, real-world C/C++ heavy BuildStream 
project
   3. **Measure all success metrics**: Gather data before proceeding
   4. **Iterate or pivot**: If metrics don't meet targets, adjust design or 
abandon
   5. **Proceed to Phase 3+**: Only if Phase 2 shows clear benefit
   
   ---
   
   ## 12. Future Enhancement: Local Execution with rear
   
   ### Motivation
   
   In the base Cache Priming design, speculative actions follow this pattern:
   1. Compile `.cpp` → `.o` (via `recc`, speculative)
   2. Upload `.o` to CAS
   3. Download `.o` files for archiving
   4. Create `.a` archive
   5. Upload `.a` to CAS
   
   However, since BuildStream submits entire element builds as Actions to 
workers, and `recc` returns `.o` files during the element build, those `.o` 
files are **already local on the worker**. We can optimize archive creation by 
keeping it local.
   
   ### The `rear` Command
   
   Similar to `recc` (remote execution caching compiler), **rear** (remote 
execution ar) wraps the `ar` archiver:
   
   ```bash
   # Traditional ar usage
   ar rcs libmath_utils.a math_utils.o helpers.o
   
   # With rear wrapper
   rear rcs libmath_utils.a math_utils.o helpers.o
   ```
   
   **rear behavior**:
   1. Generates an Action describing the archive operation
   2. Checks ActionCache for existing result
   3. If miss:
      - **During main element build**: Executes `ar` **locally on the worker** 
(via buildbox-casd) since `.o` files are already present
      - **During speculative execution**: Executes remotely (fetches `.o` files 
from CAS as needed)
   4. Stores result in ActionCache
   5. Records the Action digest in `ActionResult.subactions`
   
   ### Extended Speculative Action Chain
   
   With `rear`, we get a complete chain of speculative actions:
   
   ```
   libA element build (main build):
     recc g++ -c src/math_utils.cpp -o math_utils.o    [remote or cached]
       ↓ (local .o file on worker)
     rear rcs /usr/lib/libmath_utils.a math_utils.o    [LOCAL execution]
       ↓ (local .a file on worker)
     g++ app.o -L/usr/lib -lmath_utils -o final_binary [remote or cached]
   
   Speculative execution (cache priming):
     recc compile action                                [remote, warms 
ActionCache]
       ↓ (.o uploaded to CAS)
     rear archive action                                [remote, fetches .o 
from CAS]
       ↓ (.a uploaded to CAS)
     link action                                        [remote, fetches .a 
from CAS]
   ```
   
   ### Example: libA with `rear`
   
   **Recorded Subactions** during libA build:
   ```protobuf
   ActionResult {
     subactions: [
       "recc_compile_math_utils...",   // .cpp → .o
       "recc_compile_helpers...",      // .cpp → .o
       "rear_archive..."               // .o → .a
     ]
   }
   ```
   
   **Generated SpeculativeAction for rear**:
   ```protobuf
   SpeculativeAction {
     base_action_digest: "rear_archive..."
     overlays: [
       Overlay {
         type: ACTION
         source_element: "libA.bst"
         source_action: "recc_compile_math_utils..."
         source_path: "math_utils.o"
         target_digest: "obj_math_utils..."
       },
       Overlay {
         type: ACTION
         source_element: "libA.bst"
         source_action: "recc_compile_helpers..."
         source_path: "helpers.o"
         target_digest: "obj_helpers..."
       }
     ]
   }
   ```
   
   ### Key Advantages
   
   1. **Faster Main Builds**: During the actual element build, `.o` files are 
local on the worker, so archive creation executes immediately without network 
overhead
   2. **Earlier Link Actions**: Even though speculative `rear` actions run 
remotely, they still populate the ActionCache, enabling link actions to start 
as soon as archives are cached
   3. **ActionCache Hits**: When the main build runs, `rear` checks ActionCache 
first and gets an immediate hit from the speculative execution
   4. **Reduced Critical Path**: The main build's archive step becomes nearly 
instantaneous (cache hit), shortening the overall build time
   
   **Note**: Future optimizations (out of scope for this proposal) could 
include locality hints to prefer scheduling speculative `rear` actions on 
workers that already have the `.o` files, further reducing network traffic.
   
   ### Execution Flow
   
   ```
   Cache Priming Phase (speculative):
     Submit compile actions (recc) → executes remotely, warms ActionCache
       ↓ [.o files uploaded to CAS]
     Submit archive action (rear) → executes remotely, fetches .o from CAS
       ↓ [.a file uploaded to CAS, ActionCache warmed]
     Submit link actions → executes remotely, warms ActionCache
       ↓
   
   Main Build Phase:
     Element build starts
       ↓
     recc compile → ActionCache HIT (instant)
       ↓ [.o files now local on worker]
     rear archive → ActionCache HIT, but if miss, executes locally (instant)
       ↓ [.a file now local on worker]
     link → ActionCache HIT (instant)
       ↓
     Element build completes in minimal time
   ```
   
   The key insight: **rear enables ActionCache hits during the main build, and 
when those hits miss, local execution is very fast because the input `.o` files 
are already on the worker**.
   
   ### Beyond `rear`: Instrumentation Strategy
   
   The `rear` optimization illustrates a general principle: **identify commands 
that operate on local intermediate data and wrap them for speculative 
execution**.
   
   **Candidates for "re-" wrappers**:
   - `ranlib`: Index generation for static libraries (fast, local)
   - `strip`: Symbol stripping (fast, local, self-contained)
   - Code generators: protoc, flex/bison (deterministic, cacheable)
   - Resource compilers: glib-compile-resources, Qt rcc
   
   **Instrumentation approach**:
   1. Monitor build logs to identify frequent intermediate commands
   2. Measure: execution time, input/output sizes, determinism
   3. Prioritize commands that are:
      - Fast enough for local execution (< 1 second)
      - Self-contained (few dependencies)
      - Operate on data already local to worker
      - Appear frequently across builds
   4. Implement re-wrappers and measure impact
   
   This creates a virtuous cycle: more instrumentation → more optimization 
opportunities → better build performance.
   
   ---
   
   ## References
   
   - Adams, B., McIntosh, S., Nagappan, M., and Hassan, A.E. (2016). 
"Identifying and understanding header file hotspots in C/C++ build processes." 
*Automated Software Engineering*, Vol. 23, pp. 61-90. 
https://dl.acm.org/doi/10.1007/s10515-015-0183-5
   
   - Chowdhury, S., Uddin, G., Hemmati, H., and Holmes, R. (2024). "The Good, 
the Bad, and the Monstrous: Predicting Highly Change-Prone Source Code Methods 
at Their Inception." *ACM Transactions on Software Engineering and 
Methodology*, Vol. 33, No. 4. https://dl.acm.org/doi/10.1145/3715006
   
   - Macho, C., McIntosh, S., and Pinzger, M. (2021). "The nature of build 
changes." *Empirical Software Engineering*, Vol. 26, Article 52. 
https://link.springer.com/article/10.1007/s10664-020-09926-4
   
   - Matev, R. (2020). "Fast distributed compilation and testing of large C++ 
projects." *EPJ Web of Conferences*, Vol. 245. 
https://www.epj-conferences.org/articles/epjconf/pdf/2020/21/epjconf_chep2020_05001.pdf
   
   - McIntosh, S., Adams, B., and Hassan, A.E. (2012). "The Evolution of Java 
Build Systems." *Empirical Software Engineering*, Vol. 17, No. 4-5, pp. 578-608.
   
   - Misu, S., Lin, B., and Monden, A. (2024). "Does Using Bazel Help Speed Up 
Continuous Integration Builds?" *arXiv preprint*. 
https://arxiv.org/html/2405.00796v1
   
   - Nagappan, N. and Ball, T. (2007). "Using Software Dependencies and Churn 
Metrics to Predict Field Failures: An Empirical Case Study." *First 
International Symposium on Empirical Software Engineering and Measurement (ESEM 
2007)*. https://ieeexplore.ieee.org/document/4343764/
   
   - Shin, Y., Meneely, A., Williams, L., and Osborne, J. (2011). "Evaluating 
Complexity, Code Churn, and Developer Activity Metrics as Indicators of 
Software Vulnerabilities." *IEEE Transactions on Software Engineering*, Vol. 
37, No. 6, pp. 772-787. https://ieeexplore.ieee.org/document/5560680/
   
   - Munson, J.C. and Elbaum, S. (1998). "Code Churn: A Measure for Estimating 
the Impact of Code Change." *Proceedings of the International Conference on 
Software Maintenance*, pp. 24-31. https://ieeexplore.ieee.org/document/738486/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] Speculative Actions: Predictive Cache Priming for BuildStream [buildstream]

Reply via email to