sstriker opened a new issue, #2083:
URL: https://github.com/apache/buildstream/issues/2083
# Speculative Actions: Predictive Cache Priming for BuildStream
## Table of Contents
1. [Executive Summary](#1-executive-summary)
2. [Problem Statement](#2-problem-statement)
3. [Solution Overview](#3-solution-overview)
4. [Data Model](#4-data-model)
- [4.1 Protocol Buffer Extensions](#41-protocol-buffer-extensions)
5. [Concrete Example: libA → libB →
appC](#5-concrete-example-liba--libb--appc)
- [5.1 Example: libA Compilation](#51-example-liba-compilation)
- [5.2 Example: libB Compilation](#52-example-libb-compilation)
- [5.3 Example: appC Linking](#53-example-appc-linking)
- [5.4 Overlay Type Summary](#54-overlay-type-summary)
6. [System Flow](#6-system-flow)
- [6.1 High-Level Process Flow](#61-high-level-process-flow)
- [6.2 Data Flow Diagram](#62-data-flow-diagram)
7. [Detailed Algorithms](#7-detailed-algorithms)
- [7.1 Overlay Generation (after
build)](#71-overlay-generation-after-build)
- [7.2 Overlay Instantiation (before
build)](#72-overlay-instantiation-before-build)
8. [Implementation Plan](#8-implementation-plan)
9. [Benefits](#9-benefits)
10. [Key Design Decisions](#10-key-design-decisions)
11. [Open Questions and Key
Uncertainties](#11-open-questions-and-key-uncertainties)
- [11.1 Phased Rollout Strategy](#1-phased-rollout-strategy)
- [11.2 Overlay Resolution Success
Rate](#2-overlay-resolution-success-rate)
- [11.3 Non-Determinism Handling](#3-non-determinism-handling)
- [11.4 Digest Resolution Performance
Impact](#4-digest-resolution-performance-impact)
- [11.5 Speculative Action Scale and Remote Execution
Capacity](#5-speculative-action-scale-and-remote-execution-capacity)
- [11.6 Storage and Network Overhead](#6-storage-and-network-overhead)
- [11.7 Value of ACTION Overlays Without
`rear`](#7-value-of-action-overlays-without-rear)
12. [Future Enhancement: Local Execution with
`rear`](#12-future-enhancement-local-execution-with-rear)
13. [References](#references)
---
## 1. Executive Summary
Cache Priming restores build parallelism in BuildStream by speculatively
executing compiler invocations ahead of time, warming the Remote Execution
ActionCache before actual element builds run. This dramatically reduces
end-to-end build latency without compromising correctness.
**Key Insight**: After building an element once, we know all the subactions
(e.g. `recc` compiler invocations) that occurred. We can store these with
overlay instructions, then reuse them in future builds by adapting inputs for
source/dependency changes.
**BuildStream and Remote Execution**: BuildStream is built on Remote
Execution API (REAPI) paradigms, using a local buildbox-casd as a CAS proxy
that connects to remote Execution, ActionCache, and CAS services. Cache Priming
extends this architecture by adding speculative action submission to warm the
ActionCache before element builds execute.
## 2. Problem Statement
BuildStream orchestrates repository-level builds across many elements,
allowing each element to use its native builds system, but not achieving the
fine-grained parallelism available within individual compilation units. C/C++
projects are particularly affected; translation units could compile in parallel
but instead wait sequentially for their element's turn in the build graph.
### Why Cache Priming is Particularly Effective
**Code churn patterns favor cache reuse**: Research consistently shows that
code changes are concentrated in a small subset of files, with most code
remaining stable between builds. This stability pattern makes speculative
execution with cached results highly effective.
**Key Research Findings:**
1. **Adams et al. (2016)** conducted a comprehensive case study on GLib,
PostgreSQL, Qt, and Ruby systems analyzing C/C++ header file "hotspots"—headers
that change frequently and trigger expensive rebuild processes. They
empirically demonstrated that most header files are relatively stable compared
to implementation files, with only a small subset accounting for the majority
of build-time impact.
2. **Misu et al. (2024)** analyzed 383 GitHub projects using Bazel and found
that incremental builds with caching achieve median speedups of 4.22x (build
system tool-independent cache) to 4.71x (build system tool-specific cache) for
long-build duration projects. This demonstrates the substantial real-world
value of build caching when most inputs haven't changed.
3. **Matev (2020)** studied large C++ projects using ccache for distributed
compilation, demonstrating that with a hot cache, compilation can be 4-5 times
faster than with cache misses. The study showed that "incremental builds often
do not significantly speed up compilation because even just a change of the
modification time of a widely used header leads to many compiler and linker
invocations"—highlighting both the cost of header changes and the benefit when
they remain stable.
4. **Chowdhury et al. (2024)** analyzed 774,051 source code methods from 49
prominent open-source Java projects and found that "approximately 80% of
changes are concentrated in just 20% of the methods, demonstrating the Pareto
80/20 principle. Moreover, this subset of methods is responsible for the
majority of the identified bugs in these projects." This concentration of
changes means the vast majority of code remains unchanged between builds.
**Supporting Evidence on Change Concentration:**
- **Shin et al. (2011)**: In Mozilla Firefox and Red Hat Enterprise Linux,
70.8% of known vulnerabilities were found in only 10.9% of files, demonstrating
extreme concentration of churn in a small subset of the codebase.
- **Nagappan and Ball (2007)**: In Windows Server 2003, software
dependencies and churn measures were efficient predictors of post-release
failures, further confirming that changes cluster in predictable locations.
- **Munson and Elbaum (1998)**: Established that code churn, the amount of
code change over time, is measurable and correlates with fault-proneness,
confirming that change patterns are non-uniform and predictable.
In C/C++ projects specifically:
- **Implementation files (.cpp) churn more frequently** than header files
(.h)
- **Header files are more stable** and change less often (Adams et al., 2016)
- **Header file changes are expensive**: Even just changing the modification
time of a widely-used header triggers many recompilations (Matev, 2020)
- **Build caching is highly effective**: Studies show 4-5x speedups with
warm caches (Matev, 2020; Misu et al., 2024)
- Changes tend to cluster in specific modules while the broader codebase
remains stable
**Example**: In a typical incremental build of a C++ project with 1000
compilation units:
- ~200 files changed (20% following Pareto)
- ~800 files unchanged (80%)
- With effective build caching, the 800 unchanged compilations can be served
from cache
- Only the 200 changed files need actual compilation
- **Result**: Build time dominated by the 200 changed files, achieving 4-5x
speedup
This is why Cache Priming is particularly powerful for C/C++ codebases: the
stability of header files and the concentration of changes in a minority of
implementation files means that speculative compilation actions can frequently
reuse cached results from previous builds.
---
## 3. Solution Overview
**High-Level Flow:**
```mermaid
flowchart LR
Build["Build #N"] -->|instantiates| SpecActions["Speculative
Actions<br/>(generated by Build #N-1)"]
SpecActions -->|warm| ActionCache[(ActionCache)]
ActionCache -->|serves| ElementBuilds[Element Builds]
ElementBuilds -->|produce| SubActions["(Sub)Actions"]
SubActions -->|transformed into| SpecActions2[Speculative Actions]
SpecActions2 -->|stored in| ArtifactCache[(Artifact Cache)]
SpecActions2 -.-> SpecActions
style ActionCache fill:#fff4e1
style ArtifactCache fill:#fff4e1
style ElementBuilds fill:#e1f0ff
style SubActions fill:#e1f0ff
```
**Process:**
1. **Priming Phase** (before build): Instantiate speculative actions by
applying overlays, submit for execution
2. **Build Phase**: Element builds hit warmed ActionCache, reducing latency
3. **Generation Phase** (after build): Record subactions with overlay
instructions describing how to adapt inputs
---
## 4. Data Model
### 4.1 Protocol Buffer Extensions
**Artifact.speculative_actions** (new field)
```protobuf
message Artifact {
Digest speculative_actions = 19; // Points to SpeculativeActions in CAS
}
message SpeculativeActions {
repeated SpeculativeAction actions = 1;
message ReferencedSpeculativeAction {
string element = 1; // Element that speculative actions are
referenced from
Digest speculative_actions = 2; // The speculative actions of the
element at the time of artifact creation
}
repeated ReferencedSpeculativeAction referenced_speculative_actions = 2;
// All speculative actions from all elements referenced in ACTION overlays
}
message SpeculativeAction {
Digest base_action_digest = 1; // Original Action from previous build
repeated Overlay overlays = 2; // How to adapt inputs
message Overlay {
enum OverlayType {
SOURCE = 0; // From element's source tree
ARTIFACT = 1; // From dependency element's artifact
output
ACTION = 2; // From another speculative action output
}
OverlayType type = 1;
string source_element = 2; // Element reference
Digest source_action = 3; // For ACTION type: which action
string source_path = 4; // Path within source
Digest target_digest = 5; // Digest to replace in input tree
}
}
```
**ActionResult.subactions** (new field in REAPI)
```protobuf
message ActionResult {
repeated Digest subactions = 10; // Digests of spawned Actions (e.g.,
recc invocations)
}
```
---
## 5. Concrete Example: libA → libB → appC
### Dependency Structure
```
libA (base library)
└─→ libB (depends on libA)
└─→ appC (depends on libB)
```
### 5.1 Example: libA Compilation
**libA Source Structure**
```
libA/
├── src/
│ ├── math_utils.h # Header file
│ └── math_utils.cpp # Implementation
└── libA.bst
```
**libA Artifact Structure** (Build Output)
```
/
├── usr/
│ ├── include/
│ │ └── math_utils.h # Installed from src/ (same digest:
"h_aaa111...")
│ └── lib/
│ └── libmath_utils.a
```
**Visual Flow:**
```mermaid
graph TB
subgraph Step1["① libA Source"]
S1["src/math_utils.cpp<br/><b>digest: cpp_aaa111...</b>"]
S2["src/math_utils.h<br/><b>digest: h_aaa111...</b>"]
end
subgraph Step2["② (Sub)Action Execution"]
A1["<b>recc g++ -c</b><br/>Action digest: aaa111..."]
end
S1 -->|input| Step2
S2 -->|input| Step2
subgraph Step3["③ (Sub)Action Result"]
O1["math_utils.o"]
end
Step2 -->|produces| O1
subgraph Step4["④ libA Artifact (/usr/)"]
AR1["/usr/include/math_utils.h<br/><b>digest: h_aaa111...</b>"]
AR2["/usr/lib/libmath_utils.a"]
end
O1 -.->|archived into| AR2
S2 -.->|installed as| AR1
subgraph Step5["⑤ Generated SpeculativeAction"]
SA["<b>base_action_digest: aaa111...</b>"]
OV1["<b>Overlay 1:</b><br/>type: SOURCE<br/>element: libA<br/>path:
src/math_utils.cpp<br/>target_digest: cpp_aaa111..."]
OV2["<b>Overlay 2:</b><br/>type: SOURCE<br/>element: libA<br/>path:
src/math_utils.h<br/>target_digest: h_aaa111..."]
end
S1 -.-|digest match| OV1
S2 -.-|digest match| OV2
OV1 --> SA
OV2 --> SA
SA --> A1
style Step1 fill:#e6f3ff
style Step4 fill:#ffe6cc
style Step5 fill:#e6ffe6
style Step2 fill:#fff4e1
style Step3 fill:#f0f0f0
```
**Recorded Subaction** (compile math_utils.cpp)
```protobuf
Action {
command_digest: "cmd_aaa111..."
input_root_digest: "dir_aaa111..."
}
Command {
arguments: ["g++", "-c", "-fPIC", "src/math_utils.cpp", "-o",
"math_utils.o"]
}
Directory (dir_aaa111...) {
files: [
File { name: "src/math_utils.cpp", digest: "cpp_aaa111..." },
File { name: "src/math_utils.h", digest: "h_aaa111..." }
]
}
```
**Generated SpeculativeAction**
```protobuf
SpeculativeAction {
base_action_digest: "aaa111..."
overlays: [
Overlay {
type: SOURCE
source_element: "libA"
source_path: "src/math_utils.cpp"
target_digest: "cpp_aaa111..." # Replace this digest in input tree
},
Overlay {
type: SOURCE
source_element: "libA"
source_path: "src/math_utils.h"
target_digest: "h_aaa111..." # Replace this digest in input tree
}
]
}
```
### 5.2 Example: libB Compilation
**libB Source Structure**
```
libB/
├── include/
│ └── vector_math.h
├── src/
│ └── vector_math.cpp # Includes math_utils.h from libA
└── libB.bst
```
**libB Artifact Structure** (Build Output)
```
/
├── usr/
│ ├── include/
│ │ └── vector_math.h
│ └── lib/
│ └── libvector_math.a
```
**Visual Flow:**
```mermaid
graph TB
subgraph Step1A["① libB Source"]
SB1["src/vector_math.cpp<br/><b>digest: cpp_bbb222...</b>"]
end
subgraph Step1B["① libA Artifact (dependency)"]
AA1["/usr/include/math_utils.h<br/><b>digest:
h_aaa111...</b><br/>(from libA artifact)"]
end
subgraph Step1C["① libA Source"]
SA2["src/math_utils.h<br/><b>digest: h_aaa111...</b>"]
end
subgraph Step2["② (Sub)Action Execution"]
A1["<b>recc g++ -c -I/usr/include</b><br/>Action digest: bbb222..."]
end
SB1 -->|input| Step2
AA1 -.->|used by compile| Step2
subgraph Step3["③ (Sub)Action Result"]
O1["vector_math.o"]
end
Step2 -->|produces| O1
subgraph Step4["④ Generated SpeculativeAction"]
SA["<b>base_action_digest: bbb222...</b>"]
OV1["<b>Overlay 1:</b><br/>type: SOURCE<br/>element: libB<br/>path:
src/vector_math.cpp<br/>target_digest: cpp_bbb222..."]
OV2["<b>Overlay 2:</b><br/>type: SOURCE<br/>element: libA<br/>path:
src/math_utils.h<br/>target_digest: h_aaa111..."]
end
SB1 -.-|digest match| OV1
SA2 -.-|"digest match<br/>(libA Source not Artifact!)"| OV2
OV1 --> SA
OV2 --> SA
SA --> A1
style Step1A fill:#e6f3ff
style Step1C fill:#e6f3ff
style Step1B fill:#ffe6cc
style Step4 fill:#e6ffe6
style Step2 fill:#fff4e1
style Step3 fill:#f0f0f0
```
**Recorded Subaction** (compile vector_math.cpp)
```protobuf
Action {
command_digest: "cmd_bbb222..."
input_root_digest: "dir_bbb222..."
}
Command {
arguments: [
"g++", "-c", "-fPIC",
"-I/usr/include", # libA headers from dependency artifact
"src/vector_math.cpp",
"-o", "vector_math.o"
]
}
Directory (dir_bbb222...) {
files: [
File { name: "src/vector_math.cpp", digest: "cpp_bbb222..." },
File { name: "/usr/include/math_utils.h", digest: "h_aaa111..." }
]
}
```
**Generated SpeculativeAction**
```protobuf
SpeculativeAction {
base_action_digest: "bbb222..."
overlays: [
Overlay {
type: SOURCE
source_element: "libB"
source_path: "src/vector_math.cpp"
target_digest: "cpp_bbb222..." # Replace this digest
},
Overlay {
type: SOURCE # Prefers SOURCE over ARTIFACT!
source_element: "libA" # Found in libA's source tree
source_path: "src/math_utils.h" # Original source path
target_digest: "h_aaa111..." # Same digest as in /usr/include/
}
]
}
```
**Key Insight**: The overlay generator finds that `math_utils.h` (digest
`h_aaa111...`) appears in libA's ARTIFACT output at
`/usr/include/math_utils.h`, but also discovers it exists in libA's SOURCE at
`src/math_utils.h` with the same digest. Following the SOURCE > ACTION >
ARTIFACT priority, it records a SOURCE overlay pointing to libA's source tree.
This creates a tighter dependency on the original source rather than the built
artifact.
### 5.3 Example: appC Linking
**appC Source Structure**
```
appC/
├── src/main.cpp
└── appC.bst
```
**appC Artifact Structure** (Build Output)
```
/
├── usr/
│ └── bin/
│ └── appC # Executable
```
**Visual Flow:**
```mermaid
graph TB
subgraph Step1A["① Previous Speculative Action Output"]
PA1["main.o<br/><b>digest: obj_ccc333...</b><br/>(from compile
action ccc333...)"]
end
subgraph Step1B["① libB Artifact"]
AB1["/usr/lib/libvector_math.a<br/><b>digest: lib_bbb222...</b>"]
end
subgraph Step1C["① libA Artifact"]
AA1["/usr/lib/libmath_utils.a<br/><b>digest: lib_aaa111...</b>"]
end
subgraph Step2["② (Sub)Action Execution"]
A1["<b>g++ link</b><br/>Action digest: ccc444..."]
end
PA1 -->|input| Step2
AB1 -->|input| Step2
AA1 -->|input| Step2
subgraph Step3["③ (Sub)Action Result"]
O1["appC executable"]
end
Step2 -->|produces| O1
subgraph Step4["④ Generated SpeculativeAction"]
SA["<b>base_action_digest: ccc444...</b>"]
OV1["<b>Overlay 1:</b><br/>type: ACTION<br/>element:
appC<br/>source_action: ccc333...<br/>path: main.o<br/>target_digest:
obj_ccc333..."]
OV2["<b>Overlay 2:</b><br/>type: ARTIFACT<br/>element:
libB<br/>path: /usr/lib/libvector_math.a<br/>target_digest: lib_bbb222..."]
OV3["<b>Overlay 3:</b><br/>type: ARTIFACT<br/>element:
libA<br/>path: /usr/lib/libmath_utils.a<br/>target_digest: lib_aaa111..."]
end
OV1 --> SA
OV2 --> SA
OV3 --> SA
PA1 -.-|digest match| OV1
AB1 -.-|digest match| OV2
AA1 -.-|digest match| OV3
SA --> A1
style Step1A fill:#e6ffe6
style Step1B fill:#ffe6cc
style Step1C fill:#ffe6cc
style Step4 fill:#e6ffe6
style Step2 fill:#fff4e1
style Step3 fill:#f0f0f0
```
**Generated SpeculativeAction** (link step)
```protobuf
SpeculativeAction {
base_action_digest: "ccc444..."
overlays: [
Overlay {
type: ACTION # From previous speculative action
source_element: "appC"
source_action: "ccc333..." # The compile action
source_path: "main.o"
target_digest: "obj_ccc333..."
},
Overlay {
type: ARTIFACT # From libB artifact output
source_element: "libB"
source_path: "/usr/lib/libvector_math.a"
target_digest: "lib_bbb222..."
},
Overlay {
type: ARTIFACT # From libA artifact output
source_element: "libA"
source_path: "/usr/lib/libmath_utils.a"
target_digest: "lib_aaa111..."
}
]
}
```
### 5.4 Overlay Type Summary
**Visual Overview:**
```mermaid
graph TD
Start[Input File Digest] --> Q1{Found in current or<br/>dependency
element's<br/>source tree?}
Q1 -->|YES| S[①<br/>SOURCE overlay]
Q1 -->|NO| Q2{Found in current or<br/>dependency
element's<br/>sub-action result?}
Q2 -->|YES| A[②<br/>ACTION overlay]
Q2 -->|NO| Q3{Found in dependency<br/>element's artifact?}
Q3 -->|YES| AR[③<br/>ARTIFACT overlay]
style S fill:#e6f3ff
style A fill:#e6ffe6
style AR fill:#ffe6cc
```
**Overlay Type Summary**
| Type | Source | Example | Use Case |
|------|--------|---------|----------|
| **SOURCE** | Element's source tree | `.cpp` files, `.h` headers | Tracks
source code changes; preferred over ARTIFACT when digest matches |
| **ARTIFACT** | Dependency element's artifact output | `.a` libraries,
built artifacts | Tracks dependency artifacts that don't exist in source |
| **ACTION** | Previous speculative action output | `.o` object files |
Links compilation steps within an element |
**Priority**: When the same digest appears in multiple locations, overlays
prefer SOURCE > ACTION > ARTIFACT. This creates tighter dependencies on
original sources rather than intermediate artifacts.
### How This Exploits Code Stability
The overlay mechanism directly exploits the research findings from Section 2:
1. **SOURCE overlays for .cpp files**: Will require updates when
implementation files change (~20% of files per build)
2. **SOURCE overlays for .h files**: Often remain valid across multiple
builds due to header stability (Adams et al., 2016)
3. **ARTIFACT overlays for dependency artifacts**: Very stable, as
dependency libraries rarely change between builds
Given that research shows 80% of files typically remain unchanged, we expect:
- ~80% of SOURCE overlays to resolve successfully with unchanged digests
- Cache hits from previous speculative executions provide the 4-5x speedup
observed in caching studies (Matev, 2020; Misu et al., 2024)
---
## 6. System Flow
### 6.1 High-Level Process Flow
```mermaid
flowchart TD
Start([Build Session Start]) --> Identify[Identify Elements to Build]
Identify --> Retrieve[Retrieve Artifacts &<br/>SpeculativeActions]
Retrieve --> Aggregate[Aggregate & Deduplicate<br/>SpeculativeActions]
Aggregate --> TopoSort[Topological Sort]
TopoSort --> Instantiate{For Each<br/>SpeculativeAction}
Instantiate --> Resolve{All Overlays<br/>Resolvable?}
Resolve -->|No| Skip[Skip Action]
Resolve -->|Yes| Apply[Apply Overlays]
Apply --> Store[Store in CAS]
Store --> Submit[Submit to Remote Execution]
Skip --> More{More?}
Submit --> More
More -->|Yes| Instantiate
More -->|No| Parallel[Speculative Execution<br/>Warms ActionCache]
Parallel --> Build[Element Builds<br/>Hit Cache]
Build --> Complete([Build Complete])
style Submit fill:#fff4e1
style Parallel fill:#fff4e1
style Build fill:#e1f0ff
```
### 6.2 Data Flow Diagram
```mermaid
flowchart TB
subgraph BuildN["Build #N"]
Start[Build #N Starts]
subgraph Priming["Cache Priming Phase"]
Start --> Sched[BuildStream Scheduler]
Sched -->|query via Asset API| AC1[(Artifact Cache)]
AC1 -->|return SpeculativeActions<br/>from Build #N-1|
SAList[SpeculativeActions]
SAList --> Inst[Instantiator]
Inst --> Over{Overlay<br/>Type?}
Over -->|SOURCE| Src[Source Tree]
Over -->|ARTIFACT| Art[Artifact Output]
Over -->|ACTION| Act[Action Output]
Src --> New[New Action]
Art --> New
Act --> New
CASD1[buildbox-casd<br/>local CAS proxy]
New -->|store via CAS API| CASD1
CASD1 -->|proxy to| CAS1[(BuildGrid CAS)]
New -->|Execute API| BuildGrid1[BuildGrid<br/>Execution Service]
BuildGrid1 -->|check| ACache[ActionCache]
ACache -->|miss| BuildGrid1
BuildGrid1 -->|dispatch via Bots API| Workers1[buildbox-worker]
Workers1 -->|fetch via| CASD_W1[buildbox-casd<br/>on worker]
CASD_W1 -->|proxy to| CAS1
Workers1 -->|execute| Runner1[buildbox-run]
Runner1 -->|result| Workers1
Workers1 -->|store result| ACache
end
subgraph Execution["Element Builds"]
Sched -->|trigger| BSElement[BuildStream<br/>Element Build]
BSElement -->|create Action| CASD2[buildbox-casd<br/>local CAS
proxy]
CASD2 -->|store via CAS API| CAS1
BSElement -->|Execute API| BuildGrid2[BuildGrid<br/>Execution
Service]
BuildGrid2 -->|check| ACache
ACache -.->|cache hit for sub-actions!| BuildGrid2
BuildGrid2 -->|dispatch via Bots API| Workers2[buildbox-worker]
Workers2 -->|fetch via| CASD_W2[buildbox-casd<br/>on worker]
CASD_W2 -->|proxy to| CAS1
Workers2 -->|execute| Runner2[buildbox-run]
Runner2 -->|spawns| SubActions[Sub-Actions:<br/>recc invocations]
SubActions -.->|benefit from warmed ActionCache| ACache
SubActions -->|record digests| Runner2
Runner2 -->|result + subactions| Workers2
Workers2 -->|ActionResult| BuildGrid2
BuildGrid2 -->|result| BSElement
BSElement -->|store via| CASD2
end
subgraph Generation["Overlay Generation (Async)"]
Gen[Overlay Generator<br/>ASYNC]
BSElement -.->|triggers async| Gen
Gen -->|fetch Actions via CAS API| CASD2
Gen -->|create| SA[SpeculativeActions]
SA -->|attach to| Art1[Artifact]
Art1 -->|store via Asset API| AC2[(Artifact Cache)]
end
end
AC2 -.->|↓ Data for Build #N+1 ↓| Boundary
Boundary[/"═══════════════════════════════════════════════════════════════════"\]
subgraph BuildN1["Build #N+1 (Uses SpeculativeActions from Build #N)"]
NextBuild[Build #N+1 will use<br/>SpeculativeActions<br/>generated
above]
end
style Start fill:#e6f3ff
style SAList fill:#ffeb99
style Gen fill:#ffe6cc
style SA fill:#ffe6cc
style Art1 fill:#ffe6cc
style AC2 fill:#ffe6cc
style Workers1 fill:#fff4e1
style Workers2 fill:#fff4e1
style BSElement fill:#e1f0ff
style ACache fill:#fff9e1
style Boundary
fill:#f0f0f0,stroke:#333,stroke-width:3px,stroke-dasharray: 5 5
style BuildN1 fill:#f8f8ff
style NextBuild fill:#ffeb99
style CASD1 fill:#e8f4f8
style CASD2 fill:#e8f4f8
style CASD_W1 fill:#e8f4f8
style CASD_W2 fill:#e8f4f8
```
**Reading the Diagram:**
This is an example setup with BuildGrid for Remote Execution services.
*Note: this diagram could do with some additional cleanup and scrutiny*
**REAPI Architecture**:
- BuildStream runs with local `buildbox-casd` as CAS proxy
- `buildbox-casd` proxies to BuildGrid's CAS, ActionCache, and Execution
services
- Workers also run `buildbox-casd` locally for efficient blob access
- All communication uses REAPI (Execute, CAS, Asset, Bots APIs)
**Cache Priming Flow**:
1. **Priming Phase**: Retrieves SpeculativeActions → instantiates → submits
via Execute API → BuildGrid dispatches to workers → results stored in
ActionCache
2. **Element Builds**: BuildStream creates Actions → Execute API → BuildGrid
checks ActionCache → dispatches to workers → `buildbox-run` spawns sub-actions
(recc) → sub-actions benefit from warmed ActionCache
3. **Overlay Generation**: Asynchronously fetches sub-action digests and
generates SpeculativeActions for Build #N+1
**Key Flow**: Build #N uses speculative actions from Build #N-1 → builds →
generates speculative actions for Build #N+1
---
## 7. Detailed Algorithms
**Note**: The following pseudocode is illustrative and non-optimal.
Production implementations should optimize for performance, especially around
CAS operations and dependency resolution.
### 7.1 Overlay Generation (after build)
**Note**: This process runs asynchronously and does not block the element
build completion or downstream element builds.
```python
# Triggered asynchronously after element build completes
async def generate_overlays(element, element_subactions):
speculative_actions = []
for subaction_digest in element_subactions:
subaction = fetch_cas(subaction_digest)
overlays = []
for input_file in subaction.input_root.files:
# Match digest to source: SOURCE > ACTION > ARTIFACT priority
# This means if digest "h_aaa111..." appears in:
# - libA's source at "src/math_utils.h" (SOURCE)
# - libA's artifact at "/usr/include/math_utils.h" (ARTIFACT)
# We prefer the SOURCE match
src_element, src_path, src_type =
find_source_element(input_file.digest)
overlay = Overlay(
type=src_type,
source_element=src_element,
source_path=src_path,
target_digest=input_file.digest # The digest we'll replace
during instantiation
)
overlays.append(overlay)
speculative_actions.append(SpeculativeAction(
base_action_digest=subaction_digest,
overlays=overlays
))
# Attach to Artifact (update artifact in cache)
artifact.speculative_actions =
store_cas(SpeculativeActions(actions=speculative_actions))
update_artifact_cache(element, artifact)
# Element's downstream dependencies can already be building
# This overlay data will be available for the NEXT build session
```
**Note on find_source_element**: This function searches through all elements
in the dependency graph that were built before the current element. It looks
for files matching the target digest in:
1. SOURCE trees (element source files)
2. ACTION outputs (from other speculative actions, including those from
earlier elements in the dependency graph)
3. ARTIFACT outputs (built artifacts from dependencies)
When the same digest appears in multiple locations, it returns the SOURCE
match, creating a direct dependency on the original source rather than
intermediate artifacts.
**Use Cases for ACTION Overlays Beyond Current Element**:
ACTION overlays are particularly valuable for intermediate build artifacts
that are generated deterministically:
- **Code generation**: protoc, flex/bison output files can be referenced
across elements
- **Resource compilation**: glib-compile-resources, Qt rcc generated code
- **Archive creation**: `.o` → `.a` via `rear` (see Section 12)
- **Intermediate transformations**: any deterministic build step producing
reusable artifacts
### 7.2 Overlay Instantiation (before build)
```python
speculative_action_map = {} # base_digest → speculative_action_digest
for action_template in topological_sort(all_speculative_actions):
base_action = fetch_cas(action_template.base_action_digest)
action = copy.deepcopy(base_action)
# Apply each overlay
all_resolved = True
for overlay in action_template.overlays:
if overlay.type == SOURCE:
src_digest = resolve_source_digest(overlay.source_element,
overlay.source_path)
elif overlay.type == ARTIFACT:
artifact_output =
wait_for_artifact_output(overlay.source_element)
src_digest = digest_at_path(artifact_output, overlay.source_path)
elif overlay.type == ACTION:
speculative_action =
speculative_action_map[overlay.source_action]
wait_for_action(speculative_action)
output = fetch_action_output(speculative_action)
src_digest = digest_at_path(output, overlay.source_path)
if src_digest is None:
all_resolved = False
break
replace_input_digest(action.input_root, overlay.target_digest,
src_digest)
if all_resolved:
speculative_action_digest = store_cas(action)
speculative_action_map[action_template.base_action_digest] =
speculative_action_digest
submit_speculative_action(speculative_action_digest)
```
---
## 8. Implementation Plan
### Phase 1: REAPI Extension
1. Add `ActionResult.subactions` field to REAPI
2. Instrument buildbox-worker to record subaction digests
3. Update buildbox-casd to store subactions with ActionResult
### Phase 2: BuildStream Integration - SOURCE Overlays Only
1. Implement overlay generator in BuildStream (SOURCE type only)
2. Add SpeculativeActions to Artifact protobuf
3. Store/retrieve SpeculativeActions via Artifact Cache
4. Implement speculative action instantiation for SOURCE overlays
5. Add speculative action submission to scheduler
6. **Make overlay generation asynchronous**: Run as background task, don't
block element builds or downstream dependencies
7. **Measure**: Overlay resolution success rate, storage overhead,
performance impact, overlay generation time
### Phase 3: ARTIFACT Overlays
1. Extend overlay generator to support ARTIFACT type
2. Update instantiation logic to resolve ARTIFACT overlays
3. **Measure**: Additional performance gains, complexity vs. benefit
### Phase 4: ACTION Overlays (potentially combined with `rear` and other re-
wrappers)
1. Implement ACTION overlay support in generator and instantiator
2. Evaluate whether ACTION overlays provide value for code generation
(protoc, flex/bison) and other deterministic intermediate steps
3. If beneficial for archive creation, implement `rear` (remote execution
ar) to complete the speculative action chain (see Section 12 for details)
4. Consider other re-wrapper candidates: ranlib, strip, resource compilers
(see Section 12)
5. **Measure**: End-to-end performance improvement
### Phase 5: Optimization
1. Optimize digest resolution using Apache Arrow for columnar digest storage
and vectorized lookups
2. Skip priming for elements with strong cache key hits and their transitive
dependencies up the dependency graph (unchanged subtrees)
3. Optimize topological sorting and deduplication
4. Implement adaptive throttling for speculative action submission
5. Add comprehensive monitoring and metrics
---
## 9. Benefits
- **Restores Parallelism**: Compilation units execute in parallel across the
entire build graph
- **Transparent**: No changes to element definitions or build scripts
- **Safe**: Incorrect/stale overlays simply become redundant; correctness
never affected
- **Efficient**: Minimal storage overhead (overlays are small metadata)
- **Incremental**: Works with partial cache hits and source changes
- **Extensible**: Can be enhanced with local execution optimization (`rear`)
and other re-wrapped commands to further improve performance
---
## 10. Key Design Decisions
1. **Store speculative actions with artifacts**: Keeps priming data
co-located with build results
2. **Three speculative action overlay types**: SOURCE/ARTIFACT/ACTION cover
all input sources with appropriate granularity
3. **Topological sorting**: Ensures ACTION overlays can resolve dependencies
4. **Skip on unresolved**: Gracefully handles missing/changed inputs without
blocking
5. **Early submission**: Submit speculative actions as soon as dependencies
resolve, regardless of element depth in graph
6. **Asynchronous overlay generation**: Overlay generation runs in the
background and does not block element build completion or downstream element
builds. SpeculativeActions become available for future builds without impacting
the current build's critical path.
---
## 11. Open Questions and Key Uncertainties
### Implementation Questions
1. **Resource limits**: How many concurrent speculative actions should we
allow?
2. **Metrics**: What telemetry do we need to measure priming effectiveness?
### Critical Success Factors
#### 1. Phased Rollout Strategy
**Question**: Should we implement all overlay types at once, or phase them?
**Proposal**: Start with SOURCE overlays only (Phase 2), measure impact,
then add ARTIFACT (Phase 3), then ACTION (Phase 4).
**Rationale**:
- SOURCE overlays are simplest and likely provide the most benefit
- ARTIFACT overlays add cross-element dependencies but may have lower
resolution rates
- ACTION overlays require careful dependency management and may only be
valuable combined with `rear`
**Required**: Define success criteria for each phase before proceeding to
the next.
#### 2. Overlay Resolution Success Rate
**Question**: How often will overlays successfully resolve in practice?
**Context**: Speculative actions are skipped if any overlay cannot be
resolved. Success rate depends on:
- How frequently source files change between builds
- Whether the same digests appear in both SOURCE and ARTIFACT locations
- For ACTION overlays: whether dependency chains remain intact
**Research Evidence**: Studies of C/C++ build caching and code churn
patterns provide encouraging data:
- **Adams et al. (2016)**: Most C/C++ header files are relatively stable
compared to implementation files
- **Misu et al. (2024)**: Incremental builds with caching achieve 4.22x to
4.71x speedups, demonstrating high cache hit rates in practice
- **Matev (2020)**: Hot ccache provides 4-5x speedup over cold cache in
large C++ projects
- **Chowdhury et al. (2024)**: Approximately 80% of code changes concentrate
in 20% of methods (Pareto principle)
- **Shin et al. (2011)**: In Mozilla and Red Hat Linux, 89.1% of files were
in the low-churn, stable category
- **Macho et al. (2021)**: Analysis of 144 Maven projects and build changes
found that while dependency version updates are among the top-10 most frequent
build changes, structural changes to build configuration (adding/removing
dependencies, changing build order) occur much less frequently. Build
maintenance is primarily driven by accommodating changes to production code
rather than restructuring the build itself.
- **McIntosh et al. (2012)**: Study of Java build systems showed that build
complexity evolves slowly over time, with build structure remaining relatively
stable between major refactorings.
**Implication for Build Structure Stability**: Research shows that while
source code changes frequently, build structure (which libraries are linked,
link order, build rules) changes much less often:
- ~80% of SOURCE overlays for .cpp files to resolve successfully (unchanged
sources)
- >90% of SOURCE overlays for .h files to resolve (headers more stable per
Adams)
- >95% of ARTIFACT overlays to resolve (dependency artifacts rarely change)
- ~85-90% of ACTION overlays to resolve (link lines and build structure
change infrequently per Macho/McIntosh)
**Real-world cache effectiveness**: The Misu and Matev studies showing 4-5x
speedups from caching suggest that cache hit rates of 75-80% are achievable in
practice, directly supporting the viability of Cache Priming.
**Required**:
- Instrument prototype to measure actual resolution rates on real projects
- Identify common failure patterns
- Validate against freedesktop-sdk and GNOME builds
- Determine if partial resolution strategies would help
**Risk**: If <50% of speculative actions resolve successfully, the system
may not provide sufficient benefit to justify the complexity. However, research
suggests success rates should be substantially higher (70-80%+).
#### 3. Non-Determinism Handling
**Question**: How do non-deterministic builds affect speculative action
correctness?
**Context**: If a build produces different outputs for the same inputs, the
recorded `base_action_digest` may not be reusable.
**Current Assumption**: Non-deterministic actions will still return cached
results when the ActionCache entry hasn't been evicted. If a new execution
produces different outputs, it generates a new subaction, and
SpeculativeActions are updated for the next build.
**Required**: Validate this assumption and document expected behavior for
non-deterministic builds.
#### 4. Digest Resolution Performance Impact
**Question**: Can we make `find_source_element` fast enough to avoid
becoming a bottleneck?
**Context**: During overlay generation, every input file of every subaction
needs its digest resolved against all source trees, artifact outputs, and prior
actions in the dependency graph. For large builds, this could be thousands of
files × thousands of potential sources.
**Critical Insight**: Since overlay generation runs asynchronously and
doesn't block element builds or speculative action execution, performance
requirements are relaxed:
- Overlay generation for element N can run while elements N+1, N+2, etc. are
building
- Generated SpeculativeActions are used in the *next* build session, not the
current one
- Slow overlay generation only delays availability of speculative actions
for future builds
**Optimization Strategies**:
- **Apache Arrow for digest lookups**: Use Apache Arrow's columnar format to
store digest tables with built-in vectorized operations
- Arrow provides O(1) random access with cache-efficient columnar layout
- Built-in SIMD optimizations for hash lookups and comparisons
- Cross-language support (C++, Python, Rust) for BuildStream integration
- Set lookup kernels with hash table support already implemented
- No need to implement custom indices, bloom filters, or vectorization
- **Columnar digest storage**: Store digests in Arrow tables: `[(digest:
FixedSizeBinary(32), element: string, path: string, type: int8)]`
- **Arrow compute kernels**: Use `pc.is_in()` for individual digest checks
and `pc.join(..., join_type="semi")` for bulk digest matching against large sets
- **Memory efficiency**: Arrow's columnar format provides 64-byte alignment
for optimal SIMD performance
**Why Apache Arrow**:
- Arrow's columnar layout enables vectorization using SIMD operations, with
64-byte alignment matching AVX-512 register width
- Arrow provides built-in set lookup kernels with hash table support for
exactly this use case
- Arrow's pipeline and SIMD algorithms deliver 10-100x performance
improvements through cache locality and parallel operations
- Arrow enables efficient data processing without serialization overhead,
making cross-module data sharing trivial
- Proven at scale: PySpark achieved 10-100x performance improvements using
Arrow
**Required**:
- Profile and optimize on real-world projects (freedesktop-sdk, GNOME)
- Measure: time to generate overlays vs. element build time
- Implement Apache Arrow-based digest lookup tables as primary strategy
- Leverage Arrow's built-in set lookup kernels and hash table support
**Acceptable Performance**: Overlay generation should complete within 1-2x
the element's build time. Since it runs asynchronously, it won't impact the
current build's critical path.
**Risk**: If overlay generation is extremely slow (>10x element build time),
it might not complete before the next build session starts, reducing cache
priming effectiveness.
#### 5. Speculative Action Scale and Remote Execution Capacity
**Question**: Won't submitting thousands of speculative actions overwhelm
the remote execution cluster?
**Context**: A large C/C++ BuildStream project might generate thousands of
speculative actions (one per compilation unit). For example, a project with 100
elements averaging 100 compilation units each would produce 10,000 speculative
actions.
**Comparison to Existing Systems**: This scale is actually modest compared
to build systems already using Remote Execution:
- Bazel builds of TensorFlow generate over 8,191 actions executing across
hundreds of remote workers
- Google runs millions of builds executing millions of test cases every day,
with builds depending on tens of thousands of targets
- A large TypeScript project with 10M lines of code generates 1,100 Bazel
targets, with benchmarks using 100 remote executors
- Bazel builds often include thousands or tens of thousands of small actions
such as compiling, linking, or running tests
**Key Insight**: Remote Execution systems like BuildGrid are designed to
handle tens of thousands of concurrent actions. Cache Priming's speculative
actions are within the normal operating range of these systems.
**Mitigation Strategies**:
- **ActionCache hits**: Many speculative actions will hit immediately,
reducing actual execution load
- **Priority scheduling**: Actual element builds can be prioritized over
speculative actions
- **Adaptive throttling**: Submit speculative actions at a controlled rate
based on cluster capacity
- **Gradual rollout**: Start with Phase 2 (SOURCE overlays only) which
generates fewer actions
**Required**:
- Monitor remote execution cluster utilization during prototype testing
- Implement priority queuing: actual builds > speculative actions
- Consider time-windowing: only prime for elements building soon
**Acceptable**: Speculative actions should consume <30% of remote execution
capacity, leaving headroom for actual builds.
#### 6. Storage and Network Overhead
**Question**: What is the actual storage overhead of SpeculativeActions?
**Context**:
- Each artifact now includes SpeculativeActions metadata
- For compilation-heavy elements (1000+ compilation units), this could be
significant
- ReferencedSpeculativeAction stores only element name + digest (minimal
duplication)
**Required**:
- Measure actual storage increase on real projects
- Consider compression strategies if overhead is significant
- Provide opt-out mechanism for elements that don't benefit
#### 7. Value of ACTION Overlays Without rear
**Question**: Do ACTION overlays provide benefit on their own, or only when
combined with `rear`?
**Context**: ACTION overlays create dependencies between speculative actions
within an element. Without `rear` (see Section 12), the primary use case would
be linking, which requires archives that may not exist as separate subactions.
**Proposal**: Evaluate ACTION overlays and `rear` together in Phase 4, as
they may be interdependent.
**Required**: Identify other potential uses for ACTION overlays beyond
archive creation.
### Success Metrics
To validate the proposal, we need to measure:
1. **Performance**: End-to-end build time reduction (target: >20% for C/C++
heavy projects)
2. **Resolution Rate**: Percentage of speculative actions successfully
instantiated (target: >70%)
3. **Storage Overhead**: Size increase of artifact cache (acceptable: <20%)
4. **Overlay Generation Time**: Time to generate overlays per element
(acceptable: <2x element build time, since it's async)
5. **Worker Utilization**: Speculative actions should not starve actual
builds
### Validation Plan
1. **Implement Phase 1-2**: SOURCE overlays only on a prototype
2. **Test on freedesktop-sdk**: Large, real-world C/C++ heavy BuildStream
project
3. **Measure all success metrics**: Gather data before proceeding
4. **Iterate or pivot**: If metrics don't meet targets, adjust design or
abandon
5. **Proceed to Phase 3+**: Only if Phase 2 shows clear benefit
---
## 12. Future Enhancement: Local Execution with rear
### Motivation
In the base Cache Priming design, speculative actions follow this pattern:
1. Compile `.cpp` → `.o` (via `recc`, speculative)
2. Upload `.o` to CAS
3. Download `.o` files for archiving
4. Create `.a` archive
5. Upload `.a` to CAS
However, since BuildStream submits entire element builds as Actions to
workers, and `recc` returns `.o` files during the element build, those `.o`
files are **already local on the worker**. We can optimize archive creation by
keeping it local.
### The `rear` Command
Similar to `recc` (remote execution caching compiler), **rear** (remote
execution ar) wraps the `ar` archiver:
```bash
# Traditional ar usage
ar rcs libmath_utils.a math_utils.o helpers.o
# With rear wrapper
rear rcs libmath_utils.a math_utils.o helpers.o
```
**rear behavior**:
1. Generates an Action describing the archive operation
2. Checks ActionCache for existing result
3. If miss:
- **During main element build**: Executes `ar` **locally on the worker**
(via buildbox-casd) since `.o` files are already present
- **During speculative execution**: Executes remotely (fetches `.o` files
from CAS as needed)
4. Stores result in ActionCache
5. Records the Action digest in `ActionResult.subactions`
### Extended Speculative Action Chain
With `rear`, we get a complete chain of speculative actions:
```
libA element build (main build):
recc g++ -c src/math_utils.cpp -o math_utils.o [remote or cached]
↓ (local .o file on worker)
rear rcs /usr/lib/libmath_utils.a math_utils.o [LOCAL execution]
↓ (local .a file on worker)
g++ app.o -L/usr/lib -lmath_utils -o final_binary [remote or cached]
Speculative execution (cache priming):
recc compile action [remote, warms
ActionCache]
↓ (.o uploaded to CAS)
rear archive action [remote, fetches .o
from CAS]
↓ (.a uploaded to CAS)
link action [remote, fetches .a
from CAS]
```
### Example: libA with `rear`
**Recorded Subactions** during libA build:
```protobuf
ActionResult {
subactions: [
"recc_compile_math_utils...", // .cpp → .o
"recc_compile_helpers...", // .cpp → .o
"rear_archive..." // .o → .a
]
}
```
**Generated SpeculativeAction for rear**:
```protobuf
SpeculativeAction {
base_action_digest: "rear_archive..."
overlays: [
Overlay {
type: ACTION
source_element: "libA.bst"
source_action: "recc_compile_math_utils..."
source_path: "math_utils.o"
target_digest: "obj_math_utils..."
},
Overlay {
type: ACTION
source_element: "libA.bst"
source_action: "recc_compile_helpers..."
source_path: "helpers.o"
target_digest: "obj_helpers..."
}
]
}
```
### Key Advantages
1. **Faster Main Builds**: During the actual element build, `.o` files are
local on the worker, so archive creation executes immediately without network
overhead
2. **Earlier Link Actions**: Even though speculative `rear` actions run
remotely, they still populate the ActionCache, enabling link actions to start
as soon as archives are cached
3. **ActionCache Hits**: When the main build runs, `rear` checks ActionCache
first and gets an immediate hit from the speculative execution
4. **Reduced Critical Path**: The main build's archive step becomes nearly
instantaneous (cache hit), shortening the overall build time
**Note**: Future optimizations (out of scope for this proposal) could
include locality hints to prefer scheduling speculative `rear` actions on
workers that already have the `.o` files, further reducing network traffic.
### Execution Flow
```
Cache Priming Phase (speculative):
Submit compile actions (recc) → executes remotely, warms ActionCache
↓ [.o files uploaded to CAS]
Submit archive action (rear) → executes remotely, fetches .o from CAS
↓ [.a file uploaded to CAS, ActionCache warmed]
Submit link actions → executes remotely, warms ActionCache
↓
Main Build Phase:
Element build starts
↓
recc compile → ActionCache HIT (instant)
↓ [.o files now local on worker]
rear archive → ActionCache HIT, but if miss, executes locally (instant)
↓ [.a file now local on worker]
link → ActionCache HIT (instant)
↓
Element build completes in minimal time
```
The key insight: **rear enables ActionCache hits during the main build, and
when those hits miss, local execution is very fast because the input `.o` files
are already on the worker**.
### Beyond `rear`: Instrumentation Strategy
The `rear` optimization illustrates a general principle: **identify commands
that operate on local intermediate data and wrap them for speculative
execution**.
**Candidates for "re-" wrappers**:
- `ranlib`: Index generation for static libraries (fast, local)
- `strip`: Symbol stripping (fast, local, self-contained)
- Code generators: protoc, flex/bison (deterministic, cacheable)
- Resource compilers: glib-compile-resources, Qt rcc
**Instrumentation approach**:
1. Monitor build logs to identify frequent intermediate commands
2. Measure: execution time, input/output sizes, determinism
3. Prioritize commands that are:
- Fast enough for local execution (< 1 second)
- Self-contained (few dependencies)
- Operate on data already local to worker
- Appear frequently across builds
4. Implement re-wrappers and measure impact
This creates a virtuous cycle: more instrumentation → more optimization
opportunities → better build performance.
---
## References
- Adams, B., McIntosh, S., Nagappan, M., and Hassan, A.E. (2016).
"Identifying and understanding header file hotspots in C/C++ build processes."
*Automated Software Engineering*, Vol. 23, pp. 61-90.
https://dl.acm.org/doi/10.1007/s10515-015-0183-5
- Chowdhury, S., Uddin, G., Hemmati, H., and Holmes, R. (2024). "The Good,
the Bad, and the Monstrous: Predicting Highly Change-Prone Source Code Methods
at Their Inception." *ACM Transactions on Software Engineering and
Methodology*, Vol. 33, No. 4. https://dl.acm.org/doi/10.1145/3715006
- Macho, C., McIntosh, S., and Pinzger, M. (2021). "The nature of build
changes." *Empirical Software Engineering*, Vol. 26, Article 52.
https://link.springer.com/article/10.1007/s10664-020-09926-4
- Matev, R. (2020). "Fast distributed compilation and testing of large C++
projects." *EPJ Web of Conferences*, Vol. 245.
https://www.epj-conferences.org/articles/epjconf/pdf/2020/21/epjconf_chep2020_05001.pdf
- McIntosh, S., Adams, B., and Hassan, A.E. (2012). "The Evolution of Java
Build Systems." *Empirical Software Engineering*, Vol. 17, No. 4-5, pp. 578-608.
- Misu, S., Lin, B., and Monden, A. (2024). "Does Using Bazel Help Speed Up
Continuous Integration Builds?" *arXiv preprint*.
https://arxiv.org/html/2405.00796v1
- Nagappan, N. and Ball, T. (2007). "Using Software Dependencies and Churn
Metrics to Predict Field Failures: An Empirical Case Study." *First
International Symposium on Empirical Software Engineering and Measurement (ESEM
2007)*. https://ieeexplore.ieee.org/document/4343764/
- Shin, Y., Meneely, A., Williams, L., and Osborne, J. (2011). "Evaluating
Complexity, Code Churn, and Developer Activity Metrics as Indicators of
Software Vulnerabilities." *IEEE Transactions on Software Engineering*, Vol.
37, No. 6, pp. 772-787. https://ieeexplore.ieee.org/document/5560680/
- Munson, J.C. and Elbaum, S. (1998). "Code Churn: A Measure for Estimating
the Impact of Code Change." *Proceedings of the International Conference on
Software Maintenance*, pp. 24-31. https://ieeexplore.ieee.org/document/738486/
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]