GitHub user hanishi added a comment to the discussion: Pekko Ad
Network(promovolve)
I should be clear about the role the LLM played here. The pacing logic was not
designed by the model. What it provided was acceleration: a way to iterate
through reasoning faster, surface edge cases earlier, and pressure-test
assumptions, particularly around control-theoretic failure modes that would
otherwise have taken much longer to uncover.
The final system is entirely the product of human understanding and
mathematical invariants. Every decision, constraint, and trade-off was derived,
validated, and owned by me. The LLM shortened the path to insight; it did not
replace the reasoning required to get there. Below are notes produced through
iterative discussion between the LLM and myself.
# Budget Pacing
Budget pacing spreads ad spend across the day so that the budget is not
exhausted early. Without it, a campaign can burn through its budget in the
morning and go dark in the afternoon.
## Overview
Pacing is **always enforced** at the AdServer level using a pluggable
`PacingStrategy`. The default strategy is `RateAwarePacing`, which uses PI
(Proportional-Integral) control to dynamically adjust throttling based on:
- **Spend ratio**: actual spend vs expected spend (traffic-shaped or linear)
- **Request arrival rate**: observed requests/sec for accurate throttle
calculation
- **Traffic shape**: learned or configured hourly traffic patterns
## Architecture
Pacing is modular and configurable:
```
promovolve.publisher.delivery/
  PacingStrategy.scala       -- trait with shouldServe() method + PacingContext
  AdaptivePacing.scala       -- RateAwarePacing factory with PI control
  TrafficShapeTracker.scala  -- learns/stores traffic patterns per time bucket
promovolve.publisher.delivery.pacing/
  DayClock.scala             -- real vs simulated day handling (UTC-based)
  TrafficObserver.scala      -- EMA-smoothed request rate tracking
  PacingController.scala     -- coordinates pacing state and day boundaries
```
### Flow Diagram
**IMPORTANT**: The pacing decision (`shouldServe`) happens BEFORE Thompson
Sampling selection. This prevents exploration bias, where TS picks an
exploration arm that pacing then filters out.
```
AdServer                         CampaignEntity
│                                       │
│  ┌─────────────────────┐              │
│  │   PacingStrategy    │              │
│  │   .shouldServe()    │              │
│  └──────────┬──────────┘              │
│             │                         │
│  [if shouldServe=false]               │
│    → return NoSelection               │
│    → do NOT run TS                    │
│                                       │
│  [if shouldServe=true]                │
│             │                         │
│    ┌─────────────────┐                │
│    │ Thompson Sample │                │
│    │  (pick winner)  │                │
│    └────────┬────────┘                │
│             │                         │
│── TryReserve ────────────────────────▶│
│◀── Reserved / InsufficientBudget ─────│
│                                       │
▼                                       ▼
```
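The ordering in the diagram can be sketched in a few lines. This is a minimal illustration, not the project's code: `Outcome`, `Selected`, and `selectWithGate` are hypothetical names, and `sample` merely stands in for Thompson Sampling. The point is that the sampler is never evaluated for a throttled request.

```scala
object PacingGateSketch {
  sealed trait Outcome
  case object NoSelection extends Outcome
  final case class Selected(creativeId: String) extends Outcome

  /** Applies the pacing gate first. `sample` (a stand-in for Thompson
    * Sampling) is passed by name, so it is not evaluated when the gate
    * rejects the request.
    */
  def selectWithGate(shouldServe: Boolean)(sample: => String): Outcome =
    if (!shouldServe) NoSelection
    else Selected(sample)
}
```

Because a throttled request never reaches the sampler, no arm is picked and then discarded.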
**Key design decisions:**
- Pacing is always enforced (no opt-out)
- `shouldServe()` is called BEFORE Thompson Sampling (critical for correct TS
learning)
- Strategies use `PacingContext` for all state (spend, budget, time, traffic
shape)
- AdServer orchestrates pacing (not CampaignEntity)
- Easy to swap strategies without changing the serving logic (pass a different
`PacingStrategy` to `AdServer`)
## RateAwarePacing (Default Strategy)
PI-controlled pacing that directly adjusts throttle probability based on spend
error.
### How It Works
1. **Base throttle**: Calculate what throttle would achieve perfect pace
```
baseThrottle = 1 - (targetImpsPerSec / requestRate)
```
2. **Error calculation**: Positive when under-paced, negative when over-paced
```
error = 1.0 - spendRatio
```
3. **PI adjustment**: Directly added/subtracted from throttle
```
adjustment = Kp × error + Ki × integralError
finalThrottle = baseThrottle - adjustment
```
4. **Traffic shape multiplier**: Scale target based on expected traffic volume
- During peak hours (multiplier > 1): higher target allows more impressions
- During valley hours (multiplier < 1): lower target prevents overspend
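Steps 1 to 3 can be combined into one function. The sketch below rests on stated assumptions rather than the actual `RateAwarePacing` code: the result is clamped to `[0, 1]`, a zero request rate is treated as "nothing to throttle", `integralError` is accumulated by the caller, and `targetImpsPerSec` is assumed to already include the traffic shape multiplier from step 4.

```scala
object PiThrottleSketch {
  private def clamp01(x: Double): Double = math.max(0.0, math.min(1.0, x))

  /** Base throttle, spend error, and PI adjustment combined.
    * Returns a throttle probability in [0, 1].
    */
  def finalThrottle(
      targetImpsPerSec: Double,
      requestRate: Double,
      spendRatio: Double,
      integralError: Double,
      kp: Double = 0.5,
      ki: Double = 0.3
  ): Double = {
    // Assumption: with no observed traffic there is nothing to throttle.
    val baseThrottle =
      if (requestRate <= 0.0) 0.0
      else clamp01(1.0 - targetImpsPerSec / requestRate)
    val error = 1.0 - spendRatio // positive when under-paced
    val adjustment = kp * error + ki * integralError
    // Under-pacing lowers the throttle; over-pacing raises it.
    clamp01(baseThrottle - adjustment)
  }
}
```

With a target of 50 impressions/sec against 100 requests/sec, the base throttle is 0.5. A spend ratio of 1.2 with no accumulated error pushes the throttle up to about 0.6, and a ratio of 0.8 pulls it down to about 0.4.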
### PI Control Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `Kp` | 0.5 | Proportional gain - immediate response to error |
| `Ki` | 0.3 | Integral gain - accumulated error correction |
| `feedforwardWindow` | 0.0-0.2 | Proactive adjustment near bucket transitions |
| `gracePeriodFraction` | 0.01 | Startup period (1% of day) with no PI adjustment |
### Asymmetric Gains
The controller uses **asymmetric gains** to recover from overspend faster than
it accelerates during underspend:
- **Over-pacing** (spendRatio > 1.0): gains multiplied by 2.0x
- **Under-pacing** (spendRatio ≤ 1.0): gains unchanged
This is because overspend is costly (budget exhaustion stops delivery), while
underspend is recoverable (can catch up later).
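A minimal sketch of the asymmetric rule, assuming the multiplier applies to both `Kp` and `Ki` as described above. `effectiveGains` and `OverspendMultiplier` are illustrative names, not identifiers from the codebase.

```scala
object AsymmetricGainSketch {
  val OverspendMultiplier: Double = 2.0

  /** Returns (kp, ki) after applying the overspend multiplier.
    * Gains are doubled only when spendRatio is strictly above 1.0.
    */
  def effectiveGains(spendRatio: Double, kp: Double, ki: Double): (Double, Double) =
    if (spendRatio > 1.0) (kp * OverspendMultiplier, ki * OverspendMultiplier)
    else (kp, ki)
}
```

With the default gains, a spend ratio of 1.2 yields `(1.0, 0.6)`, while a ratio of exactly 1.0 leaves them at `(0.5, 0.3)`.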
### Volatility-Adjusted Gains
When created via `AdaptivePacing.forShape()`, PI gains are automatically tuned
based on traffic shape volatility:
| Volatility (CV) | Kp | Ki | Feedforward | Use Case |
|-----------------|-----|-----|-------------|----------|
| 0.0 (uniform) | 0.3 | 0.2 | 20% | Flat traffic |
| 0.5 (typical) | 0.5 | 0.3 | 10% | Normal daily pattern |
| 1.0 (high) | 0.8 | 0.5 | 0% | Drastic peaks/valleys |
| 1.5+ (extreme) | 1.0 | 0.6 | 0% | Very spiky traffic |
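The table lists only four anchor rows. The sketch below shows one way to turn a shape into gains. It assumes CV is the population standard deviation divided by the mean, and that values between rows are linearly interpolated. Neither detail is confirmed by these notes, and `Gains`, `coefficientOfVariation`, and `gainsFor` are hypothetical names.

```scala
object VolatilityGainsSketch {
  final case class Gains(kp: Double, ki: Double, feedforward: Double)

  // Anchor rows copied from the table above: CV -> gains.
  private val anchors: Vector[(Double, Gains)] = Vector(
    0.0 -> Gains(0.3, 0.2, 0.20),
    0.5 -> Gains(0.5, 0.3, 0.10),
    1.0 -> Gains(0.8, 0.5, 0.0),
    1.5 -> Gains(1.0, 0.6, 0.0)
  )

  /** Coefficient of variation: population standard deviation / mean. */
  def coefficientOfVariation(volumes: Seq[Double]): Double = {
    val mean = if (volumes.isEmpty) 0.0 else volumes.sum / volumes.size
    if (mean <= 0.0) 0.0
    else {
      val variance = volumes.map(v => (v - mean) * (v - mean)).sum / volumes.size
      math.sqrt(variance) / mean
    }
  }

  /** Assumption: linear interpolation between rows, clamped at both ends. */
  def gainsFor(cv: Double): Gains = {
    val c = math.max(anchors.head._1, math.min(anchors.last._1, cv))
    val (lo, hi) = anchors
      .sliding(2)
      .map(p => (p(0), p(1)))
      .find { case (_, (hiCv, _)) => c <= hiCv }
      .get // safe: c is clamped to the last anchor's CV
    val t = (c - lo._1) / (hi._1 - lo._1)
    def lerp(a: Double, b: Double): Double = a + (b - a) * t
    Gains(
      lerp(lo._2.kp, hi._2.kp),
      lerp(lo._2.ki, hi._2.ki),
      lerp(lo._2.feedforward, hi._2.feedforward)
    )
  }
}
```

Under these assumptions a CV of 0.25 lands halfway between the first two rows (`Kp` 0.4, `Ki` 0.25, feedforward 15%), and anything above 1.5 uses the last row.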
### Grace Period (Hybrid: Time + Request Count + Staleness)
The grace period uses a **hybrid** condition requiring BOTH time AND request
count, plus staleness detection:
```scala
// In AdaptivePacing.throttleProbability()
val initialGraceComplete =
  ctx.elapsedSeconds >= MinGraceSeconds && // 10 seconds
    ctx.requestCount >= MinGraceRequests   // 50 requests

// Staleness threshold scales with day duration
val scaledStaleThresholdMs = max(
  MinStaleRateThresholdMs, // 1000ms minimum
  BaseStaleRateThresholdMs * dayDurationSeconds / 86400
)
val rateIsStale = ctx.msSinceLastRequest > scaledStaleThresholdMs
val inGracePeriod = !initialGraceComplete || rateIsStale
```
**Key constants** (in `AdaptivePacing` object):
```scala
val MinGraceSeconds: Double = 10.0 // Time before PI activates
val MinGraceRequests: Long = 50L // Requests before PI activates
val BaseStaleRateThresholdMs: Long = 30000L // Base staleness for real days
val MinStaleRateThresholdMs: Long = 1000L // Minimum staleness threshold
```
**Scaled staleness examples**:
| Day Duration | Scaled Threshold | Effective |
|--------------|------------------|-----------|
| 86400s (real) | 30000ms | 30 seconds |
| 3600s (1 hour) | 1250ms | 1.25 seconds |
| 600s (10 min) | 208ms → 1000ms | 1 second (clamped to min) |
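The table rows can be reproduced with a small standalone function. This sketch reuses the two constants quoted above; the wrapper object and function name are illustrative only.

```scala
object StalenessSketch {
  val BaseStaleRateThresholdMs: Long = 30000L
  val MinStaleRateThresholdMs: Long = 1000L

  /** Scales the real-day staleness threshold to a shorter simulated day,
    * never dropping below the minimum. Uses integer (Long) division.
    */
  def scaledStaleThresholdMs(dayDurationSeconds: Int): Long =
    math.max(
      MinStaleRateThresholdMs,
      BaseStaleRateThresholdMs * dayDurationSeconds / 86400
    )
}
```

For a 600-second day the scaled value is 208 ms, so the 1000 ms floor applies.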
**During grace period**:
- Uses base throttle only (no PI adjustment)
- Does NOT accumulate integral error
- Prevents step-changes when grace ends
**CRITICAL**: If `requestCount` or `msSinceLastRequest` are not properly passed
through behavior calls, grace period may never complete, causing pacing to fail
silently.
**Staleness re-entry**: After a traffic gap longer than the scaled staleness
threshold (30 seconds for a real day), the rate data is considered stale and
the grace period re-activates. This prevents pacing from using outdated rate
estimates.
## Traffic Shape Tracking
`TrafficShapeTracker` learns or stores traffic patterns for traffic-aware
pacing.
### Problem: Linear Pacing
Linear pacing assumes uniform traffic:
```
Budget: $30/day
Linear Target:
Hour: 0 6 12 18 24
Target: $0 $7.5 $15 $22.5 $30 (straight line)
```
But real traffic has shape:
```
Traffic Volume:
▁▁▂▃▅▆███▇▆▅▄▃▂▁▁
night peak evening
```
This causes over-throttling during peaks and impossible targets during valleys.
### Solution: Traffic-Shaped Targeting
The cumulative distribution function (CDF) of traffic becomes the expected
spend curve:
```
Traffic-Shaped Target:
Hour:    0    6    12    18    24
Target:  $1   $5   $18   $26   $30
(curve matches traffic shape)
```
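The idea of using the traffic CDF as the spend target can be sketched as follows. This is not the `TrafficShapeTracker` implementation: the function name is hypothetical, spend is assumed to accrue linearly within the current hour, and an all-zero shape is assumed to fall back to linear pacing.

```scala
object ShapedTargetSketch {
  /** Fraction of the daily budget expected to be spent after `elapsedHours`,
    * given 24 hourly traffic volumes (relative weights, any scale).
    */
  def expectedSpendFraction(hourlyVolumes: Seq[Double], elapsedHours: Double): Double = {
    require(hourlyVolumes.size == 24, "expected 24 hourly volumes")
    val h = math.max(0.0, math.min(24.0, elapsedHours))
    val total = hourlyVolumes.sum
    if (total <= 0.0) h / 24.0 // assumption: linear fallback for an empty shape
    else {
      val bucket = math.min(h.toInt, 23)
      val completed = hourlyVolumes.take(bucket).sum
      // Assumption: spend accrues linearly within the current bucket.
      val partial = hourlyVolumes(bucket) * (h - bucket)
      (completed + partial) / total
    }
  }
}
```

Multiplying the fraction by the daily budget gives the expected spend. A flat shape reproduces the linear target (0.25 of the budget at hour 6), while a shape with all traffic in hour 12 expects nothing before noon and the full budget by 13:00.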
### Bucket-to-Time Mapping
The `trafficShape` array (24 values) maps to time **relative to `dayStart`**:
| Index | Time from dayStart | Example (dayStart = midnight UTC) |
|-------|-------------------|-----------------------------------|
| 0 | +0h to +1h | 00:00 - 01:00 UTC |
| 6 | +6h to +7h | 06:00 - 07:00 UTC |
| 12 | +12h to +13h | 12:00 - 13:00 UTC |
| 22 | +22h to +23h | 22:00 - 23:00 UTC (typical peak) |
| 23 | +23h to +24h | 23:00 - 00:00 UTC |
### Weekday vs Weekend Shapes
Two separate shapes can be configured:
- `weekdayShapeVolumes`: 24 values for Monday-Friday
- `weekendShapeVolumes`: 24 values for Saturday-Sunday
The system automatically selects the appropriate shape based on the current day
(UTC).
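A minimal sketch of UTC-based shape selection using `java.time`. The function names are hypothetical; the actual selection logic lives in the project and may differ.

```scala
import java.time.{DayOfWeek, Instant, ZoneOffset}

object ShapeSelectionSketch {
  /** True when `now` falls on Saturday or Sunday in UTC. */
  def isWeekendUtc(now: Instant): Boolean = {
    val day = now.atZone(ZoneOffset.UTC).getDayOfWeek
    day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY
  }

  /** Picks the weekend shape on UTC weekends, the weekday shape otherwise. */
  def selectShape(now: Instant, weekday: Seq[Double], weekend: Seq[Double]): Seq[Double] =
    if (isWeekendUtc(now)) weekend else weekday
}
```

Because the day is evaluated in UTC, a request at 23:59 UTC on Friday still uses the weekday shape even where the local date is already Saturday.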
## DayClock
Handles real vs simulated day timing with **consistent UTC timezone**.
### RealDayClock (dayDurationSeconds = 86400)
- `dayStart` is UTC midnight today
- `elapsedSeconds` = time since midnight UTC
- Traffic shape bucket aligns with UTC wall clock hour
Example at 14:00 UTC:
```
elapsedSeconds = 50400 (14 hours)
bucket = 14
```
### SimulatedDayClock (dayDurationSeconds < 86400)
- `dayStart` is when simulation started (or last day rollover)
- `elapsedSeconds` starts from 0
- Elapsed time is **scaled** to 86400 for traffic shape lookup
Example with `dayDurationSeconds = 600` (10-minute day):
```
After 25 seconds (1/24 of 600):
scaledElapsed = 25 × (86400 / 600) = 3600
bucket = 1
```
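Both clock examples follow from the same scaling rule, sketched below. `bucketFor` is an illustrative name, and clamping the result to 0..23 is an assumption about how the exact end of the day is handled.

```scala
object BucketSketch {
  /** Maps elapsed time within a (possibly simulated) day to a traffic
    * shape bucket in 0..23 by scaling the day to 86400 seconds.
    */
  def bucketFor(elapsedSeconds: Double, dayDurationSeconds: Int): Int = {
    require(dayDurationSeconds > 0 && dayDurationSeconds <= 86400)
    val scaledElapsed = elapsedSeconds * 86400.0 / dayDurationSeconds
    // Clamp so an elapsed time exactly at the day boundary stays in bucket 23.
    math.max(0, math.min(23, (scaledElapsed / 3600.0).toInt))
  }
}
```

`bucketFor(50400, 86400)` gives 14 (the real-day example) and `bucketFor(25, 600)` gives 1 (the 10-minute-day example).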
### dayDurationSeconds Validation
**`dayDurationSeconds` must not exceed 86400** (24 hours).
- Server rejects values > 86400 with the error
`"dayDurationSeconds cannot exceed 86400 (24 hours)"`
- Client (RunScenario) exits with the same error
This ensures the traffic shape (24 buckets) maps correctly to time.
## TrafficObserver
Tracks request arrival rate using exponential moving average (EMA).
```scala
// Update on each request
observer.recordRequest(now)
// Get smoothed rate
val reqPerSec = observer.smoothedRate // e.g., 150.3
```
The smoothed rate is used by `RateAwarePacing` to calculate base throttle:
```
baseThrottle = 1 - (targetRate / observedRate)
```
Without rate tracking, high-traffic scenarios would cause severe throttle
oscillation.
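For intuition, an EMA over inter-arrival gaps can be sketched like this. It is not the project's `TrafficObserver`: the class name, the millisecond timestamps, and the smoothing factor `alpha = 0.2` are all assumptions made for the example.

```scala
object EmaRateSketch {
  /** Hypothetical EMA tracker; `alpha` is an assumed smoothing factor. */
  final class EmaRateTracker(alpha: Double = 0.2) {
    private var lastMs: Option[Long] = None
    private var rate: Double = 0.0

    /** Folds the rate implied by the latest inter-arrival gap into the EMA. */
    def recordRequest(nowMs: Long): Unit = {
      lastMs.foreach { prev =>
        val gapMs = math.max(1L, nowMs - prev) // guard against zero/negative gaps
        val instantRate = 1000.0 / gapMs
        // The first measured gap seeds the average; later gaps are blended in.
        rate =
          if (rate == 0.0) instantRate
          else alpha * instantRate + (1 - alpha) * rate
      }
      lastMs = Some(nowMs)
    }

    /** Smoothed requests per second (0.0 until two requests have been seen). */
    def smoothedRate: Double = rate
  }
}
```

A smaller `alpha` reacts more slowly to bursts, which keeps `baseThrottle` steadier but delays the response to a genuine change in traffic.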
## PacingController
Coordinates pacing state across components:
- Tracks `dayStart` for elapsed time calculation
- Detects day boundaries (UTC midnight for real days)
- Manages pacing strategy lifecycle (reset at day boundary)
- Stores/restores traffic shape snapshots
### Day Boundary Detection
```scala
def hasNewDayStarted(newDayStart: Instant): Boolean = {
  val lastDay = LocalDate.ofInstant(lastDayStart, ZoneOffset.UTC)
  val newDay = LocalDate.ofInstant(newDayStart, ZoneOffset.UTC)
  lastDay != newDay
}
```
**All day boundary logic uses UTC** for consistency across server and client.
## UTC Timezone Requirement
**The entire pacing system uses UTC consistently:**
| Component | UTC Usage |
|-----------|-----------|
| `DayClock` | `utcMidnightToday()` for real days |
| `AdServer` | Day boundary detection via `ZoneOffset.UTC` |
| `PacingController` | Day comparison via `ZoneOffset.UTC` |
| `RunScenario` (client) | `LocalTime.now(ZoneOffset.UTC).getHour` for bucket |
This ensures client and server agree on which traffic shape bucket to use.
## PacingContext
Immutable snapshot passed to strategies:
```scala
final case class PacingContext(
  dailyBudget: BigDecimal,
  todaySpend: BigDecimal,
  dayStart: Instant,
  now: Instant,
  requestArrivalRate: Double = 0.0, // From TrafficObserver
  competingCampaigns: Int = 1,
  avgCpm: Double = 5.0,
  dayDurationSeconds: Int = 86400, // Must be <= 86400
  trafficShape: Option[TrafficShapeTracker] = None
) {
  def elapsedHours: Double
  def expectedSpendFraction: Double // Traffic-shaped or linear
  def expectedSpend: BigDecimal
  def spendRatio: Double // actual / expected
  def remainingBudget: BigDecimal
  def remainingHours: Double
}
```
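The derived members are listed above as signatures only. As a rough illustration of how `expectedSpend` and `spendRatio` relate in the linear case, here is a standalone sketch. The function names are hypothetical, and reporting a ratio of 1.0 when nothing is expected yet is an assumption.

```scala
object SpendRatioSketch {
  /** Linear expected spend: budget times the elapsed fraction of the day. */
  def expectedSpendLinear(
      dailyBudget: BigDecimal,
      elapsedSeconds: Double,
      dayDurationSeconds: Int
  ): BigDecimal = {
    val fraction = math.max(0.0, math.min(1.0, elapsedSeconds / dayDurationSeconds))
    dailyBudget * BigDecimal(fraction)
  }

  /** actual / expected. Assumption: reported as on pace (1.0) when no
    * spend is expected yet, to avoid dividing by zero at day start.
    */
  def spendRatio(todaySpend: BigDecimal, expectedSpend: BigDecimal): Double =
    if (expectedSpend <= BigDecimal(0)) 1.0
    else (todaySpend / expectedSpend).toDouble
}
```

For a $30 budget a quarter of the way through the day, the linear expected spend is $7.50; an actual spend of $7.65 gives a spend ratio of about 1.02.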
## Configuration
### Site Pacing Config
```bash
# Set pacing config for a site
curl -X PUT http://localhost:8080/v1/publishers/pub-1/sites/site-123/pacing \
  -H "Content-Type: application/json" \
  -d '{
    "dayDurationSeconds": 600,
    "weekdayShapeVolumes": [0.3,0.2,1.0,0.2,0.0,0.3,0.5,2.5,0.1,2.0,1.5,2.0,2.5,3.0,2.5,2.0,1.5,1.2,1.2,1.0,0.4,2.0,5.0,0.4],
    "weekendShapeVolumes": [0.3,0.2,0.1,0.1,0.1,0.2,0.3,0.5,0.8,1.2,1.5,1.8,2.0,2.2,2.3,2.2,2.0,1.8,2.0,2.5,2.8,2.0,1.2,0.5]
  }'
```
### Test Throttle Override
For testing, you can force a fixed throttle probability:
```json
{
"testThrottleOverride": 0.5
}
```
This bypasses PI control and uses `FixedThrottlePacing(0.5)`.
### Changing the Default Strategy
To use a different strategy, modify `AdServer.apply()`:
```scala
AdServer(
  publisherId,
  // ... other params
  pacingStrategy = AdaptivePacing.forShapeVolumes(myShapeArray),
  // or: pacingStrategy = FixedThrottlePacing(0.3),
)
```
## Observing Pacing
### Server-side Stats
```bash
curl http://localhost:8080/test/site-stats/site-123
```
Response:
```json
{
  "siteId": "site-123",
  "selected": 58,
  "pacingSkipped": 42,
  "budgetExhausted": 0,
  "noCandidates": 0,
  "totalSpend": 0.29,
  "elapsedHours": 0.5,
  "expectedSpendFraction": 0.5
}
```
### RunScenario Reports
When running `RunScenario.scala` with `--continuous`, periodic reports include:
```
─── Report @ 100 requests (45.2s elapsed) ───
Requests: 100 (2/sec)
Selected: 58 (58.0%)
Pacing skip: 42
Pacing status:
Spend ratio: 1.02x → (stable)
Spend rate: $0.0064/sec (target: $0.0067/sec)
Rate status: ON PACE
```
## Outcome Types
| Outcome | Description |
|---------|-------------|
| `selected` | Ad was served successfully |
| `pacingSkipped` | Rejected by `shouldServe()` to control spend rate |
| `budgetExhausted` | Campaign has no remaining budget |
| `noCandidates` | No eligible ads for this request |
## Testing
Run a simulation with budget constraints to observe pacing:
```bash
# 1. Start the server
sbt "api/run"
# 2. Run pacing test with traffic shapes (10-minute simulated day)
scala-cli scripts/RunScenario.scala -- \
--scenario scenarios/continuous.json \
--continuous
# 3. Or run with real-day timing (aligns with UTC wall clock)
# Edit continuous.json: "dayDurationSeconds": 86400
```
The periodic reports will show:
- Spend ratio converging to 1.0x
- Pacing skip rate adjusting to maintain pace
- Traffic shape bucket changes (for short days)
## Custom Strategies
Implement the `PacingStrategy` trait:
```scala
trait PacingStrategy {
  /** Called BEFORE Thompson Sampling - return true to serve, false to skip */
  def shouldServe(ctx: PacingContext): Boolean

  /** Calculate throttle probability [0.0, 1.0] */
  def throttleProbability(ctx: PacingContext): Double

  /** Reset state at day boundary */
  def reset(): Unit

  /** Strategy name for logging */
  def name: String
}
```
Example custom strategy:
```scala
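import scala.util.Random

class TimeOfDayPacing extends PacingStrategy {
  // Assumption: serve with probability (1 - throttle); the trait shown above
  // has no default implementations, so all four members are defined here.
  def shouldServe(ctx: PacingContext): Boolean =
    Random.nextDouble() >= throttleProbability(ctx)

  def throttleProbability(ctx: PacingContext): Double = {
    val hour = (ctx.elapsedHours % 24).toInt
    if (hour >= 9 && hour <= 17) 0.0 // No throttle during business hours
    else 0.8                         // Heavy throttle off-hours
  }

  def reset(): Unit = () // Stateless: nothing to reset at the day boundary
  def name: String = "time-of-day"
}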
```
## Debugging Pacing Issues
### Issue: 0% Pacing Despite Overspend
**Symptoms**: `pacingSkipped = 0` even with 2x-4x overspend ratio
**Root cause**: Usually the grace period never completes.
**Check 1: Behavior parameter propagation**
The `behavior()` function in `AdServer.scala` has many parameters with default
values:
```scala
def behavior(
  ...
  requestCount: Long = 0L,     // Default: 0
  lastRequestTimeMs: Long = 0L // Default: 0
)
```
**CRITICAL BUG PATTERN**: If a `behavior()` call doesn't pass these parameters,
they silently reset to 0:
```scala
// BAD - missing requestCount and lastRequestTimeMs
behavior(
  cachedDomainBlocklist,
  creativeStats,
  serveStats,
  lastDayStart,
  pacingStrategy,
  smoothedReqRate,
  pendingSpendByCampaign,
  dayDurationSeconds,
  spendInfoCache,
  trafficShapeTracker,
  rolloverGraceUntilMs,
  warmupMode
  // MISSING: requestCount, lastRequestTimeMs → both become 0!
)

// GOOD - all parameters passed
behavior(
  cachedDomainBlocklist,
  creativeStats,
  serveStats,
  lastDayStart,
  pacingStrategy,
  smoothedReqRate,
  pendingSpendByCampaign,
  dayDurationSeconds,
  spendInfoCache,
  trafficShapeTracker,
  rolloverGraceUntilMs,
  warmupMode,
  requestCount,     // Preserve state
  lastRequestTimeMs // Preserve state
)
```
With `requestCount = 0`, the grace period condition `requestCount >= 50` is
never met, so pacing uses only `baseThrottle` (which is 0 when request rate is
low).
**How to find this bug**: Search for all `behavior(` calls in `AdServer.scala`
and verify each one passes all 14 parameters.
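One possible way to make this class of bug harder to write, offered as a suggestion rather than a description of the current code, is to carry the actor state in a single case class and derive the next state with `copy`. Every name in this sketch (`ServerState`, `withDayDuration`, `nextWithDefaults`) is hypothetical.

```scala
object BehaviorStateSketch {
  // Hypothetical bundle of some of the values threaded through behavior().
  final case class ServerState(
      dayDurationSeconds: Int,
      requestCount: Long = 0L,
      lastRequestTimeMs: Long = 0L
  )

  /** Mirrors the bug pattern: an omitted defaulted argument silently becomes 0. */
  def nextWithDefaults(dayDurationSeconds: Int, requestCount: Long = 0L): Long =
    requestCount

  /** With `copy`, every field not named explicitly carries over unchanged. */
  def withDayDuration(state: ServerState, newDuration: Int): ServerState =
    state.copy(dayDurationSeconds = newDuration)
}
```

With this shape, a transition that only changes `dayDurationSeconds` cannot reset `requestCount`, because the counter is never re-supplied by hand.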
**Check 2: SpendInfo cache empty**
If `spendInfoCache` is always empty, the pacing gate goes through the
`SpendInfoFetched` path. Check logs for:
```
PACING GATE: Cache empty, fetching spend info from N campaigns
```
If this appears on every request, spend updates aren't being cached.
**Check 3: Stale SpendUpdate filtering**
For simulated days, SpendUpdates are filtered if their `dayStart` is too old:
```scala
val isStale = dayDurationSeconds != 86400 &&
  lastDayStart.exists(currentDayStart =>
    su.dayStart.toEpochMilli < currentDayStart.toEpochMilli - 5000
  )
```
Check logs for: `Ignoring stale SpendUpdate`
### Issue: Pacing Too Aggressive
**Symptoms**: Very few impressions served, high pacingSkipped
**Possible causes**:
1. PI gains too high for traffic pattern
2. Traffic shape mismatch
3. Rate estimate too high
### Key Log Messages
Enable debug logging and look for:
```
// Pacing decisions
PACING GATE: Cache empty, fetching spend info from N campaigns
PACING GATE: Request throttled (aggregateThrottle=X%)
PACING GATE: Request passes (aggregateThrottle=X%)
// Day boundary
Day rollover detected, resetting pacing for new day
// Grace period
Grace period ended: fresh SpendUpdate received
// SpendUpdate handling
Ignoring stale SpendUpdate: campaign=X updateDayStart=Y currentDayStart=Z
SpendUpdate received: campaign=X spend=Y budget=Z dayStart=W
```
GitHub link:
https://github.com/apache/pekko/discussions/2608#discussioncomment-15526511