The GitHub Actions job "Tests (AMD)" on airflow.git/v3-2-test has succeeded.
Run started by GitHub user vatsrahul1001 (triggered by vatsrahul1001).

Head commit for run:
0dda7d44c8c9949280c9a80a2d2b41c4ccbcba3e / Rahul Vats <[email protected]>
Add configurable LRU+TTL caching for API server DAG retrieval (#60804) (#66862)

Fixes memory growth in long-running API servers by adding bounded LRU+TTL 
caching to `DBDagBag`. Previously, the internal dict cache never expired and 
never evicted, causing memory to grow indefinitely as DAG versions accumulated 
(~500 MB/day with 100+ DAGs updating daily).

Two new `[api]` config options control caching:

| Config | Default | Description |
|--------|---------|-------------|
| `dag_cache_size` | `64` | Max cached DAG versions (0 = unbounded dict, no eviction) |
| `dag_cache_ttl` | `3600` | TTL in seconds (0 = LRU only, no time-based expiry) |
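
As an illustration of the two options, here is a minimal sketch of an `airflow.cfg` fragment with the defaults from the table, parsed with the standard-library `configparser`. This is only a stand-in for how Airflow reads its configuration; the variable names are hypothetical. Airflow's usual `AIRFLOW__{SECTION}__{KEY}` environment-variable convention would presumably apply to these options as well.

```python
from configparser import ConfigParser

# Hypothetical airflow.cfg fragment using the defaults from the table above.
AIRFLOW_CFG_FRAGMENT = """
[api]
# Max cached DAG versions; 0 = unbounded dict, no eviction
dag_cache_size = 64
# TTL in seconds; 0 = LRU only, no time-based expiry
dag_cache_ttl = 3600
"""

# Stand-in for Airflow's own config loader (assumption: plain INI parsing).
parser = ConfigParser()
parser.read_string(AIRFLOW_CFG_FRAGMENT)

cache_size = parser.getint("api", "dag_cache_size")
cache_ttl = parser.getint("api", "dag_cache_ttl")
```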

**API server only.** The scheduler continues using a plain unbounded dict with 
zero lock overhead (`nullcontext` instead of `RLock`). The bounded cache + lock 
is only created when `cache_size > 0`.
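
The selection between the two modes can be sketched roughly as follows. This is not the `DBDagBag` code: `LruTtlCache` is a standard-library stand-in for the `cachetools` classes, and `build_cache_and_lock` is a hypothetical helper name. It shows the bounded cache and `RLock` being created only when `cache_size > 0`, with a plain dict and `nullcontext` otherwise.

```python
import time
from collections import OrderedDict
from contextlib import nullcontext
from threading import RLock


class LruTtlCache:
    """Stdlib stand-in for a bounded LRU cache with optional TTL (not cachetools)."""

    def __init__(self, maxsize, ttl=0, clock=time.monotonic):
        self._maxsize = maxsize
        self._ttl = ttl              # 0 disables time-based expiry
        self._clock = clock          # injectable so expiry can be tested
        self._data = OrderedDict()   # key -> (stored_at, value), least recent first

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        stored_at, value = item
        if self._ttl and self._clock() - stored_at >= self._ttl:
            del self._data[key]      # entry outlived its TTL
            return default
        self._data.move_to_end(key)  # a read reorders entries, hence the lock
        return value

    def __setitem__(self, key, value):
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self._maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._data)


def build_cache_and_lock(cache_size, cache_ttl):
    """Hypothetical helper: bounded cache + RLock only when cache_size > 0."""
    if cache_size > 0:
        return LruTtlCache(cache_size, cache_ttl), RLock()
    # Scheduler-style path: unbounded dict, no locking overhead.
    return {}, nullcontext()
```

Both return values support `cache[key] = value`, `cache.get(key)`, and `with lock:`, so calling code does not need to branch on the mode.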

**Cache thrashing prevention.** `iter_all_latest_version_dags()` (used by the 
DAG listing endpoint) bypasses the cache entirely. Without this, every DAG 
listing request would flush the hot working set and replace it with a full scan 
of all DAGs.
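
The thrashing effect is easy to reproduce in isolation. The sketch below uses `functools.lru_cache` as a stand-in for the bounded cache and a dict as a stand-in for the DB; `get_dag` and `hot_set_misses` here are illustrative names, not the real methods. It counts how many hot entries must be reloaded after a listing pass that either goes through the cache or bypasses it.

```python
from functools import lru_cache

# Stand-in for the DB: ten DAG versions.
DB = {f"v{i}": f"dag-{i}" for i in range(10)}


def make_loader(maxsize):
    @lru_cache(maxsize=maxsize)
    def get_dag(version_id):
        return DB[version_id]

    return get_dag


def hot_set_misses(listing_uses_cache):
    """Return how many hot entries miss after one full listing pass."""
    get_dag = make_loader(maxsize=2)
    hot = ["v0", "v1"]
    for version_id in hot:
        get_dag(version_id)  # warm the hot working set

    for version_id in DB:  # the "list all DAGs" scan
        if listing_uses_cache:
            get_dag(version_id)              # pushes hot entries out
        else:
            get_dag.__wrapped__(version_id)  # bypasses the cache entirely

    misses_before = get_dag.cache_info().misses
    for version_id in hot:
        get_dag(version_id)
    return get_dag.cache_info().misses - misses_before
```

With the listing routed through the cache, both hot entries are evicted and must be reloaded; with the bypass, neither is.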

**Double-checked locking.** When multiple threads miss on the same `version_id` 
concurrently, only the first thread queries the DB. The rest find it cached 
after acquiring the lock. Metrics are emitted correctly: a single lookup never 
counts as both a hit and a miss.
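
A minimal sketch of the pattern, assuming the load happens while the lock is held (which is what "only the first thread queries the DB" implies). The class and attribute names are hypothetical, a plain dict stands in for the bounded cache, and simple counters stand in for the real metrics.

```python
import threading


class VersionCache:
    """Double-checked locking around a loader; counters stand in for metrics."""

    def __init__(self, loader):
        self._loader = loader          # stand-in for the DB query
        self._cache = {}
        self._lock = threading.RLock()
        self.hits = 0
        self.misses = 0
        self.loads = 0

    def get(self, version_id):
        # First check: the common fast path for an already cached entry.
        with self._lock:
            if version_id in self._cache:
                self.hits += 1
                return self._cache[version_id]

        # The lock was released above, so another thread may load the entry
        # before this one reacquires it.
        with self._lock:
            # Second check: a thread that lost the race counts as a hit only.
            if version_id in self._cache:
                self.hits += 1
                return self._cache[version_id]
            # Confirmed miss: exactly one thread gets here per version_id.
            self.misses += 1
            self.loads += 1
            value = self._loader(version_id)
            self._cache[version_id] = value
            return value
```

Each call increments exactly one counter, so a lookup is never recorded as both a hit and a miss, regardless of which check it succeeds on.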

**Separate model cache.** `get_serialized_dag_model()` maintains its own dict 
cache. The triggerer needs the full `SerializedDagModel` (for `.data`), not the 
deserialized `SerializedDAG` stored in the LRU/TTL cache.

**Cache keying.** The cache is keyed by DAG version ID. Lookups by `dag_id` 
(e.g., viewing a DAG's details) always query the DB for the latest version, but 
the deserialized result is cached for subsequent version-specific lookups 
(e.g., task instance views for a specific DAG run).
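
The keying behavior can be sketched as below. This is an assumption-laden illustration, not the real `DBDagBag`: the class, its method names, and the two dicts standing in for DB tables are all hypothetical. It shows that a `dag_id` lookup always pays one DB query to resolve the latest version, while a later lookup by that version ID is served from the cache.

```python
class VersionKeyedCache:
    """Cache keyed by version ID; dag_id lookups always resolve via the 'DB'."""

    def __init__(self, latest_version_by_dag_id, dag_by_version_id):
        self._latest = latest_version_by_dag_id  # stand-in for the latest-version query
        self._store = dag_by_version_id          # stand-in for serialized DAG rows
        self._cache = {}                         # version_id -> deserialized DAG
        self.db_queries = 0

    def get_latest(self, dag_id):
        # Always ask the DB which version is current.
        self.db_queries += 1
        version_id = self._latest[dag_id]
        # Populate the cache so version-specific lookups can reuse the result.
        if version_id not in self._cache:
            self._cache[version_id] = self._store[version_id]
        return self._cache[version_id]

    def get_by_version(self, version_id):
        # Version-specific lookups hit the cache when possible.
        if version_id not in self._cache:
            self.db_queries += 1
            self._cache[version_id] = self._store[version_id]
        return self._cache[version_id]
```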

**Staleness.** After a DAG is updated, the API server may serve the previous 
version until the cached entry expires (controlled by `dag_cache_ttl`). This is 
documented in the config description.

**Why `cachetools`.** `cachetools` is a small, pure-Python library (~1K LOC) 
already present as a transitive dependency via `google-auth`. It provides 
battle-tested `LRUCache` and `TTLCache` implementations. Pinned at `>=6.0.0` to 
match the FAB provider.

**Why `RLock`.** `cachetools` caches are NOT thread-safe -- `.get()` mutates 
internal doubly-linked lists (LRU reordering) and TTL access triggers cleanup. 
Without synchronization, concurrent access can corrupt the data structure.

The API server emits four new cache metrics:

| Metric | Type | Description |
|--------|------|-------------|
| `api_server.dag_bag.cache_hit` | Counter | Cache hits (including double-checked locking hits) |
| `api_server.dag_bag.cache_miss` | Counter | Confirmed misses (after double-check) |
| `api_server.dag_bag.cache_clear` | Counter | Cache clears |
| `api_server.dag_bag.cache_size` | Gauge | Current cache size (sampled at 10%) |
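
One plausible reading of "sampled at 10%" is that the gauge is emitted on roughly one lookup in ten rather than on every lookup. A minimal sketch under that assumption follows; the function name, the `emit_gauge` callback, and the injectable `rng` are hypothetical and not part of Airflow's stats API.

```python
import random


def maybe_emit_cache_size(cache_len, emit_gauge, sample_rate=0.1, rng=random.random):
    """Emit the size gauge on a random fraction of calls; return True if emitted."""
    if rng() < sample_rate:
        emit_gauge("api_server.dag_bag.cache_size", cache_len)
        return True
    return False
```

Sampling keeps the gauge cheap on a hot path while still tracking the cache size closely enough for dashboards.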

Backward compatibility:

- Default behavior is unchanged for the scheduler and triggerer (unbounded dict, no lock)
- The API server gets caching by default (`dag_cache_size=64`, `dag_cache_ttl=3600`)
- Set `dag_cache_size=0` to restore the pre-change behavior (unbounded dict)
- No breaking changes to public APIs; the `get_serialized_dag_model()` and `get_dag()` signatures are preserved

Related:

- #64326 (closed) -- similar fix with OrderedDict-based LRU, no TTL
- #60940 (merged) -- gunicorn support with rolling worker restarts (complementary, handles memory growth from any source)

(cherry picked from commit 26cbdcbe948c105322fee64064b24697f03b9dc1)

Co-authored-by: Kaxil Naik <[email protected]>

Report URL: https://github.com/apache/airflow/actions/runs/25920089914

With regards,
GitHub Actions via GitBox
