phongn opened a new pull request, #13235: URL: https://github.com/apache/trafficserver/pull/13235
## Summary

Building on #13233 (which restored the CLFUS value metric), this makes the CLFUS RAM cache actually follow a working set that changes over time. Previously CLFUS captured an initial set of objects and then effectively froze on it: on a working-set change it kept serving the stale set and never admitted the new one.

> Stacked on #13233; please review/merge that first. This branch contains the
> value-metric fix as its base commit.

## Root cause

Two independent problems, both of which had to be fixed:

1. **Resident frequency never ages.** A resident object's `hits` only ever increased, so an object that was hot days ago kept winning replacement long after going cold. Aging existed only for the history (ghost) list (`_tick()`).
2. **New candidates can't be admitted.** `_tick()` freed a history (ghost) entry the moment its aged `hits` reached 0, so the ghost list stayed at ~1 entry. A re-requested key was forgotten before it could accumulate the value needed for admission, and the incumbents were restored on every attempt.

With the value metric fixed, this is stark: on an abrupt 100% working-set change, CLFUS scored a 0.125 hit rate on the new set vs LRU's 1.0, while retaining 100% of the now-cold set.

## Fix

Two small, complementary changes in `RamCacheCLFUS.cc`:

1. **Admission: keep the history list.** `_tick()` now ages the oldest ghost entry and *keeps* it, freeing entries only to hold the list at its target size, so a recently evicted or seen key is remembered long enough to be re-admitted.
2. **Aging: decay resident counts.** Once per "turnover" (one `Put` per resident object), `_age_resident()` halves every resident `hits` *and* `_average_value`. The admission bar must fall in step with the values it gates, or the decay is invisible to it.

## Memory

Ghost entries are ~88 bytes each and are **not** counted against `proxy.config.cache.ram_cache.size`.
A full cache-worth of history would be a large unbudgeted cost for caches of many small objects, so the history is bounded to `_objects / HISTORY_DIVISOR` (with `HISTORY_DIVISOR` = 4, i.e. a quarter of the resident object count). In testing, a quarter preserved adaptivity while an eighth began to slip; the seen-filter threshold tracks the same bound. Indicative cost for a 32 GB cache of 1 KB objects: ~700 MB bounded, vs ~2.8 GB unbounded.

## Tests

Adds two regression tests in `CacheTest.cc`, each comparing CLFUS to the LRU RAM cache (synthetic workloads; higher is better except for A-retained). In the abrupt test, A is the original working set and B is the set that replaces it:

| test                   | LRU    | CLFUS before | CLFUS after |
|------------------------|--------|--------------|-------------|
| gradual-drift hit rate | 0.969  | 0.391        | 0.902       |
| abrupt B-hit-rate      | 1.000  | 0.125        | 1.000       |
| abrupt A-retained      | 15/112 | 112/112      | 14/112      |
| steady-state 16 MB var | 0.795  | 0.790        | 0.839       |

The existing `ram_cache` test still passes; CLFUS now also beats LRU on a steady-state Zipfian workload, its intended strength.

## Docs

Updates `doc/developer-guide/cache-architecture/ram-cache.en.rst`, whose History List section no longer matched the code. Adds coverage of the value metric and floating admission bar, the CLOCK aging (`_tick`, `_age_resident`), "Following a shifting working set", and "Memory overhead".

## Notes

- Validated on synthetic access patterns, not production traces.
- A further, unimplemented lever remains if ever needed: relaxing the incumbent bias (re-queue second-chance + cost/benefit) on a detected shift. It was not required to pass the tests, so it is left out to keep the change minimal.
- Possible follow-ups: budget the ghost RAM against `ram_cache.size`; expose `HISTORY_DIVISOR` as a config knob.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
