phongn opened a new pull request, #13233:
URL: https://github.com/apache/trafficserver/pull/13233

   ## Summary
   
   The CLFUS RAM cache ranks objects by a value density, `value = (hits + 1) / 
(size + overhead)`, compared against a running `double` average on every 
eviction/admission decision. Since #11733 this has been computed with 
**integer** division and truncates to `0` for essentially every object, 
silently disabling CLFUS's frequency/size/recency logic and degrading it to 
FIFO.
   
   ## Root cause
   
   #11733 ("Fix C-style cast in Cache subsystem") modernized the macro:
   
   ```diff
   -#define CACHE_VALUE_HITS_SIZE(_h, _s) ((float)((_h) + 1) / ((_s) + 
ENTRY_OVERHEAD))
   +#define CACHE_VALUE_HITS_SIZE(_h, _s) (static_cast<float>(((_h) + 1) / 
((_s) + ENTRY_OVERHEAD)))
   ```
   
   The original C cast bound only to the numerator, so the division was 
floating point. The `static_cast<float>(...)` wraps the **whole** quotient, so 
`(hits + 1) / (size + overhead)` is now integer division. `hits` is on the 
order of tens while `size + overhead` is thousands+, so the result truncates to 
`0` for any normal object.
   
   With `value ≡ 0`, `_average_value` stays `0` and every value-gated branch is 
dead (`0 > 0` is false):
   
   - no promote-to-MRU on hit (`get()`),
   - no CLOCK second-chance / requeue (`put()`),
   - no value-based re-admission from the history (ghost) list.
   
   CLFUS effectively becomes FIFO, which generally performs worse than the 
simpler LRU it is meant to beat. This has affected every CLFUS deployment since 
the regression merged on 2024-08-28.
   
   ## Fix
   
   Bind the cast to the numerator only, restoring floating-point division (the
   pre-#11733 semantics):
   
   ```c
   #define CACHE_VALUE_HITS_SIZE(_h, _s) (static_cast<float>((_h) + 1) / ((_s) 
+ ENTRY_OVERHEAD))
   ```
   
   ## Test
   
   Adds `ram_cache_clfus_value`, a focused regression test asserting the value 
metric is a non-zero fraction and is monotonic in hits and in size. It fails on 
the pre-fix macro and passes after:
   
   - buggy macro → `FAILED` ("value metric truncated to zero", etc.)
   - fixed macro → `PASSED`
   
   The existing `ram_cache` regression still passes. With the fix, CLFUS's 
size-aware behavior is restored — in the bundled `ram_cache` test at the 16 MB 
size, its variable-size hit rate rises to ~0.82, slightly ahead of LRU (~0.80) 
on the same workload.
   
   ## Notes
   
   - Severity: correctness regression affecting all users of the non-default 
`proxy.config.cache.ram_cache.algorithm = 0` (CLFUS) since #11733; a backport 
may be warranted.
   - This restores the intended algorithm. A separate, long-standing limitation 
— resident entries' hit counts are only aged on the ghost list, never while 
resident — is **not** addressed here and is tracked separately.
   
   Fixes a regression introduced in #11733.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to