This is an automated email from the ASF dual-hosted git repository.
masaori335 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/trafficserver.git
The following commit(s) were added to refs/heads/master by this push:
new edc35b31ca Add S3-FIFO RAM cache eviction algorithm
(ram_cache.algorithm = 2) (#13255)
edc35b31ca is described below
commit edc35b31ca3888483f0475d4139c8a6917ebd9fa
Author: Phong Nguyen <[email protected]>
AuthorDate: Tue Jun 23 15:36:48 2026 -0700
Add S3-FIFO RAM cache eviction algorithm (ram_cache.algorithm = 2) (#13255)
* Add S3-FIFO RAM cache eviction algorithm (ram_cache.algorithm = 2)
S3-FIFO (Yang et al., SOSP 2023) is a FIFO-based eviction policy: a
small admission queue and a main queue (a 2-bit clock), plus a ghost
queue of recently evicted keys. The small queue and ghost filter
one-hit-wonders, giving scan resistance and strong hit rates on CDN
and key-value workloads at low cost -- a hit needs no list reordering.
Selectable as ram_cache.algorithm = 2 alongside CLFUS (0) and LRU (1).
The policy is byte-budgeted; its eviction metadata (the ghost included,
bounded by object size and an entry-count cap) is accounted within
ram_cache.size so total memory stays within the configured budget. Like
LRU and CLFUS it enforces one resident copy per key (a put with a new
aux key discards the stale one) and allocates entries from a per-thread
ProxyAllocator (Thread.h).
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
* Fix ghost capacity calculation truncation order
Co-authored-by: Copilot Autofix powered by AI
<[email protected]>
* Make S3-FIFO RAM cache tunables configurable via records.yaml
The S3-FIFO queue split, ghost bounds, and promotion threshold were
compile-time constants. Expose them as
proxy.config.cache.ram_cache.s3fifo.{main_percent,ghost_size_percent,
ghost_mem_percent,promote_threshold} so operators can tune the policy
without a rebuild; the defaults match the paper and the prior behavior.
Each setting carries a RECC_INT range in RecordsConfig.cc, so an
out-of-range value is rejected at config load with a warning and the
documented default is used in its place -- the same guard every other
RAM cache record relies on, rather than a bespoke clamp. The admin
guide and the ram_cache regression test (a non-default "tuned" pass)
are updated to cover the new settings.
Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
* Add entry for S3-FIFO into NOTICE
---------
Co-authored-by: Claude Opus 4.8 (1M context) <[email protected]>
Co-authored-by: Copilot Autofix powered by AI
<[email protected]>
---
NOTICE | 9 +
doc/admin-guide/files/records.yaml.en.rst | 63 ++++-
doc/admin-guide/storage/index.en.rst | 14 +-
include/iocore/cache/Cache.h | 5 +-
include/iocore/eventsystem/Thread.h | 1 +
src/iocore/cache/CMakeLists.txt | 1 +
src/iocore/cache/Cache.cc | 70 +++--
src/iocore/cache/CacheProcessor.cc | 7 +
src/iocore/cache/CacheTest.cc | 28 +-
src/iocore/cache/P_CacheInternal.h | 4 +
src/iocore/cache/P_RamCache.h | 1 +
src/iocore/cache/RamCacheS3FIFO.cc | 437 ++++++++++++++++++++++++++++++
src/records/RecordsConfig.cc | 11 +-
13 files changed, 609 insertions(+), 42 deletions(-)
diff --git a/NOTICE b/NOTICE
index 31787fecbf..ad0c5f11b5 100644
--- a/NOTICE
+++ b/NOTICE
@@ -118,3 +118,12 @@ LS-HPACK provides functionality to encode and decode HTTP
headers using
HPACK compression mechanism specified in RFC 7541.
Copyright (c) 2018 - 2023 LiteSpeed Technologies Inc, (MIT License)
https://github.com/litespeedtech/ls-hpack.git
+
+~~
+
+S3-FIFO: A simple, scalable FIFO-based algorithm with three static queues
+Copyright 2023, Carnegie Mellon University (Apache-2.0)
+
+Website: https://s3fifo.com/
+Paper: http://dx.doi.org/10.1145/3600006.3613147
+Reference implementation: https://github.com/Thesys-lab/sosp23-s3fifo
diff --git a/doc/admin-guide/files/records.yaml.en.rst
b/doc/admin-guide/files/records.yaml.en.rst
index 8d9debca24..a554c554a2 100644
--- a/doc/admin-guide/files/records.yaml.en.rst
+++ b/doc/admin-guide/files/records.yaml.en.rst
@@ -2905,10 +2905,65 @@ RAM Cache
.. ts:cv:: CONFIG proxy.config.cache.ram_cache.algorithm INT 1
- Two distinct RAM caches are supported, the default (1) being the simpler
- **LRU** (*Least Recently Used*) cache. As an alternative, the **CLFUS**
- (*Clocked Least Frequently Used by Size*) is also available, by changing
this
- configuration to 0.
+ Three RAM cache eviction algorithms are supported, selected by this value:
+
+ ``1``
+ **LRU** (*Least Recently Used*), the default -- the simplest policy,
+ favoring recency. Pairs with
+ :ts:cv:`proxy.config.cache.ram_cache.use_seen_filter` for scan
+ resistance.
+
+ ``0``
+ **CLFUS** (*Clocked Least Frequently Used by Size*), which balances
+ recency, frequency, and object size. It is the only algorithm that
+ supports in-RAM compression
+ (:ts:cv:`proxy.config.cache.ram_cache.compress`).
+
+ ``2``
+ **S3-FIFO** (*Simple Scalable Static FIFO*): a small admission queue and
+ a main queue (both FIFO), plus a ghost queue of recently evicted keys,
+ which together filter one-hit-wonders. Scan-resistant and inexpensive
+ (no per-hit reordering); strong hit rates on CDN and key-value
+ workloads. Its eviction metadata (including the ghost) is accounted
+ within :ts:cv:`proxy.config.cache.ram_cache.size`. Experimental; it does
+ not use the seen filter or support in-RAM compression. Its queue split,
+ ghost bounds, and promotion threshold can be tuned with the
+ ``proxy.config.cache.ram_cache.s3fifo.*`` settings below; the defaults
+ follow the original paper and suit most workloads.
+
+.. ts:cv:: CONFIG proxy.config.cache.ram_cache.s3fifo.main_percent INT 90
+
+ Only applies when :ts:cv:`proxy.config.cache.ram_cache.algorithm` is ``2``
+ (S3-FIFO). The target size of the main queue as a percentage of the resident
+ budget; the remainder is the small admission queue. The default ``90`` gives
+ the ~10% small / ~90% main split from the paper. Valid range is ``1`` to
+ ``99`` (an out-of-range value is rejected with a warning and the default is
+ used); a larger value grows the main queue at the expense of the admission
+ queue.
+
+.. ts:cv:: CONFIG proxy.config.cache.ram_cache.s3fifo.ghost_size_percent INT 90
+
+ Only applies when :ts:cv:`proxy.config.cache.ram_cache.algorithm` is ``2``
+ (S3-FIFO). The ghost queue remembers the keys of recently evicted objects
for
+ up to this percentage of :ts:cv:`proxy.config.cache.ram_cache.size` worth of
+ object bytes. Valid range is ``0`` to ``100``; ``0`` disables this bound.
+
+.. ts:cv:: CONFIG proxy.config.cache.ram_cache.s3fifo.ghost_mem_percent INT 25
+
+ Only applies when :ts:cv:`proxy.config.cache.ram_cache.algorithm` is ``2``
+ (S3-FIFO). Caps the memory the ghost queue's per-key metadata may consume at
+ this percentage of :ts:cv:`proxy.config.cache.ram_cache.size`. This metadata
+ is counted against the configured RAM cache size, so total memory stays
+ within the budget regardless of object cardinality. Valid range is ``0`` to
+ ``100``; ``0`` disables the ghost queue.
+
+.. ts:cv:: CONFIG proxy.config.cache.ram_cache.s3fifo.promote_threshold INT 2
+
+ Only applies when :ts:cv:`proxy.config.cache.ram_cache.algorithm` is ``2``
+ (S3-FIFO). The number of times an object in the small admission queue must
be
+ reused before it is promoted to the main queue instead of being demoted to
+ the ghost. The default ``2`` admits objects seen at least twice. Valid range
+ is ``1`` to ``3``; a higher value makes admission to the main queue
stricter.
.. ts:cv:: CONFIG proxy.config.cache.ram_cache.use_seen_filter INT 1
diff --git a/doc/admin-guide/storage/index.en.rst
b/doc/admin-guide/storage/index.en.rst
index f68808b101..aa3754df5e 100644
--- a/doc/admin-guide/storage/index.en.rst
+++ b/doc/admin-guide/storage/index.en.rst
@@ -75,18 +75,22 @@ and reduces load on disks, especially during temporary
traffic peaks.
You can configure the RAM cache size to suit your needs, as described in
:ref:`changing-the-size-of-the-ram-cache` below.
-The RAM cache supports two cache eviction algorithms, a regular *LRU*
-(Least Recently Used) and the more advanced *CLFUS* (Clocked Least
+The RAM cache supports three cache eviction algorithms: a regular *LRU*
+(Least Recently Used); the more advanced *CLFUS* (Clocked Least
Frequently Used by Size; which balances recentness, frequency, and size
-to maximize hit rate, similar to a most frequently used algorithm).
-The default is to use *LRU*, and this is controlled via
+to maximize hit rate, similar to a most frequently used algorithm); and
+*S3-FIFO* (Simple Scalable Static FIFO), a FIFO-based policy whose small
+admission queue and ghost list filter one-hit-wonders, giving strong hit
+rates on CDN and key-value workloads at low cost. The default is to use
+*LRU*, and this is controlled via
:ts:cv:`proxy.config.cache.ram_cache.algorithm`.
Both the *LRU* and *CLFUS* RAM caches support a configuration to increase
scan resistance. In a typical *LRU*, if you request all possible objects in
sequence, you will effectively churn the cache on every request. The option
:ts:cv:`proxy.config.cache.ram_cache.use_seen_filter` can be set to add some
-resistance against this problem.
+resistance against this problem. *S3-FIFO* is scan-resistant by design,
+through its admission queue, and does not use the seen filter.
In addition, *CLFUS* also supports compressing in the RAM cache itself.
This can be useful for content which is not compressed by itself (e.g.
diff --git a/include/iocore/cache/Cache.h b/include/iocore/cache/Cache.h
index 3a7da523b5..9a0fe5d430 100644
--- a/include/iocore/cache/Cache.h
+++ b/include/iocore/cache/Cache.h
@@ -36,8 +36,9 @@ static constexpr ts::ModuleVersion CACHE_MODULE_VERSION(1, 0);
#define SCAN_KB_PER_SECOND 8192 // 1TB/8MB = 131072 = 36 HOURS to scan a TB
-#define RAM_CACHE_ALGORITHM_CLFUS 0
-#define RAM_CACHE_ALGORITHM_LRU 1
+#define RAM_CACHE_ALGORITHM_CLFUS 0
+#define RAM_CACHE_ALGORITHM_LRU 1
+#define RAM_CACHE_ALGORITHM_S3FIFO 2
#define CACHE_COMPRESSION_NONE 0
#define CACHE_COMPRESSION_FASTLZ 1
diff --git a/include/iocore/eventsystem/Thread.h
b/include/iocore/eventsystem/Thread.h
index 0a34dd633c..435762e71e 100644
--- a/include/iocore/eventsystem/Thread.h
+++ b/include/iocore/eventsystem/Thread.h
@@ -134,6 +134,7 @@ public:
ProxyAllocator openDirEntryAllocator;
ProxyAllocator ramCacheCLFUSEntryAllocator;
ProxyAllocator ramCacheLRUEntryAllocator;
+ ProxyAllocator ramCacheS3FIFOEntryAllocator;
ProxyAllocator evacuationBlockAllocator;
ProxyAllocator ioDataAllocator;
ProxyAllocator ioAllocator;
diff --git a/src/iocore/cache/CMakeLists.txt b/src/iocore/cache/CMakeLists.txt
index f8b3817f72..f8a252b430 100644
--- a/src/iocore/cache/CMakeLists.txt
+++ b/src/iocore/cache/CMakeLists.txt
@@ -33,6 +33,7 @@ add_library(
PreservationTable.cc
RamCacheCLFUS.cc
RamCacheLRU.cc
+ RamCacheS3FIFO.cc
Store.cc
Stripe.cc
StripeSM.cc
diff --git a/src/iocore/cache/Cache.cc b/src/iocore/cache/Cache.cc
index 69692d890d..d5dcf3df76 100644
--- a/src/iocore/cache/Cache.cc
+++ b/src/iocore/cache/Cache.cc
@@ -56,35 +56,39 @@ constexpr ts::VersionNumber
CACHE_DB_VERSION(CACHE_DB_MAJOR_VERSION, CACHE_DB_MI
// Configuration
-int64_t cache_config_ram_cache_size = AUTO_SIZE_RAM_CACHE;
-int cache_config_ram_cache_algorithm = 1;
-int cache_config_ram_cache_compress = 0;
-int cache_config_ram_cache_compress_percent = 90;
-int cache_config_ram_cache_use_seen_filter = 1;
-int cache_config_http_max_alts = 3;
-int cache_config_log_alternate_eviction = 0;
-int cache_config_dir_sync_frequency = 60;
-int cache_config_dir_sync_delay = 500;
-int cache_config_dir_sync_max_write = (2 * 1024 * 1024);
-int cache_config_dir_sync_parallel_tasks = 1;
-int cache_config_permit_pinning = 0;
-int cache_config_select_alternate = 1;
-int cache_config_max_doc_size = 0;
-int cache_config_min_average_object_size = ESTIMATED_OBJECT_SIZE;
-int64_t cache_config_ram_cache_cutoff = AGG_SIZE;
-int cache_config_max_disk_errors = 5;
-int cache_config_hit_evacuate_percent = 10;
-int cache_config_hit_evacuate_size_limit = 0;
-int cache_config_force_sector_size = 0;
-int cache_config_target_fragment_size =
DEFAULT_TARGET_FRAGMENT_SIZE;
-int cache_config_agg_write_backlog = AGG_SIZE * 2;
-int cache_config_enable_checksum = 0;
-int cache_config_alt_rewrite_max_size = 4096;
-int cache_config_read_while_writer = 0;
-int cache_config_mutex_retry_delay = 2;
-int cache_read_while_writer_retry_delay = 50;
-int cache_config_read_while_writer_max_retries = 10;
-int cache_config_persist_bad_disks = false;
+int64_t cache_config_ram_cache_size = AUTO_SIZE_RAM_CACHE;
+int cache_config_ram_cache_algorithm = 1;
+int cache_config_ram_cache_compress = 0;
+int cache_config_ram_cache_compress_percent = 90;
+int cache_config_ram_cache_use_seen_filter = 1;
+int cache_config_ram_cache_s3fifo_main_percent = 90;
+int cache_config_ram_cache_s3fifo_ghost_size_percent = 90;
+int cache_config_ram_cache_s3fifo_ghost_mem_percent = 25;
+int cache_config_ram_cache_s3fifo_promote_threshold = 2;
+int cache_config_http_max_alts = 3;
+int cache_config_log_alternate_eviction = 0;
+int cache_config_dir_sync_frequency = 60;
+int cache_config_dir_sync_delay = 500;
+int cache_config_dir_sync_max_write = (2 * 1024 * 1024);
+int cache_config_dir_sync_parallel_tasks = 1;
+int cache_config_permit_pinning = 0;
+int cache_config_select_alternate = 1;
+int cache_config_max_doc_size = 0;
+int cache_config_min_average_object_size =
ESTIMATED_OBJECT_SIZE;
+int64_t cache_config_ram_cache_cutoff = AGG_SIZE;
+int cache_config_max_disk_errors = 5;
+int cache_config_hit_evacuate_percent = 10;
+int cache_config_hit_evacuate_size_limit = 0;
+int cache_config_force_sector_size = 0;
+int cache_config_target_fragment_size =
DEFAULT_TARGET_FRAGMENT_SIZE;
+int cache_config_agg_write_backlog = AGG_SIZE * 2;
+int cache_config_enable_checksum = 0;
+int cache_config_alt_rewrite_max_size = 4096;
+int cache_config_read_while_writer = 0;
+int cache_config_mutex_retry_delay = 2;
+int cache_read_while_writer_retry_delay = 50;
+int cache_config_read_while_writer_max_retries = 10;
+int cache_config_persist_bad_disks = false;
// Globals
@@ -865,6 +869,14 @@ ink_cache_init(ts::ModuleVersion v)
RecEstablishStaticConfigInt32(cache_config_ram_cache_compress_percent,
"proxy.config.cache.ram_cache.compress_percent");
cache_config_ram_cache_use_seen_filter =
RecGetRecordInt("proxy.config.cache.ram_cache.use_seen_filter").value_or(0);
+ RecEstablishStaticConfigInt32(cache_config_ram_cache_s3fifo_main_percent,
"proxy.config.cache.ram_cache.s3fifo.main_percent");
+
RecEstablishStaticConfigInt32(cache_config_ram_cache_s3fifo_ghost_size_percent,
+
"proxy.config.cache.ram_cache.s3fifo.ghost_size_percent");
+
RecEstablishStaticConfigInt32(cache_config_ram_cache_s3fifo_ghost_mem_percent,
+
"proxy.config.cache.ram_cache.s3fifo.ghost_mem_percent");
+
RecEstablishStaticConfigInt32(cache_config_ram_cache_s3fifo_promote_threshold,
+
"proxy.config.cache.ram_cache.s3fifo.promote_threshold");
+
RecEstablishStaticConfigInt32(cache_config_http_max_alts,
"proxy.config.cache.limits.http.max_alts");
Dbg(dbg_ctl_cache_init, "proxy.config.cache.limits.http.max_alts = %d",
cache_config_http_max_alts);
diff --git a/src/iocore/cache/CacheProcessor.cc
b/src/iocore/cache/CacheProcessor.cc
index b2e874b8dc..1fa158d988 100644
--- a/src/iocore/cache/CacheProcessor.cc
+++ b/src/iocore/cache/CacheProcessor.cc
@@ -1501,6 +1501,10 @@ CacheProcessor::cacheInitialized()
if (gnstripes) {
// new ram_caches, with algorithm from the config
+ static const char *const ram_cache_algorithm_name[] = {"CLFUS", "LRU",
"S3-FIFO"};
+ int const ram_alg =
cache_config_ram_cache_algorithm;
+ Dbg(dbg_ctl_cache_init, "ram_cache algorithm = %d (%s)", ram_alg,
+ (ram_alg >= 0 && ram_alg <= RAM_CACHE_ALGORITHM_S3FIFO) ?
ram_cache_algorithm_name[ram_alg] : "unknown");
for (int i = 0; i < gnstripes; i++) {
switch (cache_config_ram_cache_algorithm) {
default:
@@ -1510,6 +1514,9 @@ CacheProcessor::cacheInitialized()
case RAM_CACHE_ALGORITHM_LRU:
gstripes[i]->ram_cache = new_RamCacheLRU();
break;
+ case RAM_CACHE_ALGORITHM_S3FIFO:
+ gstripes[i]->ram_cache = new_RamCacheS3FIFO();
+ break;
}
}
diff --git a/src/iocore/cache/CacheTest.cc b/src/iocore/cache/CacheTest.cc
index cd86df6051..3d39abe8cc 100644
--- a/src/iocore/cache/CacheTest.cc
+++ b/src/iocore/cache/CacheTest.cc
@@ -679,8 +679,34 @@ REGRESSION_TEST(ram_cache)(RegressionTest *t, int level,
int *pstatus)
for (int s = 20; s <= 24; s += 4) {
int64_t cache_size = 1LL << s;
*pstatus = REGRESSION_TEST_PASSED;
- if (!test_RamCache(t, new_RamCacheLRU(), "LRU", cache_size) ||
!test_RamCache(t, new_RamCacheCLFUS(), "CLFUS", cache_size)) {
+ if (!test_RamCache(t, new_RamCacheLRU(), "LRU", cache_size) ||
!test_RamCache(t, new_RamCacheCLFUS(), "CLFUS", cache_size) ||
+ !test_RamCache(t, new_RamCacheS3FIFO(), "S3-FIFO", cache_size)) {
*pstatus = REGRESSION_TEST_FAILED;
}
}
+
+ // Exercise the S3-FIFO tunables with valid non-default values: the policy
must still pass the
+ // hit-rate floor and the size invariant, proving the records -> init()
config plumbing is wired
+ // and that a non-default queue split, ghost bound, and promotion threshold
remain correct.
+ // (Out-of-range values are rejected by the records.yaml RECC_INT
validation, not here.)
+ {
+ int const saved_main = cache_config_ram_cache_s3fifo_main_percent;
+ int const saved_gsize =
cache_config_ram_cache_s3fifo_ghost_size_percent;
+ int const saved_gmem =
cache_config_ram_cache_s3fifo_ghost_mem_percent;
+ int const saved_promote =
cache_config_ram_cache_s3fifo_promote_threshold;
+ int64_t const cache_size = 1LL << 24;
+
+ cache_config_ram_cache_s3fifo_main_percent = 80; // valid,
non-default
+ cache_config_ram_cache_s3fifo_ghost_size_percent = 50;
+ cache_config_ram_cache_s3fifo_ghost_mem_percent = 15;
+ cache_config_ram_cache_s3fifo_promote_threshold = 1;
+ if (!test_RamCache(t, new_RamCacheS3FIFO(), "S3-FIFO tuned", cache_size)) {
+ *pstatus = REGRESSION_TEST_FAILED;
+ }
+
+ cache_config_ram_cache_s3fifo_main_percent = saved_main;
+ cache_config_ram_cache_s3fifo_ghost_size_percent = saved_gsize;
+ cache_config_ram_cache_s3fifo_ghost_mem_percent = saved_gmem;
+ cache_config_ram_cache_s3fifo_promote_threshold = saved_promote;
+ }
}
diff --git a/src/iocore/cache/P_CacheInternal.h
b/src/iocore/cache/P_CacheInternal.h
index 08e77291bb..c030fe211d 100644
--- a/src/iocore/cache/P_CacheInternal.h
+++ b/src/iocore/cache/P_CacheInternal.h
@@ -130,6 +130,10 @@ extern int cache_config_agg_write_backlog;
extern int cache_config_ram_cache_compress;
extern int cache_config_ram_cache_compress_percent;
extern int cache_config_ram_cache_use_seen_filter;
+extern int cache_config_ram_cache_s3fifo_main_percent;
+extern int cache_config_ram_cache_s3fifo_ghost_size_percent;
+extern int cache_config_ram_cache_s3fifo_ghost_mem_percent;
+extern int cache_config_ram_cache_s3fifo_promote_threshold;
extern int cache_config_hit_evacuate_percent;
extern int cache_config_hit_evacuate_size_limit;
extern int cache_config_force_sector_size;
diff --git a/src/iocore/cache/P_RamCache.h b/src/iocore/cache/P_RamCache.h
index b4833e6d96..1afe103751 100644
--- a/src/iocore/cache/P_RamCache.h
+++ b/src/iocore/cache/P_RamCache.h
@@ -45,3 +45,4 @@ public:
RamCache *new_RamCacheLRU();
RamCache *new_RamCacheCLFUS();
+RamCache *new_RamCacheS3FIFO();
diff --git a/src/iocore/cache/RamCacheS3FIFO.cc
b/src/iocore/cache/RamCacheS3FIFO.cc
new file mode 100644
index 0000000000..7471165107
--- /dev/null
+++ b/src/iocore/cache/RamCacheS3FIFO.cc
@@ -0,0 +1,437 @@
+/** @file
+
+ A brief file description
+
+ @section license License
+
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ */
+
+// S3-FIFO RAM cache replacement policy.
+//
+// The design follows Yang, Zhang, Qiu, Yue & Rashmi, "FIFO queues are all you
need for cache
+// eviction" (SOSP 2023) and https://s3fifo.com, mirroring the libCacheSim
reference (S3FIFO.c).
+//
+// EXPERIMENTAL: three FIFO queues -- a small admission queue S (~10% of
capacity), a main queue M
+// (~90%, a 2-bit CLOCK), and a ghost queue G that remembers the keys of
recently evicted objects
+// (no data). New objects enter S; on eviction from S an object that was
reused (frequency >= 2) is
+// promoted to M, otherwise it is demoted to G. A subsequent miss whose key is
in G enters M
+// directly. The small queue + ghost give S3-FIFO admission control that
filters one-hit-wonders --
+// exactly the CDN "quick demotion" property -- at FIFO cost (no per-hit
reordering). The ghost
+// holds keys, so it costs per-entry metadata at high object cardinality; that
metadata is bounded
+// and counted against ram_cache.size so total memory stays within the
configured budget. Selectable
+// as proxy.config.cache.ram_cache.algorithm = 2; the queue split, ghost
bounds, and promotion
+// threshold are tunable via proxy.config.cache.ram_cache.s3fifo.* (see init()
and the admin guide,
+// records.yaml.en.rst).
+
+#include "P_RamCache.h"
+#include "P_CacheInternal.h"
+#include "StripeSM.h"
+#include "iocore/eventsystem/IOBuffer.h"
+#include "tscore/CryptoHash.h"
+#include "tscore/List.h"
+
+#define ENTRY_OVERHEAD 128 // per-entry metadata counted against ram_cache.size
+#define FREQ_MAX 3 // 2-bit saturating frequency counter; keep
ram_cache.s3fifo.promote_threshold's range in sync
+
+namespace
+{
+DbgCtl dbg_ctl_ram_cache{"ram_cache"};
+} // namespace
+
+enum { SEG_SMALL = 0, SEG_MAIN = 1, SEG_GHOST = 2 };
+
+struct RamCacheS3FIFOEntry {
+ CryptoHash key;
+ uint64_t auxkey;
+ uint32_t size; // object bytes; resident entries account size +
ENTRY_OVERHEAD against the budget
+ uint8_t seg; // SEG_SMALL / SEG_MAIN / SEG_GHOST
+ uint8_t freq; // 0..FREQ_MAX
+ LINK(RamCacheS3FIFOEntry, lru_link);
+ LINK(RamCacheS3FIFOEntry, hash_link);
+ Ptr<IOBufferData> data; // null for ghost entries
+};
+
+struct RamCacheS3FIFO : public RamCache {
+ int get(CryptoHash *key, Ptr<IOBufferData> *ret_data, uint64_t auxkey =
0) override;
+ int put(CryptoHash *key, IOBufferData *data, uint32_t len, bool copy =
false, uint64_t auxkey = 0) override;
+ int fixup(const CryptoHash *key, uint64_t old_auxkey, uint64_t
new_auxkey) override;
+ int64_t size() const override;
+ void init(int64_t max_bytes, StripeSM *stripe) override;
+
+private:
+ int _main_percent = 90; // main queue target size, percent of the
resident budget (configurable)
+ int _promote_threshold = 2; // reuse count in the small queue that
promotes an object to main (configurable)
+ int64_t _max_bytes = 0;
+ int64_t _ghost_size_limit = 0; // ghost object-size bound (keeps it
proportional for big objects)
+ int64_t _ghost_max = 0; // ghost entry-count bound (caps metadata to
ghost_mem_percent)
+ int64_t _s_bytes = 0; // resident bytes in the small queue (incl.
ENTRY_OVERHEAD per entry)
+ int64_t _m_bytes = 0; // resident bytes in the main queue
+ int64_t _g_bytes = 0; // ghost object-size sum (vs
_ghost_size_limit)
+ int64_t _g_count = 0; // ghost entries; each costs
~ENTRY_OVERHEAD, counted in the budget
+ int64_t _objects = 0; // resident objects (S + M)
+ int64_t _nentries = 0; // hash entries (resident + ghost), for
sizing the hash table
+
+ Que(RamCacheS3FIFOEntry, lru_link) _seg[3]; // head = oldest, tail = newest,
per segment
+ DList(RamCacheS3FIFOEntry, hash_link) *_bucket = nullptr;
+ int _nbuckets = 0;
+ int _ibuckets = 0;
+ StripeSM *_stripe = nullptr;
+
+ void _resize_hashtable();
+ RamCacheS3FIFOEntry *_lookup(const CryptoHash *key, uint64_t auxkey);
+ void _unlink_hash(RamCacheS3FIFOEntry *e);
+ void _remove(RamCacheS3FIFOEntry *e);
+ void _to_main(RamCacheS3FIFOEntry *e);
+ void _to_ghost(RamCacheS3FIFOEntry *e);
+ bool _evict_ghost_one();
+ void _enforce_ghost();
+ void _evict_small();
+ void _evict_main();
+ void _evict();
+};
+
+ClassAllocator<RamCacheS3FIFOEntry, false>
ramCacheS3FIFOEntryAllocator("RamCacheS3FIFOEntry");
+
+static const int bucket_sizes[] = {8191, 16381, 32749, 65521,
131071, 262139, 524287, 1048573, 2097143,
+ 4194301, 8388593, 16777213, 33554393,
67108859, 134217689, 268435399, 536870909, 1073741827};
+
+void
+RamCacheS3FIFO::_resize_hashtable()
+{
+ ink_release_assert(_ibuckets < static_cast<int>(sizeof(bucket_sizes) /
sizeof(bucket_sizes[0])));
+ int anbuckets = bucket_sizes[_ibuckets];
+ int64_t s = anbuckets * sizeof(DList(RamCacheS3FIFOEntry,
hash_link));
+
+ DList(RamCacheS3FIFOEntry, hash_link) *new_bucket =
static_cast<DList(RamCacheS3FIFOEntry, hash_link) *>(ats_malloc(s));
+ memset(static_cast<void *>(new_bucket), 0, s);
+ if (_bucket) {
+ for (int64_t i = 0; i < _nbuckets; i++) {
+ RamCacheS3FIFOEntry *e = nullptr;
+ while ((e = _bucket[i].pop())) {
+ new_bucket[e->key.slice32(3) % anbuckets].push(e);
+ }
+ }
+ ats_free(_bucket);
+ }
+ _bucket = new_bucket;
+ _nbuckets = anbuckets;
+}
+
+void
+RamCacheS3FIFO::init(int64_t abytes, StripeSM *astripe)
+{
+ _stripe = astripe;
+ _max_bytes = abytes;
+ if (!_max_bytes) {
+ return;
+ }
+
+ // The tunables are range-validated in records.yaml (RECC_INT): an
out-of-range value is rejected
+ // there with a warning and the documented default is used in its place, so
the values read here
+ // are already within their safe ranges -- no clamping needed.
+ _main_percent = cache_config_ram_cache_s3fifo_main_percent;
+ _promote_threshold = cache_config_ram_cache_s3fifo_promote_threshold;
+ int ghost_size_percent = cache_config_ram_cache_s3fifo_ghost_size_percent;
+ int ghost_mem_percent = cache_config_ram_cache_s3fifo_ghost_mem_percent;
+
+ _ghost_size_limit = (_max_bytes * ghost_size_percent) / 100;
+ // The ghost stores keys only, but each key still costs ~ENTRY_OVERHEAD of
real memory. Bound the
+ // ghost by both its object-size sum (keeps it proportional for large
objects) and an entry count
+ // that caps its metadata at ghost_mem_percent of the configured size; the
metadata is counted
+ // against the budget (see put) so total memory never exceeds ram_cache.size.
+ _ghost_max = ((_max_bytes * ghost_mem_percent) / 100) / ENTRY_OVERHEAD;
+
+ Dbg(dbg_ctl_ram_cache,
+ "S3-FIFO init %" PRId64 " bytes: main_percent=%d ghost_size_percent=%d
ghost_mem_percent=%d promote_threshold=%d", _max_bytes,
+ _main_percent, ghost_size_percent, ghost_mem_percent,
_promote_threshold);
+
+ _resize_hashtable();
+}
+
+int64_t
+RamCacheS3FIFO::size() const
+{
+ // Memory accounted against ram_cache.size: resident data plus per-entry
overhead, plus the
+ // ghost's per-key overhead. This matches the budget enforced in put(), and
is O(1).
+ return _s_bytes + _m_bytes + _g_count * ENTRY_OVERHEAD;
+}
+
+RamCacheS3FIFOEntry *
+RamCacheS3FIFO::_lookup(const CryptoHash *key, uint64_t auxkey)
+{
+ uint32_t i = key->slice32(3) % _nbuckets;
+ RamCacheS3FIFOEntry *e = _bucket[i].head;
+ while (e) {
+ if (e->key == *key && e->auxkey == auxkey) {
+ return e;
+ }
+ e = e->hash_link.next;
+ }
+ return nullptr;
+}
+
+void
+RamCacheS3FIFO::_unlink_hash(RamCacheS3FIFOEntry *e)
+{
+ uint32_t b = e->key.slice32(3) % _nbuckets;
+ _bucket[b].remove(e);
+}
+
+// Remove an entry from whichever segment holds it, update all accounting, and
free it. The single
+// removal point used by eviction, ghost trimming, and stale-aux-key discard
in put().
+void
+RamCacheS3FIFO::_remove(RamCacheS3FIFOEntry *e)
+{
+ _seg[e->seg].remove(e);
+ if (e->seg == SEG_GHOST) {
+ _g_bytes -= e->size;
+ _g_count--;
+ } else {
+ int64_t resident = ENTRY_OVERHEAD + e->size;
+ if (e->seg == SEG_SMALL) {
+ _s_bytes -= resident;
+ } else {
+ _m_bytes -= resident;
+ }
+ ts::Metrics::Gauge::decrement(cache_rsb.ram_cache_bytes, resident);
+ ts::Metrics::Gauge::decrement(_stripe->cache_vol->vol_rsb.ram_cache_bytes,
resident);
+ _objects--;
+ }
+ _unlink_hash(e);
+ _nentries--;
+ e->data = nullptr;
+ THREAD_FREE(e, ramCacheS3FIFOEntryAllocator, this_thread());
+}
+
+// Promote a small-queue object that has proven popular into the main queue
(resident, no data
+// change), resetting its frequency for the main-queue CLOCK.
+void
+RamCacheS3FIFO::_to_main(RamCacheS3FIFOEntry *e)
+{
+ _seg[SEG_SMALL].remove(e);
+ _s_bytes -= ENTRY_OVERHEAD + e->size;
+ e->seg = SEG_MAIN;
+ e->freq = 0;
+ _seg[SEG_MAIN].enqueue(e);
+ _m_bytes += ENTRY_OVERHEAD + e->size;
+}
+
+// Demote a small-queue object to the ghost queue: drop its data (freeing
resident bytes) but keep
+// its key so a quick re-reference can be admitted straight to the main queue.
+void
+RamCacheS3FIFO::_to_ghost(RamCacheS3FIFOEntry *e)
+{
+ _seg[SEG_SMALL].remove(e);
+ _s_bytes -= ENTRY_OVERHEAD + e->size;
+ ts::Metrics::Gauge::decrement(cache_rsb.ram_cache_bytes, ENTRY_OVERHEAD +
e->size);
+ ts::Metrics::Gauge::decrement(_stripe->cache_vol->vol_rsb.ram_cache_bytes,
ENTRY_OVERHEAD + e->size);
+ e->data = nullptr;
+ e->seg = SEG_GHOST;
+ e->freq = 0;
+ _seg[SEG_GHOST].enqueue(e);
+ _g_bytes += e->size;
+ _g_count++;
+ _objects--;
+ _enforce_ghost();
+}
+
+// Drop the oldest ghost key; returns false if the ghost was already empty.
+bool
+RamCacheS3FIFO::_evict_ghost_one()
+{
+ RamCacheS3FIFOEntry *g = _seg[SEG_GHOST].head;
+ if (!g) {
+ return false;
+ }
+ _remove(g);
+ return true;
+}
+
+void
+RamCacheS3FIFO::_enforce_ghost()
+{
+ while ((_g_bytes > _ghost_size_limit || _g_count > _ghost_max) &&
_evict_ghost_one()) {}
+}
+
+// Evict from the small queue: promote reused objects to main, demote the
first un-reused object to
+// the ghost (which is what actually frees resident bytes). If everything got
promoted the small
+// queue empties and the caller falls through to evicting main.
+void
+RamCacheS3FIFO::_evict_small()
+{
+ while (_seg[SEG_SMALL].head) {
+ RamCacheS3FIFOEntry *c = _seg[SEG_SMALL].head;
+ if (c->freq >= _promote_threshold) {
+ _to_main(c);
+ } else {
+ _to_ghost(c);
+ return;
+ }
+ }
+}
+
+// Evict from the main queue (2-bit CLOCK): a reused object is reinserted at
the tail with its
+// frequency decremented; the first object with frequency 0 is evicted.
+void
+RamCacheS3FIFO::_evict_main()
+{
+ while (_seg[SEG_MAIN].head) {
+ RamCacheS3FIFOEntry *c = _seg[SEG_MAIN].head;
+ if (c->freq >= 1) {
+ _seg[SEG_MAIN].remove(c);
+ c->freq -= 1; // 2-bit clock: a reused object gets another pass (freq is
already <= FREQ_MAX)
+ _seg[SEG_MAIN].enqueue(c);
+ } else {
+ _remove(c);
+ return;
+ }
+ }
+}
+
+void
+RamCacheS3FIFO::_evict()
+{
+ // Main targets MAIN_PERCENT of the budget actually available to resident
data -- the ghost's
+ // metadata is reserved out of the configured size, so basing the split on
raw _max_bytes would
+ // (when the ghost is full) keep _m_bytes below the limit forever and starve
the small queue.
+ int64_t resident_budget = _max_bytes - _g_count * ENTRY_OVERHEAD;
+ if (_m_bytes > resident_budget * _main_percent / 100 || _s_bytes == 0) {
+ _evict_main();
+ } else {
+ _evict_small();
+ }
+}
+
+int
+RamCacheS3FIFO::get(CryptoHash *key, Ptr<IOBufferData> *ret_data, uint64_t
auxkey)
+{
+ if (!_max_bytes) {
+ return 0;
+ }
+ RamCacheS3FIFOEntry *e = _lookup(key, auxkey);
+ if (e && e->seg != SEG_GHOST) {
+ if (e->freq < FREQ_MAX) {
+ e->freq++;
+ }
+ (*ret_data) = e->data;
+ ts::Metrics::Counter::increment(cache_rsb.ram_cache_hits);
+
ts::Metrics::Counter::increment(_stripe->cache_vol->vol_rsb.ram_cache_hits);
+ return 1;
+ }
+ ts::Metrics::Counter::increment(cache_rsb.ram_cache_misses);
+
ts::Metrics::Counter::increment(_stripe->cache_vol->vol_rsb.ram_cache_misses);
+ return 0;
+}
+
+int
+RamCacheS3FIFO::put(CryptoHash *key, IOBufferData *data, [[maybe_unused]]
uint32_t len, bool, uint64_t auxkey)
+{
+ if (!_max_bytes) {
+ return 0;
+ }
+ uint32_t size = data->block_size();
+ uint32_t i = key->slice32(3) % _nbuckets;
+
+ // Walk the hash chain. A resident hit just counts the reference; a ghost
hit is admitted fresh to
+ // the main queue; an entry with the same key but a different aux key is
stale and is discarded --
+ // the same one-resident-copy-per-key contract LRU and CLFUS enforce.
+ bool ghost_hit = false;
+ RamCacheS3FIFOEntry *e = _bucket[i].head;
+ while (e) {
+ if (e->key == *key) {
+ if (e->auxkey == auxkey) {
+ if (e->seg != SEG_GHOST) { // already resident: count the reference
(matches LRU's bump)
+ if (e->freq < FREQ_MAX) {
+ e->freq++;
+ }
+ return 1;
+ }
+ RamCacheS3FIFOEntry *next = e->hash_link.next; // ghost hit: admit a
fresh copy to main
+ _remove(e);
+ ghost_hit = true;
+ e = next;
+ continue;
+ }
+ RamCacheS3FIFOEntry *next = e->hash_link.next; // stale aux key: discard
the old copy
+ _remove(e);
+ e = next;
+ continue;
+ }
+ e = e->hash_link.next;
+ }
+
+ // Keep total memory within the configured size: resident data+overhead plus
the ghost's
+ // metadata (ENTRY_OVERHEAD per remembered key) must fit. Evict resident
first; if that is
+ // exhausted but ghost metadata still pushes over, drop ghost keys too.
+ int64_t need = ENTRY_OVERHEAD + size;
+ while (_s_bytes + _m_bytes + _g_count * ENTRY_OVERHEAD + need > _max_bytes) {
+ if (_s_bytes + _m_bytes > 0) {
+ _evict();
+ } else if (!_evict_ghost_one()) {
+ break;
+ }
+ }
+ if (_s_bytes + _m_bytes + _g_count * ENTRY_OVERHEAD + need > _max_bytes) {
+ return 0; // object larger than the whole cache: do not admit, so the
budget is never exceeded
+ }
+
+ RamCacheS3FIFOEntry *ne = THREAD_ALLOC(ramCacheS3FIFOEntryAllocator,
this_ethread());
+ ne->key = *key;
+ ne->auxkey = auxkey;
+ ne->size = size;
+ ne->freq = 0;
+ ne->seg = ghost_hit ? SEG_MAIN : SEG_SMALL;
+ ne->data = data;
+ _bucket[i].push(ne);
+ _seg[ne->seg].enqueue(ne);
+ if (ghost_hit) {
+ _m_bytes += need;
+ } else {
+ _s_bytes += need;
+ }
+ _objects++;
+ _nentries++;
+ ts::Metrics::Gauge::increment(cache_rsb.ram_cache_bytes, need);
+ ts::Metrics::Gauge::increment(_stripe->cache_vol->vol_rsb.ram_cache_bytes,
need);
+
+ if (_nentries > _nbuckets * 0.75) {
+ ++_ibuckets;
+ _resize_hashtable();
+ }
+ return 1;
+}
+
+int
+RamCacheS3FIFO::fixup(const CryptoHash *key, uint64_t old_auxkey, uint64_t
new_auxkey)
+{
+ if (!_max_bytes) {
+ return 0;
+ }
+ RamCacheS3FIFOEntry *e = _lookup(key, old_auxkey);
+ if (e && e->seg != SEG_GHOST) {
+ e->auxkey = new_auxkey;
+ return 1;
+ }
+ return 0;
+}
+
+RamCache *
+new_RamCacheS3FIFO()
+{
+ return new RamCacheS3FIFO;
+}
diff --git a/src/records/RecordsConfig.cc b/src/records/RecordsConfig.cc
index 7588e5108d..7890f427ed 100644
--- a/src/records/RecordsConfig.cc
+++ b/src/records/RecordsConfig.cc
@@ -860,7 +860,16 @@ static constexpr RecordElement RecordsConfig[] =
// # alternatively: 20971520 (20MB)
{RECT_CONFIG, "proxy.config.cache.ram_cache.size", RECD_INT, "-1",
RECU_RESTART_TS, RR_NULL, RECC_STR, "^-?[0-9]+[A-Za-z]{0,}$", RECA_NULL}
,
- {RECT_CONFIG, "proxy.config.cache.ram_cache.algorithm", RECD_INT, "1",
RECU_RESTART_TS, RR_NULL, RECC_INT, "[0-1]", RECA_NULL}
+ {RECT_CONFIG, "proxy.config.cache.ram_cache.algorithm", RECD_INT, "1",
RECU_RESTART_TS, RR_NULL, RECC_INT, "[0-2]", RECA_NULL}
+ ,
+ // # S3-FIFO (ram_cache.algorithm = 2) tunables
+ {RECT_CONFIG, "proxy.config.cache.ram_cache.s3fifo.main_percent", RECD_INT,
"90", RECU_RESTART_TS, RR_NULL, RECC_INT, "[1-99]", RECA_NULL}
+ ,
+ {RECT_CONFIG, "proxy.config.cache.ram_cache.s3fifo.ghost_size_percent",
RECD_INT, "90", RECU_RESTART_TS, RR_NULL, RECC_INT, "[0-100]", RECA_NULL}
+ ,
+ {RECT_CONFIG, "proxy.config.cache.ram_cache.s3fifo.ghost_mem_percent",
RECD_INT, "25", RECU_RESTART_TS, RR_NULL, RECC_INT, "[0-100]", RECA_NULL}
+ ,
+ {RECT_CONFIG, "proxy.config.cache.ram_cache.s3fifo.promote_threshold",
RECD_INT, "2", RECU_RESTART_TS, RR_NULL, RECC_INT, "[1-3]", RECA_NULL}
,
{RECT_CONFIG, "proxy.config.cache.ram_cache.use_seen_filter", RECD_INT, "1",
RECU_RESTART_TS, RR_NULL, RECC_INT, "[0-9]", RECA_NULL}
,