[PHP-DEV] Re: [RFC] [Discussion] OPcache Static Cache

Go Kudo Tue, 02 Jun 2026 06:15:13 -0700

On Sun, May 17, 2026 at 0:19, Go Kudo <[email protected]> wrote:

> Hi internals,
>
> I'd like to start the discussion for a new RFC, OPcache Static Cache.
>
> RFC: https://wiki.php.net/rfc/opcache_static_cache
> Implementation: https://github.com/php/php-src/pull/22052
>
> The proposal adds an OPcache-managed shared-memory cache for explicit
> userland values and for selected PHP static state. It introduces explicit
> functions under the OPcache namespace (volatile_* and persistent_*) and two
> attributes, #[OPcache\VolatileStatic] and #[OPcache\PersistentStatic], that
> let selected static properties and method static variables survive across
> requests. The feature is disabled by default and only activates once memory
> is allocated through the new INI directives.
>
> The RFC covers the motivation, the deliberate split between the two
> backends, the trust model (one PHP runtime = one trust domain; this is not
> a tenant isolation boundary), and benchmarks against APCu on NTS php-fpm
> and ZTS FrankenPHP. The PR is the full implementation, with PHPT coverage
> summarized in the Validation section.
>
> One thing to flag on the implementation status: the Windows build is
> currently broken. I don't have a Windows development environment available
> yet — one is being arranged through work, and I'll get the Windows side
> fixed once that's in place.
>
> Feedback welcome.
>
> Best Regards,
> Go Kudo
>


Hi Nicolas, Jakub, Timo, Larry,

I have updated the RFC and the implementation:
RFC: https://wiki.php.net/rfc/opcache_static_cache
PR: https://github.com/php/php-src/pull/22052

I'm folding replies to all three of you into one message, since the
threads overlap. Most of it answers Nicolas's measurements; further down
there is a section for Jakub's FPM pool-isolation concern and a short note
for Timo's pointer to prior art.

Nicolas, thank you for building my branch and running your own A/B/C
measurements. That moved the discussion onto concrete ground, and I
appreciate it.

Since your review I have pushed a revised branch and bumped the RFC to
2.0.0. The API changes discussed below are in it (the SAPI opt-in model,
and `getCacheStoreType()` for storage-path visibility), and the object
workloads you flagged are now substantially faster: native now beats the
deepclone path on every nested case I tried. Details and numbers follow.

I agree with most of your points. I'll go through them in order, concede
the ones where you are right, and try to narrow what is left. I think it
comes down to one question: whether a userland array-hydration layer is an
acceptable replacement for engine-level object storage. Most of the rest I
can give you.

## The resulting public API

For reference, here is the shape the explicit API settled into, summarised
from the stub:

```php
namespace OPcache;

// Explicit cache: two final classes, static methods only, no instances.
final class VolatileCache
{
    public static function get(
        string $key,
        null|bool|int|float|string|array|object $default = null
    ): null|bool|int|float|string|array|object;
    public static function getMultiple(array $keys, ?array $default = null): array|false;
    public static function set(
        string $key,
        null|bool|int|float|string|array|object $value,
        int $ttl = 0
    ): bool;
    public static function setMultiple(array $values, int $ttl = 0): bool;
    public static function has(string $key): bool;
    public static function delete(string $key_or_class): bool;
    public static function deleteMultiple(array $keys): bool;
    public static function clear(): bool;
    public static function lock(string $key, int $lease = 0): bool;
    public static function unlock(string $key): bool;
    public static function getCacheStoreType(
        string $key_or_property,
        ?string $class_name = null
    ): CacheStoreType;
    public static function info(): StaticCacheInfo;
}

// PinnedCache is the same set, except set()/setMultiple() take no $ttl,
// plus two atomic counters:
final class PinnedCache
{
    // get/getMultiple/set/setMultiple/has/delete/deleteMultiple/clear/
    // lock/unlock/getCacheStoreType/info  -- as above
    public static function increment(string $key, int $step = 1): int|false;
    public static function decrement(string $key, int $step = 1): int|false;
}

// getCacheStoreType() reports how a value is stored, without decoding it:
enum CacheStoreType
{
    case NotFound;          // no entry for the key/property
    case Scalar;            // stored inline
    case SharedGraph;       // zero-copy graph laid out in SHM (the fast path)
    case OPcacheSerialized; // OPcache binary serializer (SHM-safe, no userland)
    case PHPSerialized;     // php_var_serialize() last resort
}

// Declarative static state, over the same storage:
#[Attribute] final class VolatileStatic {
    public function __construct(
        int $ttl = 0,
        CacheStrategy $strategy = CacheStrategy::Immediate
    );
}
#[Attribute] final class PinnedStatic {}
enum CacheStrategy: int { case Immediate = 0; case Tracking = 1; }

// Status object and the single exception type:
final readonly class StaticCacheInfo
{
    /* enabled, available, configured_memory, entry_count, ... */
}
class StaticCacheException extends \Exception {}
```

Two final classes with static methods, no instances and no shared
interface. Misses and contention return the default or `false`; genuine
backend failures return `false` (or `int|false` for the atomic counters);
`Closure` and resource values are rejected with a `TypeError`; and
`StaticCacheException` is reserved for strict `#[OPcache\PinnedStatic]`
publication.

## SAPI availability: the unsafe flag is gone, opt-in instead

> these are safe SAPIs, they just don't have a scoping concept built in
> [...] enable it by default with a single default scope for those SAPIs,
> plus a clear internal API so a SAPI can define its own scoped segments

I implemented it the way you suggested. There is no longer an
`opcache.static_cache.allow_unsafe_runtime` directive and no SAPI-name
allowlist in the engine. Availability is opt-in: a SAPI, or an embedder,
calls a small internal C API, `zend_opcache_static_cache_opt_in()`, before
request handling to enable Static Cache for its runtime. That call is the
runtime declaring that a trust/storage boundary holds for the lifetime of
the shared-memory owner.

The bundled `fpm`, `cli`, `cli-server` and `phpdbg` SAPIs call it at
startup, so they are available by default. The difference from before is the
mechanism: instead of the engine guessing from the SAPI name and offering an
"unsafe" override, each runtime states that it owns a boundary. A runtime
with a real per-tenant boundary scopes it with the partition API
(`zend_opcache_static_cache_partition_create` / `_activate`, which `fpm`
already uses per pool). A runtime without one, such as a shared multi-tenant
web SAPI with no pre-request identity, never opts in and stays unavailable,
with nothing left to misconfigure.

The `embed` SAPI does not auto-opt-in, on purpose. The embedding application
owns the runtime and its trust boundary, so it opts in from its own startup
code. That keeps the rule consistent for every embedder, including one that
registers its own SAPI module instead of reusing the bundled `embed` one.
FrankenPHP does exactly that, so it opts in with the same one-line call (or
a scoped partition when it isolates per worker); there is no `embed`
special-case that covers `php_embed` users but silently misses FrankenPHP.

That is your internal-API point, and it removes the naming question by
deleting the flag entirely. The full ext/opcache suite passes with the
directive gone.

## API shape: remember()

> I could also add VolatileCache::remember($key, $compute, $ttl = 0)
> wrapping the safe lock -> build-outside-the-lock -> store sequence

I would rather not add this one. `remember()` takes a callable, and to
actually prevent a stampede it has to hold the entry lock across the call to
`$compute()`. That means running arbitrary userland PHP while holding a
cross-process SHM lock. The callable can run unbounded, throw, fork, or
re-enter the cache, and a re-entrant `lock()` on the same key (or a key in
the same lock stripe) while the lock is held is a deadlock. The lease bounds
the duration, but not the re-entrancy and not the exception path.

Not holding the lock while computing gives no stampede protection at all; it
is then just sugar over `get()`-then-`set()` that looks atomic, which is
worse than not having it.

Since I already expose `lock()`/`unlock()` with a lease, userland can do the
safe thing itself, with the compute step outside any engine lock:

```php
if (!VolatileCache::lock($key, $lease)) {
    return VolatileCache::get($key, $default);
}
try {
    $value = $compute(); // runs outside the engine lock
    VolatileCache::set($key, $value, $ttl);
    return $value;
} finally {
    VolatileCache::unlock($key);
}
```

That keeps the closure's execution, its scope, and any exception it throws
in userland, never inside the engine's critical section. I would rather
document this recipe than move userland execution into the primitive. If you
see a safe construction I have missed, I will reconsider.
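
For illustration, the recipe above can be wrapped in a userland helper. This is a minimal sketch, not part of the RFC: `FakeVolatileCache` is a hypothetical in-process stand-in so the code runs without the extension, `rememberInUserland()` is an invented name, and the initial `has()` check is my addition to the recipe.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for OPcache\VolatileCache (assumption: same method
// shapes as the stub above; no shared memory, no real lease handling).
final class FakeVolatileCache
{
    private static array $values = [];
    private static array $locks = [];

    public static function has(string $key): bool
    {
        return array_key_exists($key, self::$values);
    }

    public static function get(string $key, mixed $default = null): mixed
    {
        return self::has($key) ? self::$values[$key] : $default;
    }

    public static function set(string $key, mixed $value, int $ttl = 0): bool
    {
        self::$values[$key] = $value;
        return true;
    }

    public static function lock(string $key, int $lease = 0): bool
    {
        if (isset(self::$locks[$key])) {
            return false; // someone else is already building this entry
        }
        self::$locks[$key] = true;
        return true;
    }

    public static function unlock(string $key): bool
    {
        $held = isset(self::$locks[$key]);
        unset(self::$locks[$key]);
        return $held;
    }
}

// Userland helper: the callable runs in userland, and the lease lock is
// always released, even when $compute throws.
function rememberInUserland(string $key, callable $compute, int $ttl = 0, int $lease = 5, mixed $default = null): mixed
{
    if (FakeVolatileCache::has($key)) {
        return FakeVolatileCache::get($key);
    }
    if (!FakeVolatileCache::lock($key, $lease)) {
        return FakeVolatileCache::get($key, $default); // contended: fall back
    }
    try {
        $value = $compute();
        FakeVolatileCache::set($key, $value, $ttl);
        return $value;
    } finally {
        FakeVolatileCache::unlock($key);
    }
}

$calls = 0;
$build = function () use (&$calls): array {
    $calls++;
    return ['routes' => 3];
};
$first = rememberInUserland('routes', $build);
$second = rememberInUserland('routes', $build);
```

A library can ship a helper like this on top of `lock()`/`unlock()` without the engine ever calling back into userland.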

## References and the silent fallback

> I'd rather make it visible (surface the chosen path in info(), or in a
> debug build) than ban objects

Agreed, and that is implemented: visibility, not a ban. There is a new
introspection method on both cache classes:

```php
VolatileCache::getCacheStoreType(string $key_or_property, ?string $class_name = null): OPcache\CacheStoreType
PinnedCache::getCacheStoreType(string $key_or_property, ?string $class_name = null): OPcache\CacheStoreType
```

It returns an `OPcache\CacheStoreType` enum (`NotFound`, `Scalar`,
`SharedGraph`, `OPcacheSerialized`, `PHPSerialized`), so you can see per key
which path a value took, without decoding it, in any build rather than only
a debug one. Passing `$class_name` inspects the attribute-backed
static-property storage for that class instead of an explicit key. A value
that fell back to serialization is now one call away from being observable.
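
As a rough sketch of how a deployment check could use this, the snippet below flags keys that left the fast path. The enum is a local stand-in that mirrors the cases listed above, `findSerializedFallbacks()` is a hypothetical helper, and the `$fakeStore` contents are invented sample data. With the extension loaded, the probe would presumably be `VolatileCache::getCacheStoreType(...)` instead.

```php
<?php
declare(strict_types=1);

// Local stand-in mirroring the cases of OPcache\CacheStoreType (assumption:
// used only so the sketch runs without the extension).
enum CacheStoreType
{
    case NotFound;
    case Scalar;
    case SharedGraph;
    case OPcacheSerialized;
    case PHPSerialized;
}

/**
 * Returns the keys whose values were stored through a serializer.
 *
 * @param list<string> $keys
 * @param callable(string): CacheStoreType $probe
 * @return array<string, string> key => store type name
 */
function findSerializedFallbacks(array $keys, callable $probe): array
{
    $slow = [];
    foreach ($keys as $key) {
        $type = $probe($key);
        if ($type === CacheStoreType::OPcacheSerialized
            || $type === CacheStoreType::PHPSerialized) {
            $slow[$key] = $type->name;
        }
    }
    return $slow;
}

// Invented sample data standing in for a real cache.
$fakeStore = [
    'config'      => CacheStoreType::SharedGraph,
    'counter'     => CacheStoreType::Scalar,
    'legacy_blob' => CacheStoreType::PHPSerialized,
];
$probe = fn (string $key): CacheStoreType => $fakeStore[$key] ?? CacheStoreType::NotFound;

$report = findSerializedFallbacks(['config', 'counter', 'legacy_blob', 'missing'], $probe);
```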

The enum also pins down a correction. The first fallback off the shared
graph is not `php_var_serialize` but the OPcache binary serializer, which is
SHM-safe and runs no userland code. That is why `getCacheStoreType` reports
`OPcacheSerialized` and `PHPSerialized` as separate cases;
`php_var_serialize` is the last resort, not the first. So "bail == APCu
parity" understates the middle tier, though your underlying point holds:
even that tier is slower than the fast path and should be visible.

> no real objection to rejecting top-level hard refs up front [...]
> "top-level hard ref" confuses me

You are right to be confused, and I will retract the phrase; it is a no-op.
`store($key, $value)` takes `$value` by value, so the engine dereferences
any top-level reference (`ZVAL_DEREF`) before storage ever sees it. A
top-level hard ref cannot reach the storage layer as a reference. The case
that matters is a nested reference, a `&` inside an array element or object
property, and that cannot be rejected cheaply up front: detecting it
requires walking the whole graph, which is the walk the shared-graph builder
already does. So the honest answer for nested refs is the visibility above
(the value reports the serialize path), not an up-front rejection.
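
To show from userland why detection needs a full walk, here is a small sketch using `ReflectionReference::fromArrayElement()`, which returns `null` for a non-reference element. `containsNestedReference()` is a hypothetical helper; it covers arrays only and says nothing about how the shared-graph builder does it internally.

```php
<?php
declare(strict_types=1);

// Walks every element: there is no cheaper way to learn whether a `&`
// is hiding somewhere inside the graph.
function containsNestedReference(array $graph): bool
{
    foreach ($graph as $key => $value) {
        if (ReflectionReference::fromArrayElement($graph, $key) !== null) {
            return true; // this element is a PHP reference
        }
        if (is_array($value) && containsNestedReference($value)) {
            return true;
        }
    }
    return false;
}

$shared = 'mysql:host=db';
$plain = ['db' => ['dsn' => 'mysql:host=db', 'pool' => 4]];
$withRef = ['db' => ['dsn' => &$shared, 'pool' => 4]];

$plainHasRef = containsNestedReference($plain);
$nestedHasRef = containsNestedReference($withRef);
```

Both arrays are passed by value, yet the reference inside `$withRef` survives the call, which is the nested case described above.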

## Scalars and arrays-of-scalars only

This is where the discussion helped most. I argued before that scalars-only
gave up a real win; you pushed back with measurements; so I built your setup
and measured it properly, including the large nested workloads that are the
actual case for a cache. You were right that native was losing. That sent me
into the implementation, and I found the cause and fixed it. The path is
worth setting out.

Two of your framings I agree with up front:

1. For array-of-scalars config/metadata, an immutable interned array is
   essentially free, and the cache should not claim to beat it.
2. The "Nx faster than APCu" headline is size-dependent; APCu is only a few
   microseconds for small payloads.

### (a) The config array

> an immutable array is essentially free (0.045 us) [...] the static
> cache's own array fetch, which pays an O(n) walk per read and so doesn't
> even deliver the immutable-array win that opcache literals already give

You are structurally right, and I have fixed it. Two facts first. I could
not reproduce 331 us: a pure-scalar 4k-entry array fetches in about 7 us,
scaling at roughly 1.7 ns/entry, and the decode itself was already zero-copy
(a scalar array is stored once as `IS_ARRAY_IMMUTABLE` and returned as
`ZVAL_ARR()` straight into SHM). The O(n) you felt was one layer up: every
warm fetch re-walked the array in `value_needs_request_local_clone()` to
decide whether it needed a deep clone, when that answer is fixed at store
time. I removed that walk for shared-graph values (the same change as in
(c)); the 4k fetch is now about 0.64 us and flat in the entry count.

It is still not the 0.014 us of a resident literal read, and I am not
claiming it should be. For read-only scalar config the preload/literal path
wins, and that is fine. It is a separate matter from objects.

### (b) Objects: I measured your A/B/C, found native losing, and chased why

I built this branch with APCu master and your deepclone, all NTS, JIT off,
timing warm fetches where C rebuilds the same isolated object graph B
returns (resident dehydrated array plus `deepclone_from_array`). As you
said, native lost, and worse as the graph grew. us/op:

```
array of nested ORM entities     objects   A apcu   B native   C hydrate
                                     1000     1800        799        501
                                     2000     4171       1903       1043
object tree                          8191     1582       1736        498
                                     9841     1928       1836        523
```

Two things you were right about that I had wrong: `deepclone_to_array` /
`deepclone_from_array` are generic (no per-class hydrator to charge for),
and C hands back the same isolated objects B does. So this was a real loss,
not a measurement artifact.

The cause was structural, but not where I first guessed. The warm fetch kept
a request-local prototype of the materialized graph and deep-cloned it on
every repeat fetch, and for an object graph that clone is slower than
decoding the compact SHM layout again. A shared graph never holds shared
identity or cycles, so each decode is already an independent copy; the
prototype was pure overhead. On top of that the decoder re-resolved the
class (`zend_lookup_class`) for every object, and the builder stored a
separate copy of each repeated class and property name.

### (c) The fix

Three changes, all behind the existing API, with no visible behaviour or
format change:

- Skip the request-local prototype for shared-graph values and decode from
  SHM on each fetch. (This also removes the O(n) array walk in (a).)
- Deduplicate equal strings within a payload at build time, so a class or
  property name repeated across thousands of objects is stored once.
- Memoize the resolved class per (buffer, offset) during a decode, so a
  homogeneous graph resolves its class once, not once per node.
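
The string deduplication in the second bullet can be pictured in userland terms. The sketch below is only an analogy under my own assumptions: `StringTable`, `buildPayload()` and the `User` class are hypothetical, and the engine's actual SHM layout is not shown here.

```php
<?php
declare(strict_types=1);

// Interning table: each distinct string is stored once and referred to by index.
final class StringTable
{
    /** @var array<string, int> */
    private array $index = [];
    /** @var list<string> */
    private array $strings = [];

    public function intern(string $value): int
    {
        return $this->index[$value] ??= $this->append($value);
    }

    private function append(string $value): int
    {
        $this->strings[] = $value;
        return count($this->strings) - 1;
    }

    /** @return list<string> */
    public function all(): array
    {
        return $this->strings;
    }
}

// Flattens objects into nodes that carry string-table indexes instead of
// repeating the class and property names per object.
function buildPayload(array $objects, StringTable $table): array
{
    $nodes = [];
    foreach ($objects as $object) {
        $node = ['class' => $table->intern($object::class), 'props' => []];
        foreach (get_object_vars($object) as $name => $value) {
            $node['props'][$table->intern($name)] = $value;
        }
        $nodes[] = $node;
    }
    return $nodes;
}

final class User
{
    public function __construct(public int $id, public string $email) {}
}

$users = [];
for ($i = 1; $i <= 1000; $i++) {
    $users[] = new User($i, "user{$i}@example.test");
}

$table = new StringTable();
$nodes = buildPayload($users, $table);
```

A thousand objects of one class contribute three name strings to the table rather than three thousand, which is the saving the bullet describes.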

Same A/B/C after the change, NTS, JIT off, us/op:

```
array of nested ORM entities     objects   A apcu   B native   C hydrate
                                     1000     1781        357        492
                                     2000     3868        721       1036
object tree                          8191     1565        462        485
                                     9841     1830        499        513
```

Native now beats deepclone on every nested workload I tried: about 1.4x on
the 2000-entity array, and the deep trees that lost 3.5x now win. The
400-object case went from 72 to 23 us. The full ext/opcache suite passes,
plus new regression tests, on NTS and ZTS.
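
The ratios quoted here follow directly from the two tables. A short script, using only the us/op figures printed above, recomputes them (the array keys are my own labels for the rows):

```php
<?php
declare(strict_types=1);

// us/op copied from the before/after tables above (B = native, C = hydrate).
$before = [
    'entities_1000' => ['native' => 799,  'hydrate' => 501],
    'entities_2000' => ['native' => 1903, 'hydrate' => 1043],
    'tree_8191'     => ['native' => 1736, 'hydrate' => 498],
    'tree_9841'     => ['native' => 1836, 'hydrate' => 523],
];
$after = [
    'entities_1000' => ['native' => 357, 'hydrate' => 492],
    'entities_2000' => ['native' => 721, 'hydrate' => 1036],
    'tree_8191'     => ['native' => 462, 'hydrate' => 485],
    'tree_9841'     => ['native' => 499, 'hydrate' => 513],
];

// A ratio above 1 means native is slower than hydrate by that factor.
function nativeOverHydrate(array $row): float
{
    return $row['native'] / $row['hydrate'];
}

foreach ($before as $name => $row) {
    printf(
        "%-14s before: native/hydrate = %.2f   after: hydrate/native = %.2f\n",
        $name,
        nativeOverHydrate($row),
        1 / nativeOverHydrate($after[$name])
    );
}
```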

To make this reproducible on your terms, I added a deepclone backend to my
own HTTP benchmark harness (dehydrate with `deepclone_to_array()`, keep the
array in the volatile cache, rehydrate with `deepclone_from_array()` on each
fetch) and re-ran `vote_read_long` under the published conditions (php-fpm +
nginx NTS and FrankenPHP ZTS, 20 iterations / 3 warmup / 3000 ops, JIT off).
The APCu baselines match the published table within about 2%, so the
runtimes are comparable. native vs deepclone, mean us/op (NTS):

```
workload                 APCu     native   deepclone
route_table_read        161.2      0.90      0.91     (array: tie)
large_array              90.9      0.88      0.88     (array: tie)
metadata_object_read    185.3      1.12      1.32     (native)
metadata_object_mutate  162.4      1.03      1.19     (native)
safe_direct_object        2.5      1.22      3.03     (native; deepclone slower than APCu)
carbon_datetime_object  185.4     46.0     166.3      (native, ~3.6x)
spl_collection_object    21.0      5.48      1.89     (deepclone)
```

So under the RFC's own methodology native is faster than the deepclone path
on every object workload except SPL collections, and ties on arrays. The SPL
case is the one real win for deepclone, and it is specific: those classes go
through the safe-direct serialized path, whose per-fetch copy handler is
heavier than rebuilding from a flat array. I have noted it in the RFC as a
concrete follow-up (a tighter SPL copy handler); it does not change the
overall picture. The updated tables are in the RFC.

Honest edges remain: for a tiny object deepclone's tight path is a hair
faster (sub-microsecond), and for read-only scalar config a resident literal
still wins outright, as in (a). But for the workload this feature is
actually for, large nested object graphs from a database, in-engine storage
is now the faster option.

### (d) Not just performance

This does not rest on performance alone. Object support is also useful for
being built in and generic (no third-party extension, nothing to
pre-generate) and for being one primitive: the store side and the runtime
cross-worker sharing live in the same place, instead of "cache the array"
plus "hydrate in userland" wired together by every library. And the
safe-direct registry is not a userland protocol: a plain user object with no
magic and no cycles or refs takes the fast path automatically via
`can_restore_direct()`, and the C-only registry only covers a few internal
classes whose state the generic path cannot read. Keeping objects imposes
nothing on the ecosystem.

## Dropping pinned (and the attributes)

> PinnedStatic on the Carbon shape is ~1.5 us [...] there's no preload
> trick that reaches that number, because preload can't bake a live object
> graph into an opcode literal

Pinned is the one place a live-object representation still wins clearly,
for a reason the volatile numbers above do not capture. Pinned (and
`#[PinnedStatic]`) materialize the graph once per worker; after that it is a
plain static read on every subsequent request in that worker, near zero per
request. The hydration approach pays its hydrate cost on every request
instead. Preload cannot reach this either: it can only intern scalar and
array literals, not bake a live object graph into an opcode literal.

The caveat is that this holds for read-only / immutable shared state, where
keeping one live instance across requests is correct; a mutable shared
instance would leak between requests. But that is a real and common case: a
compiled DI container, a routing table, config value objects. Your
request-registry counter rebuilds per request from the cache, so it does not
reach the per-worker amortization, and for the read-only data where it would
help, pinned already does it with less per-request cost.
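
The amortization argument is just a count of hydrations, which a toy simulation makes concrete. Everything here is an assumption for illustration: requests are modelled as loop iterations in one process, `Hydrator` is a hypothetical class, and no attribute or shared memory is involved.

```php
<?php
declare(strict_types=1);

// Counts how often the object graph is rebuilt from its flat form.
final class Hydrator
{
    public int $count = 0;

    /** @return list<object> */
    public function hydrate(array $rows): array
    {
        $this->count++;
        return array_map(static fn (array $row): object => (object) $row, $rows);
    }
}

$rows = [
    ['path' => '/',     'handler' => 'home'],
    ['path' => '/vote', 'handler' => 'vote'],
];
$requests = 1000;

// Hydration approach: every simulated request rebuilds the graph.
$perRequest = new Hydrator();
for ($i = 0; $i < $requests; $i++) {
    $routes = $perRequest->hydrate($rows);
}

// Pinned-style approach: the graph is materialized once for the worker and
// then read as ordinary state by later requests.
$perWorker = new Hydrator();
$pinned = null;
for ($i = 0; $i < $requests; $i++) {
    $pinned ??= $perWorker->hydrate($rows);
    $routes = $pinned;
}
```

The second loop is only correct because the routes are never mutated, which matches the read-only caveat above.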

The attributes are the ergonomic surface over that same mechanism, so I
would keep them in this RFC rather than split them out. They add no new
storage model; they remove the explicit store/fetch boilerplate for the
static-state case.

## Where this leaves us

What is already done or committed: the SAPI opt-in model (the
`allow_unsafe_runtime` flag and the SAPI allowlist are gone, replaced by the
internal opt-in/partition API); the error model; storage-path visibility via
`getCacheStoreType()`; dropping the "top-level ref" idea; the config-array
fix (skipping the request-local prototype for shared graphs, which removes
the per-fetch array walk so a warm scalar-array fetch is zero-copy); and the
large-nested object path from (c), with numbers on this same A/B/C. I am
declining `remember()`, for the lock-safety reason above.

On the central question I went where the measurements led. You were right
that native lost as shipped; I found why (a request-local prototype clone
slower than re-decoding, plus per-object class lookups and duplicated
strings), fixed all three, and native now beats your deepclone path on the
nested object workloads, with the full opcache suite and new regression
tests passing on NTS and ZTS. For tiny objects deepclone is still a hair
ahead, and for read-only scalar config a resident literal still wins; I
concede both.

So I do think in-engine object storage earns its place now, on performance
and on being a built-in, generic, single primitive (and on pinned's
per-worker amortization for read-only state). But if the body still prefers
a focused better-APCu plus a core hydration primitive, that is an outcome I
can support; the capability matters to me more than where it sits, and the
work above transfers either way.

The revised branch is pushed and the harness is published, so you can check
the numbers directly; I will also post the full before/after A/B/C here. If
you have a methodology you would prefer, I will run that too.

Thanks again. This got much sharper because you measured it, and it sent me
to a fix I would not have found otherwise.

## Jakub: the FPM pool boundary is preserved

> The FPM shared hosting part is a problem [...] we consider data leaks
> between pools as security issues [...] Maybe the solution would be to
> allow it only if there is one pool enabled.

This is the concern I most wanted to get right, and I think the
implementation answers it without the single-pool restriction. Static Cache
is not one cache shared across pools. FPM creates a separate partition per
worker pool in the master, before any worker forks; each partition owns its
own volatile and pinned shared-memory backend, and each worker activates
only its own pool's partition during child initialization, before user code
runs. Every cache API, status call, clear, and the Static Cache part of
`opcache_reset()` operates on the active pool's partition. There is no API
path from one pool to another pool's data, so the pool boundary stays a
security boundary and no policy change is needed. If a pool's partition
fails to start it gets no Static Cache; it never falls back to a shared
one.

One honest caveat, for the record: the per-pool segments are anonymous
shared mappings created in the master before fork, so a worker inherits
every pool's segment in its address space even though it can only ever
address its own pool's partition. That is the same exposure model as the
main OPcache SHM, which is already shared across pools today; the Static
Cache is in fact more isolated, because it is logically partitioned per pool
where the script cache is not. The data-leak-through-the-feature case you
raised, one pool reading another's cached values through the API, does not
exist in this design. If on top of that we want address-space isolation, so
a worker cannot even see another pool's bytes, that is a worthwhile
hardening (per-pool named segments mapped only in that pool's children, or
unmapping the others post-fork), and I am happy to do it as a follow-up if
you consider it in scope.

Your single-pool suggestion would also work, but per-pool partitions keep
the feature usable for the multi-pool shared-hosting setups where a
single-cache design would otherwise be unacceptable.

## Timo: thanks for the immutable_cache pointer

> See also Tyson's php-immutable_cache [...] related APCu discussions

Thank you. Tyson told me about `immutable_cache` himself a while ago, and it
shaped my thinking here. I built an internal extension along the same lines,
`colopl_cache`, an APCu-style drop-in for immutable values. What that work
showed me is that the parts that matter most for this use case (OPcache
compatibility, behaviour under a JIT-heavy workload, and the Zend VM
intervention needed for static-state caching) are very hard to get right as
an ordinary extension. That is why I brought this to OPcache as an RFC
instead of shipping another extension: it needs cooperation from the engine,
the VM, and a few internal classes that an extension cannot coordinate
cleanly. So the prior art is genuinely appreciated; it is part of how I
arrived here.

Best regards,
Go Kudo
