On Wed, 3 Jun 2026 11:23:05 GMT, Maurizio Cimadamore <[email protected]>
wrote:
>> ## Summary
>>
>> This PR proposes to introduce a pooled confined arena as an optimization for
>> `Arena.ofConfined()`, where small native allocations can be served from a
>> reusable per-thread/per-slot memory pool instead of calling the regular
>> native allocator for every short-lived arena. The arena remains confined to
>> its owner thread and is still closed normally, but its backing storage can
>> be reset and reused when the arena closes. The feature requires no API
>> changes.
>>
>> ### Outline
>>
>> Platform threads: one lazily allocated pool per Thread, encoded in
>> `Thread.confinedMemoryPool`.
>> Virtual threads: fixed shared native pool with CAS-protected slots, because
>> per-virtual-thread native pools would not scale.
>>
>> Pooled memory is zeroed out upon _closing_ an Arena to minimize data
>> visibility between reuses. This means the data is visible only within a TWR
>> block, and never outside it.
>>
>> By default, a confined arena has access to 64 bytes of pooled data. The
>> pool size is configurable via a system property and can be 8, 16, 32, or 64
>> bytes. Pooling can also be turned off completely by setting the pool
>> power-of-two size to zero. Nested confined arenas are not supported.
>>
>> ## Static Analysis
>>
>> An extensive static corpus analysis of third-party libraries and the JDK
>> itself has been conducted with respect to `Arena.ofConfined()` usage,
>> revealing that confined arenas were used _only_ in TWR blocks and _never_ in
>> an unstructured way. The static analysis further revealed that in most
>> cases, only a small amount of native memory was ever allocated, usually less
>> than 32 bytes, and in many cases, 8 bytes or less. This usage pattern lends
>> itself well to pooling.
>>
>> ## Dynamic Analysis
>>
>> A dynamic statistical analysis of actual runs was also made, where various
>> properties of confined arenas were recorded and summarized during a complete
>> tier1 test run. While a tier1 run is not necessarily representative of a
>> typical application workload, it provided some interesting results:
>>
>> The run produced 93 per-process histogram blocks and 788,773,092 closed
>> confined arenas. The result is dominated by arenas with no native allocation
>> at all: 375,934,768 arenas (47.661%) are in the zero-byte bucket. Counting
>> arenas up to 63 bytes covers 99.997% of all arena closures.
>>
>> The largest count bucket is 8-15 bytes per arena with 400,951,293 arenas
>> (50.832% of all arenas). The largest byte bucket is 8-15 bytes per arena
>> with 3,207,623,039 B (3,059.03 MiB) (46.794% of all by...
>
> src/java.base/share/classes/jdk/internal/foreign/ThreadConfinedSegmentPool.java
> line 116:
>
>> 114: }
>> 115:
>> 116: final class CachedArena implements Arena, NoInitAllocator {
>
> My general feeling here is that the implementation is arranged the wrong way.
> E.g. in my mind, we have ArenaImpl, which is the type of the builtin arena we
> return. And, if an ArenaImpl is confined, it can allocate memory more
> cheaply, with the help of some kind of thread-backed allocator.
>
> I feel the right arrangement is to have a SegmentAllocator (not an Arena)
> that returns usable regions of memory from a given thread. Maybe the
> allocator is very low level, it computes the next pointer, and does a
> `MemorySegment.ofAddress(ptr)` for the region. Then the ArenaImpl::allocate
> takes that, and does a reinterpret with the correct arena and size. When the
> confined arena closes, the memory is returned to the underlying pool.
>
> Since this is the builtin confined arena we're talking about, I'm not sure
> about CachedArena -- as that looks like any other 3rd party Arena. I think we
> can achieve tighter integration?

More specifically, I think that in both the confined and shared/auto case, what
you want is a setup like this:

1. when an arena is created, we try to acquire some memory from some pool
2. if we fail, then the arena behaves like before
3. otherwise, the first N allocations of the arena will be served by the pool
4. after that, we either try to acquire another pool, or fall back to the
   default impl

The only difference between confined and shared is where the pool lives, and
how it is acquired.
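
To make the four steps concrete, here is a rough, self-contained model of the control flow. Everything in it is hypothetical: `PooledArenaSketch`, `SlotPool` and `SketchArena` are names made up for illustration, and `ByteBuffer` stands in for `MemorySegment` so the sketch does not depend on internal or restricted APIs. It ignores alignment, and in step 4 it always falls back rather than trying to acquire another pool. In the real thing, the slice handed out in step 3 would instead be reinterpreted with the owning arena's scope and the requested size.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical model of steps 1-4; not the JDK implementation. */
final class PooledArenaSketch {

    /** Fixed set of equally sized slots; a slot is claimed with a CAS. */
    static final class SlotPool {
        private final ByteBuffer[] slots;
        private final AtomicIntegerArray inUse; // 0 = free, 1 = taken

        SlotPool(int slotCount, int slotBytes) {
            slots = new ByteBuffer[slotCount];
            inUse = new AtomicIntegerArray(slotCount);
            for (int i = 0; i < slotCount; i++) {
                slots[i] = ByteBuffer.allocateDirect(slotBytes);
            }
        }

        /** Returns a slot index, or -1 if every slot is taken. */
        int tryAcquire() {
            for (int i = 0; i < slots.length; i++) {
                if (inUse.compareAndSet(i, 0, 1)) {
                    return i;
                }
            }
            return -1;
        }

        /** Zeroes the slot so no data leaks into the next user, then frees it. */
        void release(int index) {
            ByteBuffer slot = slots[index];
            for (int i = 0; i < slot.capacity(); i++) {
                slot.put(i, (byte) 0);
            }
            inUse.set(index, 0);
        }
    }

    /** Stand-in for the builtin arena; counters exist only for demonstration. */
    static final class SketchArena implements AutoCloseable {
        private final SlotPool pool;
        private final int slot;   // -1 means "no pooled memory acquired"
        private int offset;       // bump pointer into the slot
        int pooledAllocations;
        int fallbackAllocations;

        SketchArena(SlotPool pool) {
            this.pool = pool;
            this.slot = pool.tryAcquire(); // step 1 (step 2 if this returns -1)
        }

        ByteBuffer allocate(int byteSize) {
            if (byteSize < 0) {
                throw new IllegalArgumentException("negative size");
            }
            if (slot >= 0) {
                ByteBuffer backing = pool.slots[slot];
                if (byteSize <= backing.capacity() - offset) {
                    // Step 3: serve the request from the pooled slot.
                    ByteBuffer view = backing.duplicate();
                    view.position(offset);
                    view.limit(offset + byteSize);
                    offset += byteSize;
                    pooledAllocations++;
                    return view.slice();
                }
            }
            // Steps 2 and 4: behave like a regular, unpooled arena.
            fallbackAllocations++;
            return ByteBuffer.allocateDirect(byteSize);
        }

        @Override
        public void close() {
            if (slot >= 0) {
                pool.release(slot); // memory goes back to the pool
            }
        }
    }
}
```

In this model the confined and shared cases would differ only in how `SlotPool` is located and how `tryAcquire()` is implemented; `SketchArena` itself stays the same.
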
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/31365#discussion_r3348091270