On Wed, 3 Jun 2026 13:42:55 GMT, Viktor Klang <[email protected]> wrote:
>> ## Summary
>>
>> This PR proposes to introduce a pooled confined arena as an optimization for
>> `Arena.ofConfined()`, where small native allocations can be served from a
>> reusable per-thread/per-slot memory pool instead of calling the regular
>> native allocator for every short-lived arena. The arena remains confined to
>> its owner thread and is still closed normally, but its backing storage can
>> be reset and reused when the arena closes. The feature requires no API
>> changes.
>>
>> ### Outline
>>
>> Platform threads: one lazily allocated pool per Thread, encoded in
>> `Thread.confinedMemoryPool`.
>> Virtual threads: fixed shared native pool with CAS-protected slots, because
>> per-virtual-thread native pools would not scale.
>>
>> Pooled memory is zeroed out upon _closing_ an Arena to minimize data
>> visibility between reuse. This means the data is visible only within a TWR
>> block, and never outside it.
>>
>> By default, a confined arena has access to 64 bytes of pooled data. The
>> pool size is configurable via a system property and can be 8, 16, 32, or 64
>> bytes. Pooling can also be turned off completely by setting the pool
>> power-of-two size to zero. Nested confined arenas are not supported.
>>
>> ## Static Analysis
>>
>> An extensive static corpus analysis of third-party libraries and the JDK
>> itself has been conducted with respect to `Arena.ofConfined()` usage,
>> revealing that confined arenas were used _only_ in TWR blocks and _never_ in
>> an unstructured way. The static analysis further revealed that in most
>> cases, only a small amount of native memory was ever allocated, usually less
>> than 32 bytes, and in many cases, 8 bytes or less. This usage pattern lends
>> itself well to pooling.
>>
>> ## Dynamic Analysis
>>
>> A dynamic statistical analysis of actual runs was also made, where various
>> properties of confined arenas were recorded and summarized during a complete
>> tier1 test run. While a tier1 run is not necessarily representative of a
>> typical application workload, it provided some interesting results:
>>
>> The run produced 93 per-process histogram blocks and 788,773,092 closed
>> confined arenas. The result is dominated by arenas with no native allocation
>> at all: 375,934,768 arenas (47.661%) are in the zero-byte bucket. Counting
>> arenas up to 63 bytes covers 99.997% of all arena closures.
>>
>> The largest count bucket is 8-15 bytes per arena with 400,951,293 arenas
>> (50.832% of all arenas). The largest byte bucket is 8-15 bytes per arena
>> with 3,207,623,039 B (3,059.03 MiB) (46.794% of all by...
>
> src/java.base/share/classes/java/lang/Thread.java line 401:
>
>> 399:      */
>> 400:     @Stable
>> 401:     private AutoCloseable confinedArenaAllocator;
>
> I'm on the fence about this: adding a new field into Thread (which also means
> for all Virtual Threads that get allocated) for a very specific use-case. What
> other options exist, and do VirtualThreads need their own caches or can they
> rely on the caches of their carriers?

Experiments with carrier-local caches show that they are more memory-efficient, but very complex and slow. A virtual thread can be remounted on another carrier, for example, which creates problems.

For platform threads, I do not believe an extra field is a problem. Creating such a thread is resource-intensive anyhow, and an extra field is well within the noise.

For virtual threads, I am a bit uncertain how such a thread is "parked". If one creates a million virtual threads, we would typically need an extra 4 MiB for the field alone, even if no pools are ever created. There are already a large number of fields in Thread/LocalThread. Using a ThreadLocal is much slower.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31365#discussion_r3349201625
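
For readers following the thread, the "fixed shared native pool with CAS-protected slots" and the zero-on-close behavior from the quoted summary can be pictured roughly as below. This is only a sketch under stated assumptions and is not taken from the PR: `SlotPoolSketch`, its method names, and the linear scan for a free slot are hypothetical choices made for illustration, and a direct `ByteBuffer` stands in for the native pool memory.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical illustration of a fixed pool of small slots claimed via CAS. */
final class SlotPoolSketch {
    /** 64 bytes per slot, mirroring the default pool size named in the summary. */
    static final int SLOT_BYTES = 64;

    private final ByteBuffer storage;        // direct buffer standing in for native memory
    private final AtomicIntegerArray owned;  // per slot: 0 = free, 1 = claimed

    SlotPoolSketch(int slots) {
        this.storage = ByteBuffer.allocateDirect(slots * SLOT_BYTES);
        this.owned = new AtomicIntegerArray(slots);
    }

    /**
     * Tries to claim a free slot with compare-and-set.
     * Returns the slot index, or -1 if every slot is busy, in which case a
     * caller would fall back to the regular allocator.
     */
    int acquire() {
        for (int i = 0; i < owned.length(); i++) {
            if (owned.compareAndSet(i, 0, 1)) {
                return i;
            }
        }
        return -1;
    }

    /** Returns a view restricted to the bytes of one slot. */
    ByteBuffer slot(int index) {
        ByteBuffer dup = storage.duplicate();
        dup.position(index * SLOT_BYTES).limit((index + 1) * SLOT_BYTES);
        return dup.slice();
    }

    /** Zeroes the slot first, then marks it free, so the next owner sees only zeros. */
    void release(int index) {
        ByteBuffer s = slot(index);
        for (int i = 0; i < SLOT_BYTES; i++) {
            s.put(i, (byte) 0);
        }
        owned.set(index, 0); // volatile write publishes the zeroed contents
    }
}
```

In this sketch a caller would pair `acquire()` and `release()` the way a TWR block pairs opening and closing an arena. Zeroing before the slot is marked free is what keeps data from one use from being visible to the next.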
