On Wed, 3 Jun 2026 08:50:01 GMT, Per Minborg <[email protected]> wrote:
> ## Summary
>
> This PR proposes to introduce a pooled confined arena as an optimization for
> `Arena.ofConfined()`, where small native allocations can be served from a
> reusable per-thread/per-slot memory pool instead of calling the regular
> native allocator for every short-lived arena. The arena remains confined to
> its owner thread and is still closed normally, but its backing storage can be
> reset and reused when the arena closes. The feature requires no API changes.
>
> ### Outline
>
> Platform threads: one lazily allocated pool per Thread, encoded in
> `Thread.confinedMemoryPool`.
> Virtual threads: fixed shared native pool with CAS-protected slots, because
> per-virtual-thread native pools would not scale.
>
> Pooled memory is zeroed out upon _closing_ an Arena to minimize data
> visibility between reuses. This means the data is visible only within a
> try-with-resources (TWR) block, and never outside it.
>
> By default, a confined arena has access to 64 bytes of pooled data. The pool
> size is configurable via a system property and can be 8, 16, 32, or 64 bytes.
> Pooling can also be turned off completely by setting the pool power-of-two
> size to zero. Nested confined arenas are not supported.
>
> ## Static Analysis
>
> An extensive static corpus analysis of third-party libraries and the JDK
> itself has been conducted with respect to `Arena.ofConfined()` usage,
> revealing that confined arenas were used _only_ in TWR blocks and _never_ in
> an unstructured way. The static analysis further revealed that in most cases,
> only a small amount of native memory was ever allocated, usually less than 32
> bytes, and in many cases, 8 bytes or less. This usage pattern lends itself
> well to pooling.
>
> ## Dynamic Analysis
>
> A dynamic statistical analysis of actual runs was also made, where various
> properties of confined arenas were recorded and summarized during a complete
> tier1 test run.
> While a tier1 run is not necessarily representative of a
> typical application workload, it provided some interesting results:
>
> The run produced 93 per-process histogram blocks and 788,773,092 closed
> confined arenas. The result is dominated by arenas with no native allocation
> at all: 375,934,768 arenas (47.661%) are in the zero-byte bucket. Counting
> arenas up to 63 bytes covers 99.997% of all arena closures.
>
> The largest count bucket is 8-15 bytes per arena with 400,951,293 arenas
> (50.832% of all arenas). The largest byte bucket is 8-15 bytes per arena with
> 3,207,623,039 B (3,059.03 MiB) (46.794% of all bytes). Buckets below 64 KiB
> preserve very close t...

> One possible concern here is with clients that expect `Arena::allocate` to
> result in a call to `malloc`. Some of these clients might expect to be able
> to override the system allocator -- e.g. with jemalloc, to maybe take
> advantage of additional features such as use-after-free protection.
>
> We have seen evidence of that here: #28235

Yes. There will be observable behavior changes in how segments are handled. But in no way do we guarantee that there will be malloc/free invocations when using arenas.

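To make the pooling scheme from the quoted summary more concrete, here is a minimal sketch of the platform-thread case. It is not the code in the PR: the class `PooledArenaSketch`, its `Pool` helper and the `isPooled` method are hypothetical names, a heap `byte[]` stands in for native memory, alignment is ignored, and giving a nested arena no pool at all is only one possible reading of "nested confined arenas are not supported". It only shows the three ideas described above: small requests are bump-allocated from a lazily created per-thread pool, larger requests fall back to a regular allocation, and the used part of the pool is zeroed when the arena closes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch of a pooled confined arena; not the PR's implementation. */
final class PooledArenaSketch implements AutoCloseable {
    static final int POOL_SIZE = 64; // default pool size quoted in the summary

    /** Per-thread backing storage; a heap array stands in for native memory. */
    private static final class Pool {
        final byte[] bytes = new byte[POOL_SIZE];
        boolean inUse;
    }

    // Created lazily, the first time a thread opens an arena.
    private static final ThreadLocal<Pool> POOL = ThreadLocal.withInitial(Pool::new);

    private final Pool pool; // null if this thread's pool is already taken (assumption)
    private int used;        // bump pointer into the pool

    PooledArenaSketch() {
        Pool p = POOL.get();
        if (p.inUse) {
            pool = null;     // nested arena: no pooling in this sketch
        } else {
            p.inUse = true;
            pool = p;
        }
    }

    /** Serves small requests from the pool and everything else from a fresh buffer. */
    ByteBuffer allocate(int size) {
        if (pool != null && size <= POOL_SIZE - used) {
            ByteBuffer slice = ByteBuffer.wrap(pool.bytes, used, size).slice();
            used += size;
            return slice;
        }
        return ByteBuffer.allocate(size); // stand-in for the regular allocator
    }

    /** True if the buffer is backed by this arena's pool. */
    boolean isPooled(ByteBuffer buffer) {
        return pool != null && buffer.array() == pool.bytes;
    }

    @Override
    public void close() {
        if (pool == null) {
            return;
        }
        // Zero only the bytes handed out, so the next arena never sees old data.
        Arrays.fill(pool.bytes, 0, used, (byte) 0);
        used = 0;
        pool.inUse = false;
    }

    public static void main(String[] args) {
        try (PooledArenaSketch arena = new PooledArenaSketch()) {
            ByteBuffer small = arena.allocate(8);
            small.putLong(0, 42L);
            System.out.println(arena.isPooled(small));               // true
            System.out.println(arena.isPooled(arena.allocate(100))); // false
        }
        try (PooledArenaSketch arena = new PooledArenaSketch()) {
            System.out.println(arena.allocate(8).getLong(0));        // 0
        }
    }
}
```

Unlike a real confined arena, the sketch does not invalidate buffers after `close()`, so it says nothing about liveness checks. It also illustrates the concern raised above: a pooled allocation never reaches the fallback allocator, so a replaced system allocator would not observe it.
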
Here is a benchmark run on an M1 Mac for https://github.com/openjdk/jdk/pull/31365/commits/3523398c45825a022a131bb18a3248085e8cb078

Benchmark                                   (size)  Mode  Cnt   Score   Error  Units
AllocTest.OfVirtual.alloc_calloc_arena           5  avgt   30  10.426 ± 0.127  ns/op
AllocTest.OfVirtual.alloc_calloc_arena          20  avgt   30  12.687 ± 0.261  ns/op
AllocTest.OfVirtual.alloc_calloc_arena         100  avgt   30  11.801 ± 0.301  ns/op
AllocTest.OfVirtual.alloc_calloc_arena         500  avgt   30  19.278 ± 0.331  ns/op
AllocTest.OfVirtual.alloc_calloc_arena        2000  avgt   30  29.699 ± 1.321  ns/op
AllocTest.OfVirtual.alloc_calloc_arena        8000  avgt   30  91.423 ± 3.575  ns/op
AllocTest.OfVirtual.alloc_confined               5  avgt   30   2.196 ± 0.018  ns/op  -> VT CAS Pooling
AllocTest.OfVirtual.alloc_confined              20  avgt   30   2.511 ± 0.028  ns/op  -> VT CAS Pooling
AllocTest.OfVirtual.alloc_confined             100  avgt   30  18.759 ± 0.200  ns/op
AllocTest.OfVirtual.alloc_confined             500  avgt   30  26.309 ± 0.545  ns/op
AllocTest.OfVirtual.alloc_confined            2000  avgt   30  31.244 ± 0.357  ns/op
AllocTest.OfVirtual.alloc_confined            8000  avgt   30  79.099 ± 0.411  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool       5  avgt   30  15.715 ± 0.194  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool      20  avgt   30  18.217 ± 0.297  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool     100  avgt   30  18.559 ± 0.154  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool     500  avgt   30  26.103 ± 0.212  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool    2000  avgt   30  31.070 ± 0.244  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool    8000  avgt   30  79.270 ± 0.405  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena           5  avgt   30  16.396 ± 0.453  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena          20  avgt   30  18.562 ± 0.389  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena         100  avgt   30  17.239 ± 0.298  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena         500  avgt   30  24.699 ± 0.397  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena        2000  avgt   30  30.607 ± 0.156  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena        8000  avgt   30  87.881 ± 4.379  ns/op
AllocTest.alloc_calloc_arena                     5  avgt   30  10.643 ± 0.080  ns/op
AllocTest.alloc_calloc_arena                    20  avgt   30  13.246 ± 0.277  ns/op
AllocTest.alloc_calloc_arena                   100  avgt   30  11.642 ± 0.102  ns/op
AllocTest.alloc_calloc_arena                   500  avgt   30  18.777 ± 0.105  ns/op
AllocTest.alloc_calloc_arena                  2000  avgt   30  31.421 ± 1.309  ns/op
AllocTest.alloc_calloc_arena                  8000  avgt   30  90.453 ± 2.765  ns/op
AllocTest.alloc_confined                         5  avgt   30   1.272 ± 0.011  ns/op  -> PT Thread Local Pool
AllocTest.alloc_confined                        20  avgt   30   1.347 ± 0.012  ns/op  -> PT Thread Local Pool
AllocTest.alloc_confined                       100  avgt   30  18.568 ± 0.231  ns/op
AllocTest.alloc_confined                       500  avgt   30  25.792 ± 0.369  ns/op
AllocTest.alloc_confined                      2000  avgt   30  31.206 ± 0.392  ns/op
AllocTest.alloc_confined                      8000  avgt   30  78.468 ± 0.762  ns/op
AllocTest.alloc_confined_no_pool                 5  avgt   30  15.627 ± 0.174  ns/op
AllocTest.alloc_confined_no_pool                20  avgt   30  18.088 ± 0.244  ns/op
AllocTest.alloc_confined_no_pool               100  avgt   30  18.596 ± 0.213  ns/op
AllocTest.alloc_confined_no_pool               500  avgt   30  26.607 ± 0.676  ns/op
AllocTest.alloc_confined_no_pool              2000  avgt   30  30.615 ± 0.224  ns/op
AllocTest.alloc_confined_no_pool              8000  avgt   30  78.748 ± 0.706  ns/op
AllocTest.alloc_unsafe_arena                     5  avgt   30  16.284 ± 0.279  ns/op
AllocTest.alloc_unsafe_arena                    20  avgt   30  18.639 ± 0.355  ns/op
AllocTest.alloc_unsafe_arena                   100  avgt   30  17.305 ± 0.266  ns/op
AllocTest.alloc_unsafe_arena                   500  avgt   30  24.531 ± 0.360  ns/op
AllocTest.alloc_unsafe_arena                  2000  avgt   30  30.645 ± 0.183  ns/op
AllocTest.alloc_unsafe_arena                  8000  avgt   30  88.347 ± 2.594  ns/op

src/java.base/share/classes/java/lang/Thread.java line 1570:

> 1568:             confinedArenaAllocator.close();
> 1569:         } catch (Exception e) {
> 1570:             e.printStackTrace();

I think we should remove this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/31365#issuecomment-4612616127
PR Comment: https://git.openjdk.org/jdk/pull/31365#issuecomment-4728060149
PR Review Comment: https://git.openjdk.org/jdk/pull/31365#discussion_r3347373482
