On Wed, 3 Jun 2026 08:50:01 GMT, Per Minborg <[email protected]> wrote:

> ## Summary
> 
> This PR proposes to introduce a pooled confined arena as an optimization for 
> `Arena.ofConfined()`, where small native allocations can be served from a 
> reusable per-thread/per-slot memory pool instead of calling the regular 
> native allocator for every short-lived arena. The arena remains confined to 
> its owner thread and is still closed normally, but its backing storage can be 
> reset and reused when the arena closes. The feature requires no API changes.
> 
> ### Outline
> 
> - Platform threads: one lazily allocated pool per Thread, encoded in 
>   `Thread.confinedMemoryPool`.
> - Virtual threads: fixed shared native pool with CAS-protected slots, because 
>   per-virtual-thread native pools would not scale.
> 
> Pooled memory is zeroed out upon _closing_ an Arena to minimize data 
> visibility between reuses. This means the data is visible only within a TWR 
> (try-with-resources) block, and never outside it.
> 
> By default, a confined arena has access to 64 bytes of pooled data. The pool 
> size is configurable via a system property and can be 8, 16, 32, or 64 bytes. 
> Pooling can also be turned off completely by setting the pool power-of-two 
> size to zero. Nested confined arenas are not supported.
> 
> ## Static Analysis
> 
> An extensive static corpus analysis of third-party libraries and the JDK 
> itself has been conducted with respect to `Arena.ofConfined()` usage, 
> revealing that confined arenas were used _only_ in TWR blocks and _never_ in 
> an unstructured way. The static analysis further revealed that in most cases, 
> only a small amount of native memory was ever allocated, usually less than 32 
> bytes, and in many cases, 8 bytes or less. This usage pattern lends itself 
> well to pooling. 
> 
> ## Dynamic Analysis
> 
> A dynamic statistical analysis of actual runs was also made, where various 
> properties of confined arenas were recorded and summarized during a complete 
> tier1 test run. While a tier1 run is not necessarily representative of a 
> typical application workload, it provided some interesting results:
> 
> The run produced 93 per-process histogram blocks and 788,773,092 closed 
> confined arenas. The result is dominated by arenas with no native allocation 
> at all: 375,934,768 arenas (47.661%) are in the zero-byte bucket. Counting 
> arenas up to 63 bytes covers 99.997% of all arena closures.
> 
> The largest count bucket is 8-15 bytes per arena with 400,951,293 arenas 
> (50.832% of all arenas). The largest byte bucket is 8-15 bytes per arena with 
> 3,207,623,039 B (3,059.03 MiB) (46.794% of all bytes). Buckets below 64 KiB 
> preserve very close t...

> One possible concern here is with clients that expect `Arena::allocate` to 
> result in a call to `malloc`. Some of these clients might expect to be able 
> to override the system allocator, e.g. with jemalloc, perhaps to take 
> advantage of additional features such as use-after-free protection.
> 
> We have seen evidence of that here: #28235

Yes. There will be observable behavior changes in how segments are handled. But 
we in no way guarantee that there will be malloc/free invocations when using 
arenas.
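
To make the point concrete, here is a minimal sketch of the usage pattern the quoted static analysis describes: a confined arena in a try-with-resources block with one small allocation. The class and method names are made up for illustration. Nothing in this code depends on where the bytes come from, so it should behave the same whether the segment is backed by the system allocator or by a pool.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical example class, not part of the PR.
final class ConfinedArenaExample {

    // Writes a long into a short-lived native segment and reads it back.
    static long roundTrip(long value) {
        try (Arena arena = Arena.ofConfined()) {
            // An 8-byte allocation, within the sizes reported as typical above.
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_LONG);
            segment.set(ValueLayout.JAVA_LONG, 0, value);
            return segment.get(ValueLayout.JAVA_LONG, 0);
        } // The arena closes here; the segment is no longer accessible.
    }
}
```

Client code written like this only relies on the `Arena` and `MemorySegment` contracts, not on a particular allocator being called underneath.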

Here is a benchmark run on an M1 Mac for 
https://github.com/openjdk/jdk/pull/31365/commits/3523398c45825a022a131bb18a3248085e8cb078


Benchmark                                   (size)  Mode  Cnt   Score   Error  Units
AllocTest.OfVirtual.alloc_calloc_arena           5  avgt   30  10.426 ± 0.127  ns/op
AllocTest.OfVirtual.alloc_calloc_arena          20  avgt   30  12.687 ± 0.261  ns/op
AllocTest.OfVirtual.alloc_calloc_arena         100  avgt   30  11.801 ± 0.301  ns/op
AllocTest.OfVirtual.alloc_calloc_arena         500  avgt   30  19.278 ± 0.331  ns/op
AllocTest.OfVirtual.alloc_calloc_arena        2000  avgt   30  29.699 ± 1.321  ns/op
AllocTest.OfVirtual.alloc_calloc_arena        8000  avgt   30  91.423 ± 3.575  ns/op
AllocTest.OfVirtual.alloc_confined               5  avgt   30   2.196 ± 0.018  ns/op -> VT CAS Pooling
AllocTest.OfVirtual.alloc_confined              20  avgt   30   2.511 ± 0.028  ns/op -> VT CAS Pooling
AllocTest.OfVirtual.alloc_confined             100  avgt   30  18.759 ± 0.200  ns/op
AllocTest.OfVirtual.alloc_confined             500  avgt   30  26.309 ± 0.545  ns/op
AllocTest.OfVirtual.alloc_confined            2000  avgt   30  31.244 ± 0.357  ns/op
AllocTest.OfVirtual.alloc_confined            8000  avgt   30  79.099 ± 0.411  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool       5  avgt   30  15.715 ± 0.194  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool      20  avgt   30  18.217 ± 0.297  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool     100  avgt   30  18.559 ± 0.154  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool     500  avgt   30  26.103 ± 0.212  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool    2000  avgt   30  31.070 ± 0.244  ns/op
AllocTest.OfVirtual.alloc_confined_no_pool    8000  avgt   30  79.270 ± 0.405  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena           5  avgt   30  16.396 ± 0.453  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena          20  avgt   30  18.562 ± 0.389  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena         100  avgt   30  17.239 ± 0.298  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena         500  avgt   30  24.699 ± 0.397  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena        2000  avgt   30  30.607 ± 0.156  ns/op
AllocTest.OfVirtual.alloc_unsafe_arena        8000  avgt   30  87.881 ± 4.379  ns/op
AllocTest.alloc_calloc_arena                     5  avgt   30  10.643 ± 0.080  ns/op
AllocTest.alloc_calloc_arena                    20  avgt   30  13.246 ± 0.277  ns/op
AllocTest.alloc_calloc_arena                   100  avgt   30  11.642 ± 0.102  ns/op
AllocTest.alloc_calloc_arena                   500  avgt   30  18.777 ± 0.105  ns/op
AllocTest.alloc_calloc_arena                  2000  avgt   30  31.421 ± 1.309  ns/op
AllocTest.alloc_calloc_arena                  8000  avgt   30  90.453 ± 2.765  ns/op
AllocTest.alloc_confined                         5  avgt   30   1.272 ± 0.011  ns/op -> PT Thread Local Pool
AllocTest.alloc_confined                        20  avgt   30   1.347 ± 0.012  ns/op -> PT Thread Local Pool
AllocTest.alloc_confined                       100  avgt   30  18.568 ± 0.231  ns/op
AllocTest.alloc_confined                       500  avgt   30  25.792 ± 0.369  ns/op
AllocTest.alloc_confined                      2000  avgt   30  31.206 ± 0.392  ns/op
AllocTest.alloc_confined                      8000  avgt   30  78.468 ± 0.762  ns/op
AllocTest.alloc_confined_no_pool                 5  avgt   30  15.627 ± 0.174  ns/op
AllocTest.alloc_confined_no_pool                20  avgt   30  18.088 ± 0.244  ns/op
AllocTest.alloc_confined_no_pool               100  avgt   30  18.596 ± 0.213  ns/op
AllocTest.alloc_confined_no_pool               500  avgt   30  26.607 ± 0.676  ns/op
AllocTest.alloc_confined_no_pool              2000  avgt   30  30.615 ± 0.224  ns/op
AllocTest.alloc_confined_no_pool              8000  avgt   30  78.748 ± 0.706  ns/op
AllocTest.alloc_unsafe_arena                     5  avgt   30  16.284 ± 0.279  ns/op
AllocTest.alloc_unsafe_arena                    20  avgt   30  18.639 ± 0.355  ns/op
AllocTest.alloc_unsafe_arena                   100  avgt   30  17.305 ± 0.266  ns/op
AllocTest.alloc_unsafe_arena                   500  avgt   30  24.531 ± 0.360  ns/op
AllocTest.alloc_unsafe_arena                  2000  avgt   30  30.645 ± 0.183  ns/op
AllocTest.alloc_unsafe_arena                  8000  avgt   30  88.347 ± 2.594  ns/op
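
For readers wondering what the "VT CAS Pooling" rows refer to, the general idea of a fixed pool with CAS-protected slots can be sketched roughly as below. This is a simplified illustration, not the code in the PR: the class name, the use of heap `byte[]` storage instead of native memory, and the linear scan over slots are all assumptions made to keep the example short.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch of a fixed pool with CAS-protected slots.
final class SlotPoolSketch {
    private static final int FREE = 0;
    private static final int TAKEN = 1;

    private final AtomicIntegerArray state; // one ownership flag per slot
    private final byte[][] slots;           // stand-in for native storage

    SlotPoolSketch(int slotCount, int slotSize) {
        this.state = new AtomicIntegerArray(slotCount); // all slots start FREE
        this.slots = new byte[slotCount][slotSize];
    }

    // Returns the index of a claimed slot, or -1 if every slot is taken
    // (a caller would then fall back to the regular allocator).
    int tryAcquire() {
        for (int i = 0; i < state.length(); i++) {
            if (state.compareAndSet(i, FREE, TAKEN)) {
                return i;
            }
        }
        return -1;
    }

    byte[] storage(int slot) {
        return slots[slot];
    }

    // Zeroes the slot before handing it back, so that the next user
    // cannot observe data left behind by the previous one.
    void release(int slot) {
        Arrays.fill(slots[slot], (byte) 0);
        state.set(slot, FREE);
    }
}
```

A thread that gets -1 from `tryAcquire()` in this sketch simply does not use the pool, which corresponds to the fallback path that larger allocations take in the table above.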

src/java.base/share/classes/java/lang/Thread.java line 1570:

> 1568:                 confinedArenaAllocator.close();
> 1569:             } catch (Exception e) {
> 1570:                 e.printStackTrace();

I think we should remove this

-------------

PR Comment: https://git.openjdk.org/jdk/pull/31365#issuecomment-4612616127
PR Comment: https://git.openjdk.org/jdk/pull/31365#issuecomment-4728060149
PR Review Comment: https://git.openjdk.org/jdk/pull/31365#discussion_r3347373482
