On Wed, 22 Jan 2025 10:12:13 GMT, Matthias Ernst <[email protected]> wrote:
>> Certain signatures for foreign function calls (e.g. HVA return by value)
>> require allocation of an intermediate buffer to adapt the FFM's to the
>> native stub's calling convention. In the current implementation, this buffer
>> is malloced and freed on every FFM invocation, a non-negligible overhead.
>>
>> Sample stack trace:
>>
>> java.lang.Thread.State: RUNNABLE
>> at jdk.internal.misc.Unsafe.allocateMemory0(java.base@25-ea/Native
>> Method)
>> ...
>> at
>> jdk.internal.foreign.abi.SharedUtils.newBoundedArena(java.base@25-ea/SharedUtils.java:386)
>> at
>> jdk.internal.foreign.abi.DowncallStub/0x000001f001084c00.invoke(java.base@25-ea/Unknown
>> Source)
>> ...
>> at
>> java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@25-ea/Invokers$Holder)
>>
>>
>> To alleviate this, this PR remembers and reuses up to two small intermediate
>> buffers per carrier-thread in subsequent calls.
>>
>> Performance (MBA M3):
>>
>>
>> Before:
>> Benchmark Mode Cnt Score Error Units
>> CallOverheadByValue.byPtr avgt 10 3.333 ? 0.152 ns/op
>> CallOverheadByValue.byValue avgt 10 33.892 ? 0.034 ns/op
>>
>> After:
>> Benchmark Mode Cnt Score Error Units
>> CallOverheadByValue.byPtr avgt 10 3.291 ? 0.031 ns/op
>> CallOverheadByValue.byValue avgt 10 5.464 ? 0.007 ns/op
>>
>>
>> `-prof gc` also shows that the new call path is fully scalar-replaced vs 160
>> byte/call before.
>
> Matthias Ernst has updated the pull request incrementally with one additional
> commit since the last revision:
>
> --unnecessary annotations
src/java.base/share/classes/jdk/internal/foreign/abi/BufferStack.java line 38:
> 36: @SuppressWarnings("restricted")
> 37: public MemorySegment allocate(long byteSize, long byteAlignment) {
> 38: MemorySegment slice = backingSegment.asSlice(offset,
> byteSize, byteAlignment);
You have re-implemented a slicing allocator
(`SegmentAllocator::slicingAllocator`). I think that's probably ok, but I'll
point out that slicing allocator will try to deal with alignment, and will also
throw exceptions when called with non-sensical arguments.
A more robust way to do this would be to:
1. have `reserve` pass the reserved size to `Frame`
2. `Frame` will slice the segment according to offset/size
3. then create a slicing allocator based on that segment
4. use the slicing allocator to implement `allocate`
In our tests, something like this did not add any extra overhead (the
allocation of the slicing allocator is escape-analyzed away)
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23142#discussion_r1925099911