On Thu, 17 Apr 2025 18:03:47 GMT, Vladimir Ivanov <vliva...@openjdk.org> wrote:

>> Migrate Vector API math library (SVML and SLEEF) linkage from native code 
>> (in JVM) to Java FFM API.
>> 
>> Since FFM API doesn't support vector calling conventions yet, migration 
>> affects only symbol lookup for now. But it still enables significant 
>> simplifications on JVM side.
>> 
>> The patch consists of the following parts:
>>   * on-demand symbol lookup in Java code replaces eager lookup from native 
>> code during JVM startup;
>>   * 2 new VM intrinsics for vector calls (support unary and binary shapes) 
>> (code separated from unary/binary vector operations);
>>   * new internal interface to query supported CPU ISA extensions 
>> (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching.
>> 
>> `java.lang.foreign` API is used to perform symbol lookup in vector math 
>> library, then the address is cached and fed into corresponding JVM 
>> intrinsic, so C2 can turn it into a direct vector call in generated code.
>> 
>> Once `java.lang.foreign` supports vectors & vector calling conventions, VM 
>> intrinsics can go away. 
>> 
>> Performance is on par with original implementation (tested with 
>> microbenchmarks on linux-x64 and macosx-aarch64).
>> 
>> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and 
>> macosx-aarch64)
>> 
>> Thanks!
>
> Vladimir Ivanov has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 24 additional 
> commits since the last revision:
> 
>  - Merge remote-tracking branch 'origin/master' into vector.math.01.java
>  - RVV and SVE adjustments
>  - fix broken merge
>  - Merge branch 'master' into vector.math.01.java
>  - Fix debugName handling
>  - Merge branch 'master' into vector.math.01.java
>  - RVV and SVE adjustments
>  - Merge branch 'master' into vector.math.01.java
>  - Fix windows-aarch64 build failure
>  - features_string -> cpu_info_string
>  - ... and 14 more: https://git.openjdk.org/jdk/compare/7b477dbf...88eacc48

Very interesting! Looks mostly good to me. Left a few inline notes.

src/hotspot/share/prims/vectorSupport.cpp line 622:

> 620: 
> 621:   ThreadToNativeFromVM ttn(thread);
> 622:   return env->NewStringUTF(features_string);

Isn't there a way to do this without the extra transition?

How about:


  oop result = java_lang_String::create_oop_from_str((char*) bytes, CHECK_NULL);
  return (jstring) JNIHandles::make_local(THREAD, result);

src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 379:

> 377:     V libraryBinaryOp(long addr, Class<? extends V> vClass, Class<E> 
> eClass, int length, String debugName,
> 378:                       V v1, V v2,
> 379:                       BinaryOperation<V,?> defaultImpl) {

I notice that the bound of `V` differs between `libraryUnaryOp`, which uses 
`Vectory<E>` and this method, which uses `VectorPayload`. Not sure if this is 
intentional?

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java
 line 272:

> 270:                 MemorySegment addr = LOOKUP.findOrThrow(symbol);
> 271:                 debug("%s %s => 0x%016x\n", op, symbol, addr.address());
> 272:                 T impl = implSupplier.apply(opc); // TODO: should call 
> the very same native implementation eventually (once FFM API supports vectors)

FWIW, one current barrier I see to implementing the vector calling convention 
in the linker, is that the FFM linker (currently) transmits register values to 
the downcall stub use Java primitive types. So, in order to support vector 
calling conventions, we would need to add some kind of 'primitive' that can 
hold the entire vector value, and preferably gets passed in the right register.

However, I think in the case of these math libraries in particular, speed of 
the fallback implementation is not that much of an issue, since there is also 
an intrinsic. So alternatively, we could split a vector value up into smaller 
integral types (`int`, `long`) -> pass them to the downcall stub in that form 
-> and then reconstruct the full vector value in its target register. (I used 
the same trick when I was experimenting with FP80 support, which also requires 
splitting up the 80 bit value up into 2 `long`s).

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java
 line 305:

> 303:     @ForceInline
> 304:     /*package-private*/ static
> 305:     <E, V extends Vector<E>>

Here you're using `Vector` instead of `VectorPayload` for the binary op, so 
there seems to be a discrepancy with `VectorSupport`.

-------------

Marked as reviewed by jvernee (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2778903954
PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050830705
PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050818654
PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050875854
PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050869087

Reply via email to