On Tue, 10 May 2022 12:48:25 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All, >> >> Patch adds the planned support for new vector operations and APIs targeted >> for [JEP 426: Vector API (Fourth >> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173) >> >> Following is the brief summary of changes:- >> >> 1) Extends the scope of existing lanewise API for following new vector >> operations. >> - VectorOperations.BIT_COUNT: counts the number of one-bits >> - VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero >> bits >> - VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing >> zero bits >> - VectorOperations.REVERSE: reversing the order of bits >> - VectorOperations.REVERSE_BYTES: reversing the order of bytes >> - compress and expand bits: Semantics are based on Hacker's Delight >> section 7-4 Compress, or Generalized Extract. >> >> 2) Adds following new APIs to perform cross lane vector compress and >> expansion operations under the influence of a mask. >> - Vector.compress >> - Vector.expand >> - VectorMask.compress >> >> 3) Adds predicated and non-predicated versions of following new APIs to load >> and store the contents of vector from foreign MemorySegments. >> - Vector.fromMemorySegment >> - Vector.intoMemorySegment >> >> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support >> for each newly added operation. >> >> >> Patch has been regressed over AARCH64 and X86 targets different AVX levels. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - 8284960: Correcting a typo. > - 8284960: Integrating changes from panama-vector (Add @since 19 tags). > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - 8284960: AARCH64 backend changes. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - ... and 1 more: > https://git.openjdk.java.net/jdk/compare/3fa1c404...b021e082 Overall, looks good. Some minor questions/suggestions follow. src/hotspot/cpu/aarch64/aarch64_neon.ad line 5700: > 5698: as_FloatRegister($dst$$reg)); > 5699: } > 5700: if (bt == T_INT) { I find it hard to reason about the code in its current form. Maybe make the second `if` (`bt == T_INT`) nested and move it under `if (bt == T_SHORT || bt == T_INT)`? src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2587: > 2585: > 2586: void MacroAssembler::vmovdqu(XMMRegister dst, AddressLiteral src, > Register scratch_reg, int vector_len) { > 2587: assert(vector_len <= AVX_512bit, "unexpected vector length"); The assert becomes redundant. src/hotspot/cpu/x86/matcher_x86.hpp line 195: > 193: case Op_PopCountVI: > 194: return ((ety == T_INT && > VM_Version::supports_avx512_vpopcntdq()) || > 195: (is_subword_type(ety) && > VM_Version::supports_avx512_bitalg())) ? 0 : 50; Should be easier to read when the condition is split. E.g.: if (is_subword_type(ety)) { return VM_Version::supports_avx512_bitalg())) ? 0 : 50; } else { assert(ety == T_INT, "sanity"); // for documentation purposes return VM_Version::supports_avx512_vpopcntdq() ? 0 : 50; } src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 7953: > 7951: StubRoutines::x86::_vector_iota_indices = > generate_iota_indices("iota_indices"); > 7952: > 7953: if (UsePopCountInstruction && VM_Version::supports_avx2() && > !VM_Version::supports_avx512_vpopcntdq()) { Why is the LUT unconditionally generated? `UsePopCountInstruction` still guides the usages. src/hotspot/cpu/x86/vm_version_x86.hpp line 375: > 373: decl(RDTSCP, "rdtscp", 48) /* RDTSCP > instruction */ \ > 374: decl(RDPID, "rdpid", 49) /* RDPID > instruction */ \ > 375: decl(FSRM, "fsrm", 50) /* Fast Short REP > MOV */ \ `test/lib-test/jdk/test/whitebox/CPUInfoTest.java` should be adjusted as well, shouldn't it? src/hotspot/cpu/x86/x86.ad line 2113: > 2111: > 2112: case Op_CountLeadingZerosV: > 2113: if ((bt == T_INT || bt == T_LONG) && > VM_Version::supports_avx512cd()) { Newly introduced `is_non_subword_integral_type(bt)` can be used here instead of `bt == T_INT || bt == T_LONG`. src/hotspot/share/classfile/vmIntrinsics.hpp line 1152: > 1150: > "Ljdk/internal/vm/vector/VectorSupport$ComExpOperation;)" > \ > 1151: > "Ljdk/internal/vm/vector/VectorSupport$VectorPayload;") > \ > 1152: do_name(vector_comexp_op_name, "comExpOp") > \ I don't see much value in trying to shorten the name by abbreviating it. I find it easier to read in an expanded form: ` compressExpandOp`, `vector_compress_expand_op_name`, `_VectorCompressExpand`, etc. src/hotspot/share/opto/c2compiler.cpp line 521: > 519: if (!Matcher::match_rule_supported(Op_SignumF)) return false; > 520: break; > 521: case vmIntrinsics::_VectorComExp: Why `_VectorComExp` intrinsic is special? Other vector intrinsics are handled later and in a different manner. What about `ExpandV` case? src/hotspot/share/opto/compile.cpp line 3416: > 3414: > 3415: case Op_ReverseBytesV: > 3416: case Op_ReverseV: { Can you elaborate, please, why it is performed so late in the optimization phase (at the very end during graph reshaping) and not during GVN? ------------- PR: https://git.openjdk.java.net/jdk/pull/8425