On Fri, 20 May 2022 09:51:24 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All, >> >> Patch adds the planned support for new vector operations and APIs targeted >> for [JEP 426: Vector API (Fourth >> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173) >> >> Following is the brief summary of changes:- >> >> 1) Extends the scope of existing lanewise API for following new vector >> operations. >> - VectorOperations.BIT_COUNT: counts the number of one-bits >> - VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero >> bits >> - VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing >> zero bits >> - VectorOperations.REVERSE: reversing the order of bits >> - VectorOperations.REVERSE_BYTES: reversing the order of bytes >> - compress and expand bits: Semantics are based on Hacker's Delight >> section 7-4 Compress, or Generalized Extract. >> >> 2) Adds following new APIs to perform cross lane vector compress and >> expansion operations under the influence of a mask. >> - Vector.compress >> - Vector.expand >> - VectorMask.compress >> >> 3) Adds predicated and non-predicated versions of following new APIs to load >> and store the contents of vector from foreign MemorySegments. >> - Vector.fromMemorySegment >> - Vector.intoMemorySegment >> >> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support >> for each newly added operation. >> >> >> Patch has been regressed over AARCH64 and X86 targets different AVX levels. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > 8284960: Integrating incremental patches. src/hotspot/cpu/x86/assembler_x86.cpp line 7934: > 7932: > 7933: void Assembler::evplzcntd(XMMRegister dst, KRegister mask, XMMRegister > src, bool merge, int vector_len) { > 7934: assert(VM_Version::supports_avx512cd() && (vector_len == AVX_512bit > || VM_Version::supports_avx512vl()), ""); Please, split assert as in other instructions - it will help to understand failure better. src/hotspot/cpu/x86/assembler_x86.cpp line 7946: > 7944: > 7945: void Assembler::evplzcntq(XMMRegister dst, KRegister mask, XMMRegister > src, bool merge, int vector_len) { > 7946: assert(VM_Version::supports_avx512cd() && (vector_len == AVX_512bit > || VM_Version::supports_avx512vl()), ""); Split assert. src/hotspot/cpu/x86/assembler_x86.cpp line 8173: > 8171: > 8172: void Assembler::vinsertf32x4(XMMRegister dst, XMMRegister nds, > XMMRegister src, uint8_t imm8) { > 8173: assert(VM_Version::supports_evex(), ""); Hmm, did we never trigger this wrong assert because the use was guarded by correct check? src/hotspot/cpu/x86/assembler_x86.cpp line 11720: > 11718: > 11719: void Assembler::evpcompressb(XMMRegister dst, KRegister mask, > XMMRegister src, bool merge, int vector_len) { > 11720: assert(VM_Version::supports_avx512_vbmi2() && (vector_len == > AVX_512bit || VM_Version::supports_avx512vl()), ""); Split assert in this and following new instructions. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4455: > 4453: break; > 4454: default: > 4455: fatal("Unsupported type"); Print wrong type: `fatal("Unsupported type : %s", type2name(type));` Below too. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4561: > 4559: case 4 : evpbroadcastd(dst, rtmp, vec_enc); break; > 4560: case 8 : evpbroadcastq(dst, rtmp, vec_enc); break; > 4561: default : ShouldNotReachHere(); break; `ShouldNotReachHere` does not give any information in case of failure. Use `fatal()` which prints wrong `lane_size`. Same below. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4666: > 4664: break; > 4665: default: > 4666: ShouldNotReachHere(); Use `fatal()`. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4693: > 4691: break; > 4692: default: > 4693: ShouldNotReachHere(); Use `fatal()`. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4732: > 4730: vector_reverse_byte(bt, dst, xtmp2, rtmp, vec_enc); > 4731: > 4732: } else if(!VM_Version::supports_avx512vlbw() && vec_enc == > Assembler::AVX_512bit) { No need to check `!VM_Version::supports_avx512vlbw()`. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4759: > 4757: vpandn(xtmp2, xtmp2, xtmp1, vec_enc); > 4758: vpsrlq(xtmp2, xtmp2, 1, vec_enc); > 4759: vporq(xtmp1, dst, xtmp2, vec_enc); All 3 code snippets are the same except constants. Also similar code in `vector_reverse_byte64` for `short` type. Consider factoring out it into separate method. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4819: > 4817: break; > 4818: default: > 4819: fatal("Unsupported type"); Print wring type. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4846: > 4844: break; > 4845: default: > 4846: fatal("Unsupported type"); Print wring type. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4892: > 4890: break; > 4891: default: > 4892: ShouldNotReachHere(); Use `fatal` and print type. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5018: > 5016: break; > 5017: default: > 5018: ShouldNotReachHere(); Use fatal and print type. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5037: > 5035: break; > 5036: default: > 5037: ShouldNotReachHere(); Use fatal and print type. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5056: > 5054: break; > 5055: default: > 5056: ShouldNotReachHere(); Use fatal and print type. src/hotspot/cpu/x86/x86.ad line 8879: > 8877: // special handling should be removed. > 8878: if (bt == T_LONG && rbt == T_INT) { > 8879: if (VM_Version::supports_avx512vl()) { Predicate say `!VM_Version::supports_avx512vl()` src/hotspot/share/opto/node.hpp line 1006: > 1004: > 1005: // The node is a CountedLoopEnd with a mask annotation so as to emit > a restore context > 1006: bool has_vector_mask_set() const { return (_flags & > Flag_has_vector_mask_set) != 0; } I don't see use of this flag. src/hotspot/share/opto/vectorIntrinsics.cpp line 86: > 84: if ((mask_use_type & VecMaskUseLoad) != 0) { > 85: if (!Matcher::match_rule_supported_vector(Op_VectorLoadMask, > num_elem, elem_bt) || > 86: !Matcher::match_rule_supported_vector(Op_LoadVector, num_elem, > T_BOOLEAN)) { Add comment explaining new check. In follow ing places too. src/hotspot/share/runtime/vmStructs.cpp line 1779: > 1777: declare_c2_type(CMoveVDNode, VectorNode) > \ > 1778: declare_c2_type(CompressVNode, VectorNode) > \ > 1779: declare_c2_type(ExpandVNode, VectorNode) > \ Not all new nodes listed. ------------- PR: https://git.openjdk.java.net/jdk/pull/8425