Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

Jatin Bhateja Sun, 05 Nov 2023 05:07:08 -0800

On Fri, 3 Nov 2023 23:07:44 GMT, Sandhya Viswanathan <sviswanat...@openjdk.org> 
wrote:


>> Jatin Bhateja has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Restricting masked sub-word gather to AVX512 target to align with integral 
>> gather support.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1606:
> 
>> 1604: void C2_MacroAssembler::vpgather8b_offset(BasicType elem_bt, 
>> XMMRegister dst, Register base, Register idx_base,
>> 1605:                                           Register offset, Register 
>> rtmp, int vlen_enc) {
>> 1606:   vpxor(dst, dst, dst, vlen_enc);
> 
> Why the vpxor here and in vpgather8b. These are not masked gathers.

This is to clear the vector gathering 8 subwords with each iteration.

> src/hotspot/cpu/x86/x86.ad line 1921:
> 
>> 1919:     case Op_LoadVectorGatherMasked:
>> 1920:       if (!is_subword_type(bt) && size_in_bits < 512 && 
>> !VM_Version::supports_avx512vl()) {
>> 1921:         return false;
> 
> Also need to return false when size_in_bits == 64 for non subword types.

match_rule_supported_vector  called in the beginning will enforce these checks.

> src/hotspot/cpu/x86/x86.ad line 1939:
> 
>> 1937:       // fallthrough
>> 1938:     case Op_LoadVectorGather:
>> 1939:       if (!is_subword_type(bt) && size_in_bits == 64 ) {
> 
> Since this is a fallthrough case, the change also applies to 
> StoreVectorScatter*.  The !is_subword_type() is needed only for 
> LoadVectorGather.

We do not intrinsify sub-word scatter operations currently.

> src/hotspot/cpu/x86/x86.ad line 4074:
> 
>> 4072:     BasicType elem_bt = Matcher::vector_element_basic_type(this);
>> 4073:     assert(!is_subword_type(elem_bt), "sanity"); // T_INT, T_LONG, 
>> T_FLOAT, T_DOUBLE
>> 4074:     __ vpcmpeqd($mask$$XMMRegister, $mask$$XMMRegister, 
>> $mask$$XMMRegister, vlen_enc);
> 
> vpcmpeqd is expensive instruction as compared to movdqu and also unrelated to 
> subword type  support.

compare instruction here does not access a memory operand, hence its cheaper 
compared to memory loads.

> src/hotspot/share/opto/vectorIntrinsics.cpp line 1579:
> 
>> 1577:   Node* index    = argument(11);
>> 1578:   Node* indexMap = argument(12);
>> 1579:   Node* indexM   = argument(13);
> 
> Could be renamed as follows for better understanding: index -> offset, indexM 
> -> indexOffset. Also this should be moved under else part of if (is_scatter).

Naming scheme is based on the actual names used in intrinsic entry point for 
these arguments.

> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java 
> line 3830:
> 
>> 3828:         // Check indices are within array bounds.
>> 3829:         // FIXME: Check index under mask controlling.
>> 3830:         for (int i = 0; i < vsp.length(); i += lsp.length()) {
> 
> The check (i  < vsp.length() )  could instead be (i + lsp.length() <= 
> vsp.length()).

Bit sizes of vector shape is multiple of 128 and we do not support NPOT sizes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571225
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571157
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571152
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571178
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571683
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571546

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

Reply via email to