On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> Hi All,
>> 
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and 
>> AVX512 features.
>> 
>> Following is the summary of changes:-
>> 
>> 1) Intrinsify sub-word gather with high performance backend implementation 
>> based on hybrid algorithm which initially partially unrolls scalar loop to 
>> accumulates values from gather indices into a quadword(64bit) slice followed 
>> by vector permutation to place the slice into appropriate vector lanes, it 
>> prevents code bloating and generates compact
>> JIT sequence. This coupled with savings from expansive array allocation in 
>> existing java implementation translates into significant performance of 
>> 1.3-5x gains with included micro.
>> 
>> 
>> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>> 
>> 
>> 2) Patch was also compared against modified java fallback implementation by 
>> replacing temporary array allocation with zero initialized vector and a 
>> scalar loops which inserts gathered values into vector. But, vector insert 
>> operation in higher vector lanes is a three step process which first 
>> extracts the upper vector 128 bit lane, updates it with gather subword value 
>> and then inserts the lane back to its original position. This makes inserts 
>> into higher order lanes costly w.r.t to proposed solution. In addition 
>> generated JIT code for modified fallback implementation was very bulky. This 
>> may impact in-lining decisions into caller contexts.
>> 
>> 3) Some minor adjustments in existing gather instruction pattens for 
>> double/quad words.
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Restricting masked sub-word gather to AVX512 target to align with integral 
> gather support.

@jatin-bhateja I have gone through part of the code. Please see my comments 
below.

src/hotspot/share/opto/vectornode.hpp line 886:

> 884:     init_class_id(Class_LoadVectorGather);
> 885:     add_req(indices);
> 886:     assert(req() == MemNode::ValueIn + 1, "match_edge expects that index 
> input is in MemNode::ValueIn");

The assert message in "" is questionable. For subword types match_edge expects 
last input in MemNode::ValueIn+1. I think the assert needs to move in if/else 
below with correct message.

src/hotspot/share/opto/vectornode.hpp line 890:

> 888:       assert(is_subword_type(vect_type()->element_basic_type()), "");
> 889:       add_req(offset);
> 890:     }

It will be good if we retain the other assert on the else path for non subword:
assert(indices->bottom_type()->is_vect(), "indices must be in vector");

src/hotspot/share/opto/vectornode.hpp line 999:

> 997:     add_req(indices);
> 998:     add_req(mask);
> 999:     assert(req() == MemNode::ValueIn + 2, "match_edge expects that last 
> input is in MemNode::ValueIn+1");

The assert message in "" is questionable. For subword types match_edge expects 
last input in MemNode::ValueIn+2.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java 
line 3818:

> 3816:         Class<? extends ByteVector> vectorType = vsp.vectorType();
> 3817: 
> 3818:         int loopIncr = 0;

loopIncr is not being used in the method below, could be removed.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java 
line 3830:

> 3828:         // Check indices are within array bounds.
> 3829:         // FIXME: Check index under mask controlling.
> 3830:         for (int i = 0; i < vsp.length(); i += lsp.length()) {

The check (i  < vsp.length() )  could instead be (i + lsp.length() <= 
vsp.length()).

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
 line 3657:

> 3655:     }
> 3656: #else[byteOrShort]
> 3657: 

Extra blank line on else path could be removed. This will also reduce the 
number of files changed.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1711404057
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380918410
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380912896
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380917754
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380901120
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380900894
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380886441

Reply via email to