On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation >> based on hybrid algorithm which initially partially unrolls scalar loop to >> accumulates values from gather indices into a quadword(64bit) slice followed >> by vector permutation to place the slice into appropriate vector lanes, it >> prevents code bloating and generates compact >> JIT sequence. This coupled with savings from expansive array allocation in >> existing java implementation translates into significant performance of >> 1.3-5x gains with included micro. >> >> >>  >> >> >> 2) Patch was also compared against modified java fallback implementation by >> replacing temporary array allocation with zero initialized vector and a >> scalar loops which inserts gathered values into vector. But, vector insert >> operation in higher vector lanes is a three step process which first >> extracts the upper vector 128 bit lane, updates it with gather subword value >> and then inserts the lane back to its original position. This makes inserts >> into higher order lanes costly w.r.t to proposed solution. In addition >> generated JIT code for modified fallback implementation was very bulky. This >> may impact in-lining decisions into caller contexts. >> >> 3) Some minor adjustments in existing gather instruction pattens for >> double/quad words. >> >> >> Kindly review and share your feedback. >> >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Restricting masked sub-word gather to AVX512 target to align with integral > gather support. @jatin-bhateja I have gone through part of the code. Please see my comments below. src/hotspot/share/opto/vectornode.hpp line 886: > 884: init_class_id(Class_LoadVectorGather); > 885: add_req(indices); > 886: assert(req() == MemNode::ValueIn + 1, "match_edge expects that index > input is in MemNode::ValueIn"); The assert message in "" is questionable. For subword types match_edge expects last input in MemNode::ValueIn+1. I think the assert needs to move in if/else below with correct message. src/hotspot/share/opto/vectornode.hpp line 890: > 888: assert(is_subword_type(vect_type()->element_basic_type()), ""); > 889: add_req(offset); > 890: } It will be good if we retain the other assert on the else path for non subword: assert(indices->bottom_type()->is_vect(), "indices must be in vector"); src/hotspot/share/opto/vectornode.hpp line 999: > 997: add_req(indices); > 998: add_req(mask); > 999: assert(req() == MemNode::ValueIn + 2, "match_edge expects that last > input is in MemNode::ValueIn+1"); The assert message in "" is questionable. For subword types match_edge expects last input in MemNode::ValueIn+2. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3818: > 3816: Class<? extends ByteVector> vectorType = vsp.vectorType(); > 3817: > 3818: int loopIncr = 0; loopIncr is not being used in the method below, could be removed. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3830: > 3828: // Check indices are within array bounds. > 3829: // FIXME: Check index under mask controlling. > 3830: for (int i = 0; i < vsp.length(); i += lsp.length()) { The check (i < vsp.length() ) could instead be (i + lsp.length() <= vsp.length()). src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 3657: > 3655: } > 3656: #else[byteOrShort] > 3657: Extra blank line on else path could be removed. This will also reduce the number of files changed. ------------- PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1711404057 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380918410 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380912896 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380917754 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380901120 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380900894 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1380886441