On Wed, 18 Sep 2024 07:21:52 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All,
>>
>> As per the discussion on the panama-dev mailing list[1], this patch adds support
>> for the following new two-vector permutation API.
>>
>> Declaration:
>>     Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)
>>
>> Semantics:
>>     Using the index values stored in the lanes of "this" vector, assemble the
>>     values stored in the first (v1) and second (v2) vector arguments. Thus, the
>>     first and second vectors serve as a table whose elements are selected based
>>     on the index vector. The API is applicable to all integral and floating-point
>>     types. The result of this operation is semantically equivalent to the
>>     expression v1.rearrange(this.toShuffle(), v2). Values held in the index
>>     vector lanes must lie within the valid two-vector index range [0, 2*VLEN),
>>     else an IndexOutOfBoundsException is thrown.
>>
>> Summary of changes:
>> - Java-side implementation of the new selectFrom API.
>> - C2 compiler IR and inline expander changes.
>> - In the absence of a direct two-vector permutation instruction in the target
>>   ISA, a lowering transformation dismantles the new IR into constituent IR
>>   supported by target platforms.
>> - Optimized x86 backend implementation for AVX512 and legacy targets.
>> - Functional tests covering the new API.
>>
>> The JMH micro-benchmark included with this patch shows around a 10-15x gain
>> over the existing rearrange API:
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>>
>> Benchmark                                     (size)   Mode  Cnt      Score   Error  Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Incorporating review and documentation suggestions.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2600:

> 2598:         assert ((vlen & (vlen -1)) == 0);
> 2599:         int twoVectorLenMask = (vlen << 1) - 1;
> 2600:         ByteVector wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask);

This assert, which forces a power-of-two vector length, and the AND that follows it seem out of place in the Java code. You could move the wrapping into selectFromTwoVectorOp, along similar lines to PR #20634.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1771898190
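The two-vector selectFrom semantics described in the quoted PR text can be checked against a small scalar model. The following is a minimal sketch, assuming int lanes modelled as plain Java arrays; the class SelectFromModel and its selectFrom method are hypothetical illustrations, not part of jdk.incubator.vector. Each index lane selects from the logical table formed by v1 followed by v2, and an index outside [0, 2*VLEN) throws IndexOutOfBoundsException, as the PR description states.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical scalar model of the two-vector selectFrom semantics.
// Lanes are plain int arrays; this is NOT the jdk.incubator.vector API.
final class SelectFromModel {

    // For each lane i, idx[i] selects from the logical table v1 ++ v2,
    // whose length is 2 * VLEN.
    static int[] selectFrom(int[] idx, int[] v1, int[] v2) {
        int vlen = idx.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int k = idx[i];
            // Throws IndexOutOfBoundsException unless 0 <= k < 2 * vlen.
            Objects.checkIndex(k, 2 * vlen);
            // Lower half of the table comes from v1, upper half from v2.
            result[i] = (k < vlen) ? v1[k] : v2[k - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {7, 0, 4, 3};
        // prints [23, 10, 20, 13]
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2)));
    }
}
```

Index 7 picks lane 3 of v2, index 0 picks lane 0 of v1, and so on. With real vectors the corresponding call would be indexes.selectFrom(v1, v2), where indexes is the index vector.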
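The review comment concerns where index wrapping should live. As a rough scalar sketch, wrapping can be performed inside the operation itself so that callers pass raw indexes. The class WrapInsideOp and its methods wrap and selectFromTwoVectorsWrapped are hypothetical; the real selectFromTwoVectorOp is internal JDK code whose signature is not shown in this thread. The sketch also illustrates why the quoted code asserts a power-of-two length: only then does an AND with (2*vlen - 1) agree with Math.floorMod(index, 2*vlen).

```java
import java.util.Arrays;

// Hypothetical scalar sketch: index wrapping done inside the two-vector
// operation rather than by the caller. Not the actual JDK implementation.
final class WrapInsideOp {

    // Reduce an index into [0, 2 * vlen). For a power-of-two vlen the AND
    // mask matches Math.floorMod (including negative indexes); otherwise
    // fall back to floorMod, since the mask would give wrong results.
    static int wrap(int index, int vlen) {
        int tableLen = vlen << 1;
        if ((vlen & (vlen - 1)) == 0) {
            return index & (tableLen - 1);
        }
        return Math.floorMod(index, tableLen);
    }

    // Hypothetical fallback: raw indexes in, wrapping handled here.
    // Assumes idx, v1 and v2 all have the same length.
    static byte[] selectFromTwoVectorsWrapped(byte[] idx, byte[] v1, byte[] v2) {
        int vlen = idx.length;
        byte[] out = new byte[vlen];
        for (int i = 0; i < vlen; i++) {
            int k = wrap(idx[i], vlen);
            out[i] = (k < vlen) ? v1[k] : v2[k - vlen];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] v1 = {0, 1, 2, 3};
        byte[] v2 = {4, 5, 6, 7};
        byte[] idx = {9, -1, 5, 2};
        // prints [1, 7, 5, 2]
        System.out.println(Arrays.toString(selectFromTwoVectorsWrapped(idx, v1, v2)));
    }
}
```

For a non-power-of-two length such as vlen = 3, wrap(7, 3) returns 1 via floorMod, whereas the mask would compute 7 & 5 = 5, which is why a mask-only implementation needs the power-of-two assertion.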