On Mon, 15 Aug 2022 01:10:54 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

>> The Vector API binary op "`FIRST_NONZERO`" represents the vector operation 
>> "`a != 0 ? a : b`", which can be implemented with existing APIs such as 
>> "`compare + blend`". The current implementation is more complex than 
>> necessary, especially for floating point vectors. Its main idea is:
>> 
>> 
>> 1) mask = a.compare(0, ne);
>> 2) b = b.blend(0, mask);
>> 3) result = a | b;
>> 
>> 
>> For the floating point types, it also needs a vector reinterpretation 
>> between the floating point type and the corresponding integral type, since 
>> the final "`OR`" operation is only valid for bitwise integral types.
>> 
>> A simpler implementation is:
>> 
>> 
>> 1) mask = a.compare(0, eq);
>> 2) result = a.blend(b, mask);
>> 
>> 
>> This saves the final "`OR`" operation and the related reinterpretation 
>> between FP and integral types.
>> 
>> Here is the performance data for the "`FIRST_NONZERO`" benchmarks (see [1] 
>> for the byte vector benchmark details) on an ARM NEON system:
>> 
>> Benchmark                            (size)  Mode   Cnt  Before     After      Units
>> ByteMaxVector.FIRST_NONZERO          1024    thrpt  15   12107.422  18385.157  ops/ms
>> ByteMaxVector.FIRST_NONZEROMasked    1024    thrpt  15    9765.282  14739.775  ops/ms
>> DoubleMaxVector.FIRST_NONZERO        1024    thrpt  15    1798.545   2331.214  ops/ms
>> DoubleMaxVector.FIRST_NONZEROMasked  1024    thrpt  15    1211.838   1810.644  ops/ms
>> FloatMaxVector.FIRST_NONZERO         1024    thrpt  15    3491.924   4377.167  ops/ms
>> FloatMaxVector.FIRST_NONZEROMasked   1024    thrpt  15    2307.085   3606.576  ops/ms
>> IntMaxVector.FIRST_NONZERO           1024    thrpt  15    3602.727   5610.258  ops/ms
>> IntMaxVector.FIRST_NONZEROMasked     1024    thrpt  15    2726.843   4210.741  ops/ms
>> LongMaxVector.FIRST_NONZERO          1024    thrpt  15    1819.886   2974.655  ops/ms
>> LongMaxVector.FIRST_NONZEROMasked    1024    thrpt  15    1337.737   2315.094  ops/ms
>> ShortMaxVector.FIRST_NONZERO         1024    thrpt  15    6603.642   9586.320  ops/ms
>> ShortMaxVector.FIRST_NONZEROMasked   1024    thrpt  15    5222.006   7991.443  ops/ms
>> 
>> 
>> A similar improvement can also be observed on x86 systems.
>> 
>> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ByteMaxVector.java#L266
>
> ping again. Could anyone please take a look at this simple patch? Thanks so 
> much for your time!

@XiaohongGong looking... (just back from vacation).
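
For reference, the two formulations quoted above can be modelled lane by lane in plain Java. This is only a scalar sketch over `int` arrays, not the Vector API code in the patch: the class and method names (`FirstNonZeroSketch`, `viaBlendAndOr`, `viaBlendOnly`) are made up for illustration, and the ternaries stand in for the `compare` and `blend` steps. It only shows that both formulations give the same `a != 0 ? a : b` result per lane.

```java
import java.util.Arrays;

// Hypothetical scalar model of FIRST_NONZERO; each loop iteration is one lane.
public class FirstNonZeroSketch {

    // Current three-step formulation: compare(ne), blend with zero, then OR.
    static int[] viaBlendAndOr(int[] a, int[] b) {
        int[] result = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            boolean mask = a[i] != 0;        // 1) mask = a.compare(0, ne)
            int bZeroed = mask ? 0 : b[i];   // 2) b = b.blend(0, mask)
            result[i] = a[i] | bZeroed;      // 3) result = a | b
        }
        return result;
    }

    // Simpler two-step formulation: compare(eq), then a single blend.
    static int[] viaBlendOnly(int[] a, int[] b) {
        int[] result = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            boolean mask = a[i] == 0;        // 1) mask = a.compare(0, eq)
            result[i] = mask ? b[i] : a[i];  // 2) result = a.blend(b, mask)
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {3, 0, -1, 0};
        int[] b = {7, 8, 9, 0};
        System.out.println(Arrays.toString(viaBlendAndOr(a, b))); // [3, 8, -1, 0]
        System.out.println(Arrays.toString(viaBlendOnly(a, b)));  // [3, 8, -1, 0]
    }
}
```

The integer case needs no reinterpretation in either form. For the floating point types, step 3 of the first form is where the cast to the integral type would be required, and the second form never reaches a bitwise operation at all.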

-------------

PR: https://git.openjdk.org/jdk/pull/9683
