On Fri, 7 Apr 2023 18:04:16 GMT, Quan Anh Mai <[email protected]> wrote:
>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java
>> line 96:
>>
>>> 94: }
>>> 95: Vector<?> shufvec = this.toBitsVector();
>>> 96: VectorMask<?> vecmask = shufvec.compare(VectorOperators.LT, 0);
>>
>> This may impact the intrinsification over AVX1 targets for floating point
>> shuffles. Since bits vector is an integral vector and AVX1 does support 32
>> byte floats but not 32 byte integral vectors.
>
> Yes I think it is a drawback of this approach, however currently we do not
> support shuffling for 256-bit vectors on AVX1 machines either, and AVX1 seems
> to be a special case in this regard. This species of float and double may
> also be less common in the usage of Vector API since it is larger than
> SPECIES_PREFERRED.

Hi @merykitty, I agree that SPECIES_PREFERRED is the natural choice for vector
algorithms that mix integral and floating-point vectors.
FTR, we now see a perf regression with a Float256-based micro on AVX=1 targets:

    public static short micro() {
        VectorShuffle<Float> iota = FloatVector.SPECIES_256.iotaShuffle(0, 1, true);
        return iota.cast(ShortVector.SPECIES_128).toVector().reinterpretAsShorts().lane(1);
    }
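
For readers without the incubator module at hand, here is a scalar sketch of the value this micro is expected to produce. It is not the Vector API implementation and says nothing about intrinsification. The class and helper names are hypothetical, and it assumes the documented `iotaShuffle(start, step, wrap)` behaviour with `wrap == true`, i.e. each index is reduced into `[0, VLENGTH-1]` as if by `floorMod`.

```java
// Scalar model only; does not use jdk.incubator.vector.
// All names here are hypothetical and exist only for illustration.
final class IotaShuffleModel {
    // Models iotaShuffle(start, step, true): lane i holds
    // floorMod(start + i * step, vlength).
    static int[] wrappedIota(int vlength, int start, int step) {
        int[] idx = new int[vlength];
        for (int i = 0; i < vlength; i++) {
            idx[i] = Math.floorMod(start + i * step, vlength);
        }
        return idx;
    }

    // Models the micro: an 8-lane shuffle (Float256 has 8 lanes) is viewed
    // as 8 short lanes (Short128 also has 8 lanes), then lane 1 is read.
    static short micro() {
        int[] idx = wrappedIota(8, 0, 1);
        short[] lanes = new short[idx.length];
        for (int i = 0; i < idx.length; i++) {
            lanes[i] = (short) idx[i];
        }
        return lanes[1];
    }

    public static void main(String[] args) {
        System.out.println(micro());
    }
}
```

The lane counts match (8 and 8), so the cast between the two species does not change the number of indexes, only their element type.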

CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
CompileCommand: compileonly shufflef.micro bool compileonly = true
** not supported: arity=1 op=reinterpret/1 vlen1=8 etype1=int ismask=0
** not supported: arity=1 op=cast/1 vlen1=8 etype1=int ismask=0
@ 17 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 24 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic)
@ 34 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 54 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) failed to inline (intrinsic)
@ 17 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 24 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 45 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic)
@ 292 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 298 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic)
@ 292 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 298 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic)
@ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic)
[time] 386ms [res]3392
CPROMPT>export JAVA_HOME=/home/jatinbha/softwares/jdk-20/
CPROMPT>export PATH=$JAVA_HOME/bin:$PATH
CPROMPT>javad --add-modules=jdk.incubator.vector -XX:UseAVX=1 -XX:+PrintIntrinsics -XX:CompileCommand=compileonly,shufflef::micro -cp . shufflef
CompileCommand: compileonly shufflef.micro bool compileonly = true
WARNING: Using incubator modules: jdk.incubator.vector
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
@ 3 jdk.internal.misc.Unsafe::loadFence (5 bytes) (intrinsic)
@ 17 jdk.internal.vm.vector.VectorSupport::shuffleToVector (33 bytes) (intrinsic)
@ 292 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 298 java.lang.Object::getClass (0 bytes) (intrinsic)
@ 322 jdk.internal.vm.vector.VectorSupport::convert (36 bytes) (intrinsic)
@ 16 jdk.internal.vm.vector.VectorSupport::extract (35 bytes) (intrinsic)
[time] 7ms [res]3392
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1161810585