On Fri, 25 Apr 2025 09:17:02 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Thanks for this information. Another, more important reason to check
>> outcnt here is to prevent this optimization when VectorMaskCmp has more
>> than one use, because in that case the optimization may not be
>> worthwhile. For example:
>>
>>
>> public static void testVectorMaskCmp() {
>>     IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0);
>>     IntVector av = IntVector.fromArray(I_SPECIES, ia, 0);
>>     VectorMask<Integer> m1 = av.compare(VectorOperators.NE, bv); // two uses
>>     VectorMask<Integer> m2 = m1.not();
>>     m1.intoArray(m, 0);
>>     av.lanewise(VectorOperators.ABS, m2).intoArray(ia, 0);
>> }
>>
>>
>> If we do not check outcnt and still do this optimization, two
>> VectorMaskCmp nodes will be generated, and ultimately two VectorMaskCmp
>> instructions will be emitted. This is unreasonable because VectorMaskCmp
>> has much higher latency than the xor instruction on aarch64.
>
> Thanks, we can add this comment to the code where we are checking outcnt.
>

What if all the other users are also XorNodes? At present, you are checking for one XOR user; shouldn't it be an all-or-one scenario?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077378879