On Fri, 25 Apr 2025 09:17:02 GMT, Jatin Bhateja <[email protected]> wrote:
>> Thanks for the information. Another, more important reason to check
>> outcnt here is to prevent this optimization when VectorMaskCmp has more
>> than one use, because in that case the optimization may not be
>> worthwhile. For example:
>>
>>
>> public static void testVectorMaskCmp() {
>>     IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0);
>>     IntVector av = IntVector.fromArray(I_SPECIES, ia, 0);
>>     VectorMask<Integer> m1 = av.compare(VectorOperators.NE, bv); // two uses
>>     VectorMask<Integer> m2 = m1.not();
>>     m1.intoArray(m, 0);
>>     av.lanewise(VectorOperators.ABS, m2).intoArray(ia, 0);
>> }
>>
>>
>> If we do not check outcnt and still apply this optimization, two
>> VectorMaskCmp nodes will be created, and ultimately two VectorMaskCmp
>> instructions will be emitted. This is undesirable because VectorMaskCmp
>> has much higher latency than an xor instruction on AArch64.
>
> Thanks, we can add this comment to the code where we are checking outcnt.
> What if all the other users are also XorNodes?
At present, you only check for a single XOR user; shouldn't the check
cover both cases, i.e. either exactly one user or all users being XOR nodes?
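
To make the question concrete, here is a minimal plain-Java sketch of the
"all users are XOR" condition. It is only a toy model, not HotSpot code:
the names MaskCmpUsers, Node, Kind and allUsersAreXor are hypothetical and
exist only for illustration. The assumption is that if every user of the
compare is a mask negation, each of them can fold into the same negated
compare and the original compare becomes dead, so no extra compare
instruction would be emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an IR graph; all names here are hypothetical, not C2 classes.
final class MaskCmpUsers {
    enum Kind { VECTOR_MASK_CMP, XOR, STORE }

    static final class Node {
        final Kind kind;
        final List<Node> outs = new ArrayList<>(); // users of this node

        Node(Kind kind, Node... inputs) {
            this.kind = kind;
            // Register this node as a user of each of its inputs.
            for (Node in : inputs) {
                in.outs.add(this);
            }
        }

        int outcnt() {
            return outs.size();
        }
    }

    // True when the compare has at least one user and every user is an XOR.
    // The single-user case (outcnt == 1 with an XOR user) is covered as well.
    static boolean allUsersAreXor(Node cmp) {
        if (cmp.outcnt() == 0) {
            return false;
        }
        for (Node user : cmp.outs) {
            if (user.kind != Kind.XOR) {
                return false;
            }
        }
        return true;
    }
}
```

In this model, the testVectorMaskCmp example quoted above has one XOR user
and one store user, so the check returns false and the compare is left
untouched.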
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077378879