Re: [AArch64] Fix integer vabs intrinsics

Richard Biener Mon, 05 May 2014 01:04:59 -0700

On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearn...@arm.com> wrote:
> On 02/05/14 11:28, James Greenhalgh wrote:
>> On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote:
>>>
>>>
>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> 
>>>> wrote:
>>>>
>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>>> <james.greenha...@arm.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave 
>>>>>> as
>>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>>
>>>>>
>>>>> Only for signed integer types.  You should be able to use an unsigned
>>>>> integer type here instead.
>>>>
>>>> If anything, I think that puts us in a worse position.
>>>
>>> Not if you cast it back.
>>>
>>>
>>>> The issue that
>>>> inspires this patch is that GCC will happily fold:
>>>>
>>>>  t1 = ABS_EXPR (x)
>>>>  t2 = GE_EXPR (t1, 0)
>>>>
>>>> to
>>>>
>>>>  t2 = TRUE
>>>>
>>>> Surely an unsigned integer type is going to suffer the same fate? 
>>>> Certainly I
>>>> can imagine somewhere in the compiler there being a fold path for:
>>>
>>> Yes but if add a cast from the unsigned type to the signed type gcc does not
>>> optimize that. If it does it is a bug since the overflow is defined there.
>>
>> I'm not sure I understand, are you saying I want to fold to:
>>
>>   t1 = VIEW_CONVERT_EXPR (x, unsigned)
>>   t2 = ABS_EXPR (t1)
>>   t3 = VIEW_CONVERT_EXPR (t2, signed)
>>
>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
>> other out leading to an overall NOP? It might just be Friday morning and a
>> lack of coffee talking, but I think I need you to spell this one out to
>> me in big letters!
>>
>
> I agree.  I think what you need is a type widening so that you get
>
> t1 = VEC_WIDEN (x)
> t2 = ABS_EXPR (t1)
> t3 = VEC_NARROW (t2)
>
> This then guarantees that the ABS expression cannot be undefined.  I'm
> less sure, however about the narrow causing a change in 'sign'.  Has it
> just punted the problem?  Maybe you need


Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
That is, have an ABS_EXPR that doesn't have the undefined issue
(at expense of optimization in case the result is immediately casted
back to signed)

Richard.

>
> t1 = VEC_WIDEN (x)
> t2 = ABS_EXPR (t1)
> t3 = VIEW_CONVERT_EXPR (x, unsigned)
> t4 = VEC_NARROW (t3)
> t5 = VIEW_CONVERT_EXPR (t4, signed)
>
> !!!
>
> How you capture this into RTL during expand, though, is another thing.
>
> R.
>
>>>>
>>>>  (unsigned >= 0) == TRUE
>>>>
>>>>>>
>>>>>>  a = vabs_s8 (vdup_n_s8 (-128));
>>>>>>  assert (a >= 0);
>>>>>>
>>>>>> does not hold. As in hardware
>>>>>>
>>>>>>  abs (-128) == -128
>>>>>>
>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should 
>>>>>> avoid
>>>>>> it. In fact, we have to be even more careful than that, and keep the 
>>>>>> integer
>>>>>> vabs intrinsics as an unspec in the back end.
>>>>>
>>>>> No it is not.  The mistake is to use signed integer types here.  Just
>>>>> add a conversion to an unsigned integer vector and it will work
>>>>> correctly.
>>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>>
>>>> Here we are covering ourselves against a seperate issue. For 
>>>> auto-vectorized
>>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>>>
>>>>  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>>
>>>> So in this case, the combine would be erroneous. Likewise SABA.
>>>
>>> This sounds like it would problematic for unsigned types  and not just for
>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>>> instead. Since in rtl overflow and underflow is defined to be wrapping.
>>
>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
>> with unsigned types. Further, I have never thought of RTL having signed
>> and unsigned types, just a bag of bits. We'll want to use unspec for the
>> intrinsic version of vabd_s8 - but we'll want to specify the
>>
>>   (abs (minus (reg) (reg)))
>>
>> behaviour so that auto-vectorized code can pick it up.
>>
>> So in the end we'll have these patterns:
>>
>>   (abs
>>     (abs (reg)))
>>
>>   (intrinsic_abs
>>     (unspec [(reg)] UNSPEC_ABS))
>>
>>   (abd
>>     (abs (minus (reg) (reg))))
>>
>>   (intrinsic_abd
>>     (unspec [(reg) (reg)] UNSPEC_ABD))
>>
>>   (aba
>>     (plus (abs (minus (reg) (reg))) (reg)))
>>
>>   (intrinsic_aba
>>     (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
>>
>> which should give us reasonable auto-vectorized code without triggering any
>> of the issues mapping the semantics of the instructions to intrinsics.
>>
>> Thanks,
>> James
>>
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>>
>>>> Thanks,
>>>> James
>>>
>>
>
>

Re: [AArch64] Fix integer vabs intrinsics

Reply via email to