Re: [AArch64] Fix integer vabs intrinsics

Richard Earnshaw Fri, 02 May 2014 03:40:33 -0700

On 02/05/14 11:28, James Greenhalgh wrote:
> On Fri, May 02, 2014 at 10:29:06AM +0100, [email protected] wrote:
>>
>>
>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <[email protected]> 
>>> wrote:
>>>
>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>
>>>>
>>>> Only for signed integer types.  You should be able to use an unsigned
>>>> integer type here instead.
>>>
>>> If anything, I think that puts us in a worse position.
>>
>> Not if you cast it back. 
>>
>>
>>> The issue that
>>> inspires this patch is that GCC will happily fold:
>>>
>>>  t1 = ABS_EXPR (x)
>>>  t2 = GE_EXPR (t1, 0)
>>>
>>> to
>>>
>>>  t2 = TRUE
>>>
>>> Surely an unsigned integer type is going to suffer the same fate? Certainly 
>>> I
>>> can imagine somewhere in the compiler there being a fold path for:
>>
>> Yes but if add a cast from the unsigned type to the signed type gcc does not
>> optimize that. If it does it is a bug since the overflow is defined there. 
> 
> I'm not sure I understand, are you saying I want to fold to:
> 
>   t1 = VIEW_CONVERT_EXPR (x, unsigned)
>   t2 = ABS_EXPR (t1)
>   t3 = VIEW_CONVERT_EXPR (t2, signed)
> 
> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
> other out leading to an overall NOP? It might just be Friday morning and a
> lack of coffee talking, but I think I need you to spell this one out to
> me in big letters!
>


I agree.  I think what you need is a type widening so that you get

t1 = VEC_WIDEN (x)
t2 = ABS_EXPR (t1)
t3 = VEC_NARROW (t2)

This then guarantees that the ABS expression cannot be undefined.  I'm
less sure, however about the narrow causing a change in 'sign'.  Has it
just punted the problem?  Maybe you need


t1 = VEC_WIDEN (x)
t2 = ABS_EXPR (t1)
t3 = VIEW_CONVERT_EXPR (x, unsigned)
t4 = VEC_NARROW (t3)
t5 = VIEW_CONVERT_EXPR (t4, signed)

!!!

How you capture this into RTL during expand, though, is another thing.

R.

>>>
>>>  (unsigned >= 0) == TRUE
>>>
>>>>>
>>>>>  a = vabs_s8 (vdup_n_s8 (-128));
>>>>>  assert (a >= 0);
>>>>>
>>>>> does not hold. As in hardware
>>>>>
>>>>>  abs (-128) == -128
>>>>>
>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should 
>>>>> avoid
>>>>> it. In fact, we have to be even more careful than that, and keep the 
>>>>> integer
>>>>> vabs intrinsics as an unspec in the back end.
>>>>
>>>> No it is not.  The mistake is to use signed integer types here.  Just
>>>> add a conversion to an unsigned integer vector and it will work
>>>> correctly.
>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>
>>> Here we are covering ourselves against a seperate issue. For auto-vectorized
>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>>
>>>  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>
>>> So in this case, the combine would be erroneous. Likewise SABA.
>>
>> This sounds like it would problematic for unsigned types  and not just for
>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>> instead. Since in rtl overflow and underflow is defined to be wrapping. 
> 
> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
> with unsigned types. Further, I have never thought of RTL having signed
> and unsigned types, just a bag of bits. We'll want to use unspec for the
> intrinsic version of vabd_s8 - but we'll want to specify the
> 
>   (abs (minus (reg) (reg)))
> 
> behaviour so that auto-vectorized code can pick it up.
> 
> So in the end we'll have these patterns:
> 
>   (abs
>     (abs (reg)))
> 
>   (intrinsic_abs
>     (unspec [(reg)] UNSPEC_ABS))
> 
>   (abd
>     (abs (minus (reg) (reg))))
> 
>   (intrinsic_abd
>     (unspec [(reg) (reg)] UNSPEC_ABD))
> 
>   (aba
>     (plus (abs (minus (reg) (reg))) (reg)))
> 
>   (intrinsic_aba
>     (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
> 
> which should give us reasonable auto-vectorized code without triggering any
> of the issues mapping the semantics of the instructions to intrinsics.
> 
> Thanks,
> James
> 
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>> Thanks,
>>> James
>>
>

Re: [AArch64] Fix integer vabs intrinsics

Reply via email to