On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearn...@arm.com> wrote: > On 02/05/14 11:28, James Greenhalgh wrote: >> On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: >>> >>> >>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> >>>> wrote: >>>> >>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: >>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh >>>>> <james.greenha...@arm.com> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as >>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave >>>>>> as >>>>>> the hardware. That is to say, the pseudo-code sequence: >>>>> >>>>> >>>>> Only for signed integer types. You should be able to use an unsigned >>>>> integer type here instead. >>>> >>>> If anything, I think that puts us in a worse position. >>> >>> Not if you cast it back. >>> >>> >>>> The issue that >>>> inspires this patch is that GCC will happily fold: >>>> >>>> t1 = ABS_EXPR (x) >>>> t2 = GE_EXPR (t1, 0) >>>> >>>> to >>>> >>>> t2 = TRUE >>>> >>>> Surely an unsigned integer type is going to suffer the same fate? >>>> Certainly I >>>> can imagine somewhere in the compiler there being a fold path for: >>> >>> Yes but if add a cast from the unsigned type to the signed type gcc does not >>> optimize that. If it does it is a bug since the overflow is defined there. >> >> I'm not sure I understand, are you saying I want to fold to: >> >> t1 = VIEW_CONVERT_EXPR (x, unsigned) >> t2 = ABS_EXPR (t1) >> t3 = VIEW_CONVERT_EXPR (t2, signed) >> >> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each >> other out leading to an overall NOP? It might just be Friday morning and a >> lack of coffee talking, but I think I need you to spell this one out to >> me in big letters! >> > > I agree. I think what you need is a type widening so that you get > > t1 = VEC_WIDEN (x) > t2 = ABS_EXPR (t1) > t3 = VEC_NARROW (t2) > > This then guarantees that the ABS expression cannot be undefined. I'm > less sure, however about the narrow causing a change in 'sign'. Has it > just punted the problem? Maybe you need
Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED result type, thus do abs(int) -> unsigned (what we have as absu_hwi). That is, have an ABS_EXPR that doesn't have the undefined issue (at expense of optimization in case the result is immediately casted back to signed) Richard. > > t1 = VEC_WIDEN (x) > t2 = ABS_EXPR (t1) > t3 = VIEW_CONVERT_EXPR (x, unsigned) > t4 = VEC_NARROW (t3) > t5 = VIEW_CONVERT_EXPR (t4, signed) > > !!! > > How you capture this into RTL during expand, though, is another thing. > > R. > >>>> >>>> (unsigned >= 0) == TRUE >>>> >>>>>> >>>>>> a = vabs_s8 (vdup_n_s8 (-128)); >>>>>> assert (a >= 0); >>>>>> >>>>>> does not hold. As in hardware >>>>>> >>>>>> abs (-128) == -128 >>>>>> >>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should >>>>>> avoid >>>>>> it. In fact, we have to be even more careful than that, and keep the >>>>>> integer >>>>>> vabs intrinsics as an unspec in the back end. >>>>> >>>>> No it is not. The mistake is to use signed integer types here. Just >>>>> add a conversion to an unsigned integer vector and it will work >>>>> correctly. >>>>> In fact the ABS rtl code is not undefined for the overflow. >>>> >>>> Here we are covering ourselves against a seperate issue. For >>>> auto-vectorized >>>> code we want the SABD combine patterns to kick in whenever sensible. For >>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: >>>> >>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) >>>> >>>> So in this case, the combine would be erroneous. Likewise SABA. >>> >>> This sounds like it would problematic for unsigned types and not just for >>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 >>> instead. Since in rtl overflow and underflow is defined to be wrapping. >> >> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point >> with unsigned types. Further, I have never thought of RTL having signed >> and unsigned types, just a bag of bits. We'll want to use unspec for the >> intrinsic version of vabd_s8 - but we'll want to specify the >> >> (abs (minus (reg) (reg))) >> >> behaviour so that auto-vectorized code can pick it up. >> >> So in the end we'll have these patterns: >> >> (abs >> (abs (reg))) >> >> (intrinsic_abs >> (unspec [(reg)] UNSPEC_ABS)) >> >> (abd >> (abs (minus (reg) (reg)))) >> >> (intrinsic_abd >> (unspec [(reg) (reg)] UNSPEC_ABD)) >> >> (aba >> (plus (abs (minus (reg) (reg))) (reg))) >> >> (intrinsic_aba >> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) >> >> which should give us reasonable auto-vectorized code without triggering any >> of the issues mapping the semantics of the instructions to intrinsics. >> >> Thanks, >> James >> >>> >>> Thanks, >>> Andrew Pinski >>> >>>> >>>> Thanks, >>>> James >>> >> > >