On Mon, May 20, 2024 at 1:50 PM Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi Pan,
>
> > -----Original Message-----
> > From: pan2...@intel.com <pan2...@intel.com>
> > Sent: Monday, May 20, 2024 12:01 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > <tamar.christ...@arm.com>; richard.guent...@gmail.com; Pan Li
> > <pan2...@intel.com>
> > Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> >
> > From: Pan Li <pan2...@intel.com>
> >
> > This patch would like to support the branch form for unsigned
> > SAT_ADD.  For example as below:
> >
> > uint64_t
> > sat_add (uint64_t x, uint64_t y)
> > {
> >   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> > }
> >
> > Unlike the branchless version,  we leverage the simplification to
> > convert the branch version of SAT_ADD into branchless form if and only
> > if the backend supports IFN_SAT_ADD.  Thus,  the backend retains the
> > ability to choose a branch or branchless implementation of .SAT_ADD.
> > For example,  some targets can handle branchy code more optimally.
> >
> > When the target implements IFN_SAT_ADD for unsigned types, before this
> > patch:
> > uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   uint64_t _2;
> >   __complex__ long unsigned int _6;
> >   long unsigned int _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
> >   _1 = REALPART_EXPR <_6>;
> >   _7 = IMAGPART_EXPR <_6>;
> >   if (_7 == 0)
> >     goto <bb 4>; [65.00%]
> >   else
> >     goto <bb 3>; [35.00%]
> > ;;    succ:       4
> > ;;                3
> >
> > ;;   basic block 3, loop depth 0
> > ;;    pred:       2
> > ;;    succ:       4
> >
> > ;;   basic block 4, loop depth 0
> > ;;    pred:       3
> > ;;                2
> >   # _2 = PHI <18446744073709551615(3), _1(2)>
> >   return _2;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _9;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
> >   return _9;
> > ;;    succ:       EXIT
> > }
> >
> > The below test suites are passed for this patch:
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> > * The riscv fully regression test.
> >
> > gcc/ChangeLog:
> >
> >       * match.pd: Add new simplify to convert branch SAT_ADD into
> >       branchless form,  if and only if the backend implements the IFN.
> >
> > Signed-off-by: Pan Li <pan2...@intel.com>
> > ---
> >  gcc/match.pd | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0f9c34fa897..0547b57b3a3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +#if GIMPLE
> > +
> > +/* Simplify the branch version of SAT_ADD into branchless if and only if
> > +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> > +   ability to choose branch or branchless implementation of .SAT_ADD.  */

This comment or part of the description above should say this simplifies

   (x + y) >= x ? (x + y) : -1

as

  (x + y) | (-(typeof(x))((x + y) < x))
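For concreteness, the equivalence of the two forms can be checked with a small C sketch (uint64_t chosen for illustration; the helper names are made up, not from the patch):

```c
#include <stdint.h>

/* Branch form, as in the patch description.  */
static uint64_t
sat_add_branch (uint64_t x, uint64_t y)
{
  return x + y >= x ? x + y : UINT64_MAX;
}

/* Branchless form the simplification produces:
   (x + y) | (-(typeof(x))((x + y) < x)).  */
static uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;               /* wraps modulo 2^64 on overflow */
  return sum | -(uint64_t) (sum < x); /* all-ones mask iff overflowed */
}
```

On overflow `sum < x` is 1, so the negated mask is all-ones and the OR saturates the result to -1.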

> > +(simplify
> > + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> > +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> > +
> > +(simplify
> > + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> > +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +   (bit_ior @2 (negate (convert (lt @2 @0))))))

and this should probably be (gt @2 @0)?

This misses INTEGER_TYPE_P constraints and it's supposed to be only
for TYPE_UNSIGNED?
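The unsigned-only requirement matters because the rewrite relies on defined wraparound: for unsigned x and y, `(x + y) < x` holds exactly when the addition overflowed, whereas signed overflow is undefined behavior. A sketch with illustrative helpers on uint8_t, narrow enough to verify exhaustively:

```c
#include <stdint.h>

/* Branch form on uint8_t.  The cast is needed because x + y is computed
   in int after promotion; we want the wrapped 8-bit sum.  */
static uint8_t
sat_add_u8_branch (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : UINT8_MAX;
}

/* Branchless form on uint8_t.  */
static uint8_t
sat_add_u8_branchless (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum | (uint8_t) -(uint8_t) (sum < x);
}
```

Looping over all 65536 input pairs confirms the two forms agree on every unsigned 8-bit input.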

> > +
> > +#endif
>
> Thanks, this looks good to me!
>
> I'll leave it up to Richard to approve,
> Richard: The reason for the direct_internal_fn_supported_p is that some
> targets said they currently handle the branch version better due to the
> lack of some types.  At the time I reasoned it was just a target expansion
> bug, but didn't hear anything back.
>
> To be honest, it feels to me like we should do this unconditionally, and
> just have the targets that get a faster branch version handle it during
> expand, since the patch series provides a canonicalized version now.

I'm not sure this is a good canonical form.

__imag .ADD_OVERFLOW (x, y) ? -1 : __real .ADD_OVERFLOW (x, y)

would be better IMO.  It can be branch-less by using a COND_EXPR.
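At the C level, an analogue of this overflow-flag-based form can be written with GCC/Clang's __builtin_add_overflow, which expands to the same ADD_OVERFLOW machinery (the function name below is made up for illustration):

```c
#include <stdint.h>

/* The boolean overflow flag plays the role of __imag .ADD_OVERFLOW,
   and sum plays the role of __real .ADD_OVERFLOW: saturate to -1
   when the flag is set, otherwise return the wrapped sum.  */
static uint64_t
sat_add_via_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  return __builtin_add_overflow (x, y, &sum) ? UINT64_MAX : sum;
}
```

Expansion can then turn the selection into a branch or a conditional move, whichever the target prefers.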

> This means we can also better support targets that have the vector optab
> but not the scalar one, as the above check would fail for these targets.
>
> What do you think?
>
> Thanks,
> Tamar
>
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > --
> > 2.34.1
>
