On Mon, May 20, 2024 at 1:50 PM Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi Pan,
>
> > -----Original Message-----
> > From: pan2...@intel.com <pan2...@intel.com>
> > Sent: Monday, May 20, 2024 12:01 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > <tamar.christ...@arm.com>; richard.guent...@gmail.com; Pan Li
> > <pan2...@intel.com>
> > Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> >
> > From: Pan Li <pan2...@intel.com>
> >
> > This patch adds support for the branch form of unsigned SAT_ADD,
> > for example:
> >
> > uint64_t
> > sat_add (uint64_t x, uint64_t y)
> > {
> >   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> > }
> >
> > Unlike the branchless version, we leverage the simplify to convert
> > the branch version of SAT_ADD into branchless form if and only if
> > the backend supports IFN_SAT_ADD.  Thus, the backend keeps the
> > ability to choose a branch or branchless implementation of .SAT_ADD;
> > for example, some targets handle the branch form more optimally.
> >
> > When the target implements IFN_SAT_ADD for unsigned types, before
> > this patch:
> >
> > uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   uint64_t _2;
> >   __complex__ long unsigned int _6;
> >   long unsigned int _7;
> >
> >   ;;   basic block 2, loop depth 0
> >   ;;    pred:       ENTRY
> >   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
> >   _1 = REALPART_EXPR <_6>;
> >   _7 = IMAGPART_EXPR <_6>;
> >   if (_7 == 0)
> >     goto <bb 4>; [65.00%]
> >   else
> >     goto <bb 3>; [35.00%]
> >   ;;    succ:       4
> >   ;;                3
> >
> >   ;;   basic block 3, loop depth 0
> >   ;;    pred:       2
> >   ;;    succ:       4
> >
> >   ;;   basic block 4, loop depth 0
> >   ;;    pred:       3
> >   ;;                2
> >   # _2 = PHI <18446744073709551615(3), _1(2)>
> >   return _2;
> >   ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> >
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _9;
> >
> >   ;;   basic block 2, loop depth 0
> >   ;;    pred:       ENTRY
> >   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
> >   return _9;
> >   ;;    succ:       EXIT
> > }
> >
> > The following test suites passed for this patch:
> > * The x86 bootstrap test.
> > * The x86 full regression test.
> > * The riscv full regression test.
> >
> > gcc/ChangeLog:
> >
> >       * match.pd: Add new simplify to convert branch SAT_ADD into
> >       branchless, if and only if the backend implements the IFN.
> >
> > Signed-off-by: Pan Li <pan2...@intel.com>
> > ---
> >  gcc/match.pd | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0f9c34fa897..0547b57b3a3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +#if GIMPLE
> > +
> > +/* Simplify the branch version of SAT_ADD into branchless if and only if
> > +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> > +   ability to choose branch or branchless implementation of .SAT_ADD.  */

This comment, or the description above, should say that this simplifies

  (x + y) >= x ? (x + y) : -1

as

  (x + y) | (-(typeof(x))((x + y) < x))
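For illustration, an untested C sketch of that branchless form (the
function name sat_add_branchless is invented for this example):

  #include <stdint.h>

  uint64_t
  sat_add_branchless (uint64_t x, uint64_t y)
  {
    uint64_t sum = x + y;  /* wraps modulo 2^64 on overflow */
    /* sum < x is 1 exactly when the addition wrapped; negating that
       unsigned flag yields all-ones, so the OR saturates to -1.  */
    return sum | (-(uint64_t) (sum < x));
  }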
> > +(simplify
> > + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> > + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +  (bit_ior @2 (negate (convert (lt @2 @0))))))
> > +
> > +(simplify
> > + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> > + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type,
> > OPTIMIZE_FOR_BOTH))
> > +  (bit_ior @2 (negate (convert (lt @2 @0))))))

and this should probably be (gt @2 @0)?

This also misses INTEGER_TYPE_P constraints, and it's supposed to be
only for TYPE_UNSIGNED?

> > +
> > +#endif
>
> Thanks, this looks good to me!
>
> I'll leave it up to Richard to approve.
> Richard: The reason for the direct_internal_fn_supported_p check is
> that some targets said they currently handle the branch version
> better, due to the lack of some types.  At the time I reasoned it was
> just a target expansion bug, but didn't hear anything back.
>
> To be honest, it feels to me like we should do this unconditionally
> and just have the targets that get a faster branch version handle it
> during expand, since the patch series provides a canonicalized
> version now.

I'm not sure this is a good canonical form.

  __imag .ADD_OVERFLOW (x, y) == 0 ? __real .ADD_OVERFLOW (x, y) : -1

would be better IMO.  It can be made branchless by using a COND_EXPR
(a C sketch of this form is at the end of this mail).

> This means we can also better support targets that have the vector
> optab but not the scalar one, as the above check would fail for those
> targets.
>
> What do you think?
>
> Thanks,
> Tamar
>
> > +
> >  /* x > y && x != XXX_MIN  --> x > y
> >     x > y && x == XXX_MIN  --> false . */
> >  (for eqne (eq ne)
> > --
> > 2.34.1
>
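For reference, an untested C-level sketch of the .ADD_OVERFLOW-based
form suggested above, using GCC's __builtin_add_overflow as a stand-in
for the GIMPLE internal function (the name sat_add_via_overflow is
invented for this example):

  #include <stdint.h>

  uint64_t
  sat_add_via_overflow (uint64_t x, uint64_t y)
  {
    uint64_t sum;
    /* The built-in stores the wrapped sum and returns a nonzero
       overflow flag; these correspond to __real and __imag of
       .ADD_OVERFLOW at the GIMPLE level.  */
    return __builtin_add_overflow (x, y, &sum) ? (uint64_t) -1 : sum;
  }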