On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2...@intel.com> wrote:
>
> I had a try at converting the PHI from Part-A to Part-B, i.e. the PHI to
> _2 = phi_cond ? _1 : 255.
> Then we can do the matching on the COND_EXPR in the underlying widen-mul pass.
>
> Unfortunately, I hit an ICE from verify_gimple_phi in the sccopy1 pass:
> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, 
> have ‘exceptional’ (error_mark) in useless_type_conversion_p, at 
> gimple-expr.cc:86

Likely you have released _2; more comments on your previous mail below.

> I will keep investigating whether this works.
>
> Part-A:
> uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   uint8_t _2;
>
>   <bb 2> :
>   _1 = x_3(D) + y_4(D);
>   if (_1 >= x_3(D))
>     goto <bb 3>; [INV]
>   else
>     goto <bb 4>; [INV]
>
>   <bb 3> :
>
>   <bb 4> :
>   # _2 = PHI <255(2), _1(3)>
>   return _2;
>
> }
>
> Part-B:
> uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y)
> {
>   unsigned char _1;
>   _Bool phi_cond_6;
>
>   <bb 2> :
>   _1 = x_3(D) + y_4(D);
>   phi_cond_6 = _1 >= x_3(D);
>   _2 = phi_cond_6 ? _1 : 255;
>   return _2;
>
> }
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Thursday, May 23, 2024 12:17 PM
> To: Richard Biener <richard.guent...@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com; pins...@gmail.com
> Subject: RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
> unsigned SAT_ADD
>
> Thanks Richard for reviewing.
>
> > I'm not convinced we should match this during early if-conversion, should 
> > we?
> > The middle-end doesn't really know .SAT_ADD but some handling of
> > .ADD_OVERFLOW is present.
>
> I tried to do the branch (aka cond) match in the widen-mult pass, similar to
> the previous branchless form.
> Unfortunately, the branch has already been converted to a PHI by the time
> widen-mult runs, so v2 tries to bypass the PHI handling
> and convert the branch form to the branchless form.
>
> > But please add a comment before the new pattern, esp. since it's
> > non-obvious that this is an improvement.
>
> Sure thing.
>
> > I suspect you rely on this form being recognized as .SAT_ADD later but
> > what prevents us from breaking this?  Why not convert it to .SAT_ADD
> > immediately?  If this is because the ISEL pass (or the widen-mult pass)
> > cannot handle PHIs then I would suggest to split out enough parts of
> > tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.
>
> Yes, this is sort of redundant; we could also convert it to .SAT_ADD
> immediately in match.pd before widen-mult.
>
> Sorry, I may be confused here: for a branch form like the one below, what
> transform should we perform in phiopt?
> gimple_simplify_phiopt mostly leverages the simplifications in match.pd,
> but we may hit the simplification in some other early pass.
>
> Or we could leverage a branch version of the unsigned_integer_sat_add
> gimple match in phiopt and generate the gimple call to .SAT_ADD there
> (much like what we do in widen-mult).
> Not sure if my understanding is correct; thanks again for the help.

The trick for widen-mult (or ISEL) would be to try to match the PHI
nodes in a similar way to how gimple_simplify_phiopt calls
op.resimplify.  The difficulty resides in the fact that the
(match ...) generated code gets the stmt root as its entry.  Either
we'd teach genmatch to recognize a PHI def as a COND, or we'd make
(match ...) (additionally?) generate entry points taking a
gimple_match_op so the toplevel COND trick works.  Note it's already a
bit awkward because we build a GENERIC form of the condition, and
that's now invalid in the IL for a GIMPLE COND_EXPR but still present
because of that phiopt trick.  There isn't an SSA def for the
condition in the IL (it's only part of a GIMPLE_COND, and that one
doesn't define "CC flags").

That means special-casing (match (..) (cond (cmp ...) ..)) in
genmatch to handle PHI defs is possibly the easiest "trick" here.
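If genmatch grew that special case, the branch form could then be
captured directly as a match.pd pattern.  A hypothetical sketch (this
syntax does not exist today; it assumes genmatch learns to treat a PHI
whose controlling GIMPLE_COND matches the (cmp ...) as a (cond ...)):

```
(match (unsigned_integer_sat_add @0 @1)
 (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep))
```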

Not sure what you did for the IL you quoted above.

Richard.

> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
>
> SAT_ADD_U_1(uint8_t);
>
> Pan
>
> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Wednesday, May 22, 2024 9:14 PM
> To: Li, Pan2 <pan2...@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com; pins...@gmail.com
> Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
> unsigned SAT_ADD
>
> On Wed, May 22, 2024 at 3:17 AM <pan2...@intel.com> wrote:
> >
> > From: Pan Li <pan2...@intel.com>
> >
> > This patch would like to support the __builtin_add_overflow branch form for
> > unsigned SAT_ADD.  For example as below:
> >
> > uint64_t
> > sat_add (uint64_t x, uint64_t y)
> > {
> >   uint64_t ret;
> >   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> > }
> >
> > Different from the branchless version, we leverage the simplification to
> > convert the branch version of SAT_ADD into the branchless one if and only
> > if the backend supports IFN_SAT_ADD.  Thus the backend has the ability
> > to choose the branch or branchless implementation of .SAT_ADD.  For
> > example, some targets can handle branchy code more optimally.
> >
> > When the target implements IFN_SAT_ADD for unsigned types, before this
> > patch:
> >
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   long unsigned int _2;
> >   uint64_t _3;
> >   __complex__ long unsigned int _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
> >   _2 = IMAGPART_EXPR <_6>;
> >   if (_2 != 0)
> >     goto <bb 4>; [35.00%]
> >   else
> >     goto <bb 3>; [65.00%]
> > ;;    succ:       4
> > ;;                3
> >
> > ;;   basic block 3, loop depth 0
> > ;;    pred:       2
> >   _1 = REALPART_EXPR <_6>;
> > ;;    succ:       4
> >
> > ;;   basic block 4, loop depth 0
> > ;;    pred:       3
> > ;;                2
> >   # _3 = PHI <_1(3), 18446744073709551615(2)>
> >   return _3;
> > ;;    succ:       EXIT
> > }
> >
> > After this patch:
> > uint64_t sat_add (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _12;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
> >   return _12;
> > ;;    succ:       EXIT
> > }
> >
> > The below test suites are passed for this patch:
> > * The x86 bootstrap test.
> > * The x86 fully regression test.
> > * The riscv fully regression test.
>
> I'm not convinced we should match this during early if-conversion, should we?
> The middle-end doesn't really know .SAT_ADD but some handling of
> .ADD_OVERFLOW is present.
>
> But please add a comment before the new pattern, esp. since it's
> non-obvious that this is an improvement.
>
> I suspect you rely on this form being recognized as .SAT_ADD later but
> what prevents us from breaking this?  Why not convert it to .SAT_ADD
> immediately?  If this is because the ISEL pass (or the widen-mult pass)
> cannot handle PHIs then I would suggest to split out enough parts of
> tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs.
>
> > gcc/ChangeLog:
> >
> >         * match.pd: Add new simplify to convert branch SAT_ADD into
> >         branchless, if and only if the backend implements the IFN.
> >
> > Signed-off-by: Pan Li <pan2...@intel.com>
> > ---
> >  gcc/match.pd | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index cff67c84498..2dc77a46e67 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +#if GIMPLE
> > +
> > +(simplify
> > + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop)
> > +  integer_minus_onep (realpart @2))
> > + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type)
> > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> > +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))))
> > +
> > +#endif
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > --
> > 2.34.1
> >
