Thanks Richard for comments.
Assume we talk about the form 7 as below:
#define DEF_SAT_U_MUL_FMT_7(NT, WT) \
NT __attribute__((noinline)) \
sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
{ \
WT x = (WT)a * (WT)b; \
NT max = -1; \
bool overflow_p = x > (WT)(max); \
return -(NT)(overflow_p) | (NT)x; \
}
> So why is matching the conversion necessary at all, that is, why is (mult @0
> @1)
> not sufficient here?
Because there are many different types combinations, unsigned SAT_MUL need the
help of a wider type.
For example:
If NT is uint32_t, WT is uint128_t, we need the convert before widen_mul:
21 │ _12 = (unsigned long) a_6(D);
22 │ _13 = (unsigned long) b_7(D);
23 │ x_8 = _12 w* _13;
If NT is uint64_t, WT is uint128_t, we don't have the convert.
19 │ x_8 = a_6(D) w* b_7(D);
> You are not looking at the types of @0 or @1 at
> all, AFAICS
> one could be 'char' and one could be 'short'?
The type is restricted by (if (types_match (type, @0, @1)). Aka the result must
have the same type as @0 and @1.
> Captures on optional nodes are "merged" to the operand, so for (convert?@4 @0)
> when there is no conversion @4 == @0. That is it behaves as if there were two
> captures at the place of @0.
I see, sounds like alias here, thanks for the explanation.
> But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the
> desired type?
> That is, I wonder if even when the conversion is missing, the
> saturated operation
> can be matched by using (original-type)SAT_UXYZ (...), implying the SAT_UXYZ
> produces a uint8_t (in this case)? Not sure I phrased this in an
> understandable way ;)
> I'll note that the outer conversion is poorly constrained given the conversion
> around the multiplication/inside the negate is what determines the
> semantics of it
> and that's not constrained either?
For NT is uint8_t and WT is uint128_t, we have the final convert from gimple.
The bigger types mentioned in above
don't have such convert. The _11 and _6 should be almost the same, maybe
nop_convert here is good enough if it failed
to match that. I will have a try and keep you posted.
14 │ signed char _6;
15 │ uint8_t _11;
...
26 │ _3 = (signed char) overflow_p_10;
27 │ _4 = -_3;
28 │ _5 = (signed char) x_9;
29 │ _6 = _4 | _5;
30 │ _11 = (uint8_t) _6;
31 │ return _11;
Pan
-----Original Message-----
From: Richard Biener <[email protected]>
Sent: Thursday, October 23, 2025 7:08 PM
To: Li, Pan2 <[email protected]>
Cc: [email protected]; [email protected]; [email protected];
[email protected]; [email protected]; Chen, Ken <[email protected]>;
Liu, Hongtao <[email protected]>; [email protected]
Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
On Wed, Oct 22, 2025 at 2:21 PM Li, Pan2 <[email protected]> wrote:
>
> Thanks Richard for comments.
>
> > I'm a bit confused as to this for and the explicit widen_mult case in the
> > 2nd
> > pattern below. In fact, the patterns look similar enough to me and I wonder
> > whether the consumer can handle the outer conversion and the conversion
> > of the leafs of the multiplication? The conversion of the leafs doesn't
> > have
> > any constraints, nor is it obvious why there's an outer conversion around
> > the
> > IOR.
>
> The two pattern looks similar up to a point. I separate them into 2 mostly
> comes from
> the inner convert about mult, aka (mult (convert@4 @0) (convert@5 @1)). Some
> diff
> type combination need to convert while others are not.
>
> I tried to leverage something like (mult (convert?@4 @0) (convert?@5 @1)) to
> put
> them together. But it may has additional sematics because add '?' after
> convert will
> cover (mult (convert@4 @0) @1) which is not expected.
So why is matching the conversion necessary at all, that is, why is (mult @0 @1)
not sufficient here? You are not looking at the types of @0 or @1 at
all, AFAICS
one could be 'char' and one could be 'short'?
> Another problem about add '?' after convert is capture @4/@5, if there is no
> need
> to convert, I am not sure how to take care of the @4 from the “with“” scope
> because it
> is optional. The code acts on capture @4/@5 is not correct when there is no
> convert.
Captures on optional nodes are "merged" to the operand, so for (convert?@4 @0)
when there is no conversion @4 == @0. That is it behaves as if there were two
captures at the place of @0.
> I may have another try to merge them into one, but is there any suggestion
> here?
>
> > outer conversion around the
> > IOR.
>
> Some type(s) may need to do a convert to the final type, like
> sat_u_mul-8-u8-from-u32,
> The seq may look like blow.
>
> 25 │ _6 = _4 | _5;
> 26 │ _11 = (uint8_t) _6;
> 27 │ return _11;
But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the
desired type?
That is, I wonder if even when the conversion is missing, the
saturated operation
can be matched by using (original-type)SAT_UXYZ (...), implying the SAT_UXYZ
produces a uint8_t (in this case)? Not sure I phrased this in an
understandable way ;)
I'll note that the outer conversion is poorly constrained given the conversion
around the multiplication/inside the negate is what determines the
semantics of it
and that's not constrained either?
Richard.
> Pan
>
> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Wednesday, October 22, 2025 5:00 PM
> To: Li, Pan2 <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Chen, Ken <[email protected]>;
> Liu, Hongtao <[email protected]>; [email protected]
> Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7
>
> On Mon, Oct 20, 2025 at 3:10 PM <[email protected]> wrote:
> >
> > From: Pan Li <[email protected]>
> >
> > This patch would like to try to match the the unsigned
> > SAT_MUL form 7, aka below:
> >
> > #define DEF_SAT_U_MUL_FMT_7(NT, WT) \
> > NT __attribute__((noinline)) \
> > sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
> > { \
> > WT x = (WT)a * (WT)b; \
> > NT max = -1; \
> > bool overflow_p = x > (WT)(max); \
> > return -(NT)(overflow_p) | (NT)x; \
> > }
> >
> > while WT is uint128_t, uint64_t, uint32_t and uint16_t, and
> > NT is uint64_t, uint32_t, uint16_t or uint8_t.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add pattern for SAT_MUL form 5 include
> > mul and widen_mul.
> >
> > Signed-off-by: Pan Li <[email protected]>
> > ---
> > gcc/match.pd | 35 +++++++++++++++++++++++++++++++++++
> > 1 file changed, 35 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index bfc51e6579a..0f55a82d989 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3749,6 +3749,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > bool widen_mult_p = prec * 2 == widen_prec;
> > }
> > (if (c2_is_type_precision_p && widen_mult_p)))))
> > + (for mult_op (mult widen_mult)
>
> I'm a bit confused as to this for and the explicit widen_mult case in the 2nd
> pattern below. In fact, the patterns look similar enough to me and I wonder
> whether the consumer can handle the outer conversion and the conversion
> of the leafs of the multiplication? The conversion of the leafs doesn't have
> any constraints, nor is it obvious why there's an outer conversion around the
> IOR.
>
> > + (match (unsigned_integer_sat_mul @0 @1)
> > + (convert?
> > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> > + (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1)))))
> > + (if (types_match (type, @0, @1))
> > + (with
> > + {
> > + unsigned prec = TYPE_PRECISION (type);
> > + unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> > + unsigned cvt4_prec = TYPE_PRECISION (TREE_TYPE (@4));
> > + unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
> > +
> > + wide_int max = wi::mask (prec, false, widen_prec);
> > + bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max);
> > + bool widen_mult_p = mult_op == WIDEN_MULT_EXPR && cvt4_prec ==
> > cvt5_prec
> > + && widen_prec == cvt5_prec * 2;
> > + bool mult_p = mult_op == MULT_EXPR && cvt4_prec == cvt5_prec
> > + && cvt4_prec == widen_prec && widen_prec > prec;
> > + }
> > + (if (c2_is_max_p && (mult_p || widen_mult_p)))))))
> > + (match (unsigned_integer_sat_mul @0 @1)
> > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2)))
> > + (convert (widen_mult:c@3 @0 @1)))
> > + (if (types_match (type, @0, @1))
> > + (with
> > + {
> > + unsigned prec = TYPE_PRECISION (type);
> > + unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
> > +
> > + wide_int max = wi::mask (prec, false, widen_prec);
> > + bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max);
> > + bool widen_mult_p = prec * 2 == widen_prec;
> > + }
> > + (if (c2_is_max_p && widen_mult_p)))))
> > )
> >
> > /* The boundary condition for case 10: IMM = 1:
> > --
> > 2.43.0
> >