On Mon, Nov 24, 2025 at 3:01 PM Li, Pan2 <[email protected]> wrote: > > Hi Richard, > > The pre-refactor of this series completed, do you prefer to continue the v2 > now or defer to the next stage 1? > Both works for me, and thanks a lot.
You can send v2 now. Richard. > Pan > > -----Original Message----- > From: Li, Pan2 <[email protected]> > Sent: Saturday, November 1, 2025 10:33 AM > To: Richard Biener <[email protected]> > Cc: [email protected]; [email protected]; [email protected]; > [email protected]; [email protected]; Chen, Ken <[email protected]>; > Liu, Hongtao <[email protected]>; [email protected] > Subject: RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > Thanks Richard for comments. > > > Thanks and sorry for the delay... > > Never mind, I see you were busy for middle-end vectorization changes recently. > > > OK, so this is basically widen + mult vs. pre-widen + widen_mult vs. > > widen_mult. > > IMO this would be perfect for a widening_mult helper? > > Sure thing, seems also benefits other SAT_MUL pattern(s) introduced in > previous too. So I may try to refactor that first. > > > I see. That's odd. In fact I'd say we'd want to transform > > (nop_convert (bit_{ior,and,xor}:c (convert @0) @1)) > > to move the conversion inside like > > > (for op (bit_ior bit_and bit_xor) > > (simplify > > (nop_convert (op:c (convert @0) @1)) > > (op:type (convert @0) (convert @1)))) > > That make much sense to me, and we have similar scenario for the SAT_MUL > pattern in previous, will refactor this first. > > Pan > > -----Original Message----- > From: Richard Biener <[email protected]> > Sent: Friday, October 31, 2025 9:37 PM > To: Li, Pan2 <[email protected]> > Cc: [email protected]; [email protected]; [email protected]; > [email protected]; [email protected]; Chen, Ken <[email protected]>; > Liu, Hongtao <[email protected]>; [email protected] > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > On Tue, Oct 28, 2025 at 3:31 AM Li, Pan2 <[email protected]> wrote: > > > > Thanks Richard for comments. > > > > > (match > > > (widening_mult @0 @1) > > > (widen_mult @0 @1)) > > > (match > > > (widening_mult @0 @1) > > > (mult (convert @0) (convert @1)) > > > (if (it is widening)) > > > > I see, that wrap the convert part into another helper predicate. That make > > sense to me, and will have a try soon. > > I think the difference comes from the the uint128_t/uint64_t has different > > gimple compares to the rest. > > > > 1. WT uint128_t, NT uint64_t, we have gimple as below. > > > > 8 │ uint128_t x; > > 19 │ x_8 = a_6(D) w* b_7(D); > > > > The NT (aka uint64_t) doesn't need to convert when widen_mul to uint128_t. > > > > 2. WT uint128_t, NT uint32_t, as well as the rest combination if WT > NT. > > > > 8 │ uint128_t x; > > 21 │ _12 = (unsigned long) a_6(D); > > 22 │ _13 = (unsigned long) b_7(D); > > 23 │ x_8 = _12 w* _13; > > > > The NT (aka uint32_t) need to convert to uint64_t before widen_mul to > > uint128_t. > > > > In previous, I found it is not easy to merge the 2 case into one expression. > > Thus, add one pattern for case 1 and the other for case 2. > > OK, so this is basically widen + mult vs. pre-widen + widen_mult vs. > widen_mult. > IMO this would be perfect for a widening_mult helper? > > > > Now for the question - part of the pattern is > > > (convert? (bit_ior (negate (convert from-bool)) (convert (widening_mult > > > ...))) > > > I wonder whether you've seen the outer conversion being moved inside the > > > IOR? > > > At least I wonder why there's the (convert ...) around the widening_mult > > > and > > > why that's not conditional? The textual description of the pattern > > > mentions > > > (NT)x | (NT)overflow_p, so the result should already be in NT, no > > > outer conversion > > > required? That said, (NT)((WT)x | (WT)overflow_p) would be equally valid, > > > but we fold that inside the IOR? But not always? > > > > Some of the cases introduces signed type for some intermediate result, like > > WT is uint16_t, NT is uint8_t. > > > > 11 │ signed char _3; // we have signed char from _3 to _6. > > 12 │ signed char _4; > > 13 │ signed char _5; > > 14 │ signed char _6; > > 15 │ uint8_t _11; > > 16 │ > > 17 │ <bb 2> [local count: 1073741824]: > > 18 │ _1 = (short unsigned int) a_7(D); > > 19 │ _2 = (short unsigned int) b_8(D); > > 20 │ x_9 = _1 * _2; > > 21 │ overflow_p_10 = x_9 > 255; > > 22 │ _3 = (signed char) overflow_p_10; > > 23 │ _4 = -_3; > > 24 │ _5 = (signed char) x_9; > > 25 │ _6 = _4 | _5; > > 26 │ _11 = (uint8_t) _6; > > I see. That's odd. In fact I'd say we'd want to transform > (nop_convert (bit_{ior,and,xor}:c (convert @0) @1)) > to move the conversion inside like > > (for op (bit_ior bit_and bit_xor) > (simplify > (nop_convert (op:c (convert @0) @1)) > (op:type (convert @0) (convert @1)))) > > > 27 │ return _11; > > > > When we take a look for WT is uint64_t, NT is uint32_t, we will have > > unsigned. > > > > 11 │ unsigned int _3; > > 12 │ unsigned int _4; > > 13 │ unsigned int _5; > > ... > > 26 │ _6 = _4 | _5; > > > > The outer convert with mark '?' is introduced to match both of the above 2 > > cases. > > Thanks and sorry for the delay... > Richard. > > > Pan > > > > > > -----Original Message----- > > From: Richard Biener <[email protected]> > > Sent: Monday, October 27, 2025 9:14 PM > > To: Li, Pan2 <[email protected]> > > Cc: [email protected]; [email protected]; [email protected]; > > [email protected]; [email protected]; Chen, Ken <[email protected]>; > > Liu, Hongtao <[email protected]>; [email protected] > > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > > > On Sun, Oct 26, 2025 at 11:58 AM Li, Pan2 <[email protected]> wrote: > > > > > > Hi Richard, > > > > > > I had a try for 2 parts those days. > > > > > > 1. Remove outer convert?, it will make the pattern failed to match as the > > > type is signed > > > without that convert, and the types_match (type, @0, @1) to be false. > > > 2. Merge 2 into one pattern with (convert?@4 @0) but it will also result > > > in > > > pattern match fail(when NT is uint32_t and WT is uint64_t). > > > Because the types_match (type, capture[0], capture[1]) will be > > > false, as type is SI and capture is DI. > > > > > > So I think it is possible to separate the common part into a helper > > > pattern like unsigned_sat_mul_helper. > > > And make the real unsigned_integer_sat_mul(@0, @1) for the first one, and > > > unsigned_integer_sat_mul((convert @0), (convert @1) > > > for the second one. > > > Then we don't need to duplicate most of the patterns up to a point. How > > > do you think of it? Thanks a lot. > > > > Maybe we can first get one of the patterns to a form I understand (the > > one with the many conversions)? > > Splitting out parts might only obfuscate things. It would be maybe > > nice to be able to merge different > > alternatives within the same (match ..) sharing both the name and the > > conditions, like > > > > (match (foo @0 @1) > > (convert (mult @0 @1)) > > (widen_mult @0 @1) > > (if (...))) > > > > If you can structure the two patterns in a way that the (if ...) part > > is identical that would help. > > > > Some question: > > > > + (convert? > > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2))) > > + (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1))))) > > > > So I get that we want a widening multiplication but that's either a > > widen_mult > > already or not. We might want to have a > > > > (match > > (widening_mult @0 @1) > > (widen_mult @0 @1)) > > (match > > (widening_mult @0 @1) > > (mult (convert @0) (convert @1)) > > (if (it is widening)) > > > > to help merging. Now for the question - part of the pattern is > > > > (convert? (bit_ior (negate (convert from-bool)) (convert (widening_mult > > ...))) > > > > I wonder whether you've seen the outer conversion being moved inside the > > IOR? > > At least I wonder why there's the (convert ...) around the widening_mult and > > why that's not conditional? The textual description of the pattern mentions > > (NT)x | (NT)overflow_p, so the result should already be in NT, no > > outer conversion > > required? That said, (NT)((WT)x | (WT)overflow_p) would be equally valid, > > but we fold that inside the IOR? But not always? > > > > Thanks, > > Richard. > > > > > Pan > > > > > > -----Original Message----- > > > From: Li, Pan2 > > > Sent: Friday, October 24, 2025 9:52 AM > > > To: 'Richard Biener' <[email protected]> > > > Cc: [email protected]; [email protected]; [email protected]; > > > [email protected]; [email protected]; Chen, Ken > > > <[email protected]>; Liu, Hongtao <[email protected]>; > > > [email protected] > > > Subject: RE: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > > > > > Thanks Richard for comments. > > > > > > Assume we talk about the form 7 as below: > > > > > > #define DEF_SAT_U_MUL_FMT_7(NT, WT) \ > > > NT __attribute__((noinline)) \ > > > sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \ > > > { \ > > > WT x = (WT)a * (WT)b; \ > > > NT max = -1; \ > > > bool overflow_p = x > (WT)(max); \ > > > return -(NT)(overflow_p) | (NT)x; \ > > > } > > > > > > > So why is matching the conversion necessary at all, that is, why is > > > > (mult @0 @1) > > > > not sufficient here? > > > > > > Because there are many different types combinations, unsigned SAT_MUL > > > need the help of a wider type. > > > For example: > > > > > > If NT is uint32_t, WT is uint128_t, we need the convert before widen_mul: > > > 21 │ _12 = (unsigned long) a_6(D); > > > 22 │ _13 = (unsigned long) b_7(D); > > > 23 │ x_8 = _12 w* _13; > > > > > > If NT is uint64_t, WT is uint128_t, we don't have the convert. > > > 19 │ x_8 = a_6(D) w* b_7(D); > > > > > > > You are not looking at the types of @0 or @1 at > > > > all, AFAICS > > > > one could be 'char' and one could be 'short'? > > > > > > The type is restricted by (if (types_match (type, @0, @1)). Aka the > > > result must have the same type as @0 and @1. > > > > > > > Captures on optional nodes are "merged" to the operand, so for > > > > (convert?@4 @0) > > > > when there is no conversion @4 == @0. That is it behaves as if there > > > > were two > > > > captures at the place of @0. > > > > > > I see, sounds like alias here, thanks for the explanation. > > > > > > > But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the > > > > desired type? > > > > That is, I wonder if even when the conversion is missing, the > > > > saturated operation > > > > can be matched by using (original-type)SAT_UXYZ (...), implying the > > > > SAT_UXYZ > > > > produces a uint8_t (in this case)? Not sure I phrased this in an > > > > understandable way ;) > > > > > > > I'll note that the outer conversion is poorly constrained given the > > > > conversion > > > > around the multiplication/inside the negate is what determines the > > > > semantics of it > > > > and that's not constrained either? > > > > > > For NT is uint8_t and WT is uint128_t, we have the final convert from > > > gimple. The bigger types mentioned in above > > > don't have such convert. The _11 and _6 should be almost the same, maybe > > > nop_convert here is good enough if it failed > > > to match that. I will have a try and keep you posted. > > > > > > 14 │ signed char _6; > > > 15 │ uint8_t _11; > > > ... > > > 26 │ _3 = (signed char) overflow_p_10; > > > 27 │ _4 = -_3; > > > 28 │ _5 = (signed char) x_9; > > > 29 │ _6 = _4 | _5; > > > 30 │ _11 = (uint8_t) _6; > > > 31 │ return _11; > > > > > > Pan > > > > > > > > > -----Original Message----- > > > From: Richard Biener <[email protected]> > > > Sent: Thursday, October 23, 2025 7:08 PM > > > To: Li, Pan2 <[email protected]> > > > Cc: [email protected]; [email protected]; [email protected]; > > > [email protected]; [email protected]; Chen, Ken > > > <[email protected]>; Liu, Hongtao <[email protected]>; > > > [email protected] > > > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > > > > > On Wed, Oct 22, 2025 at 2:21 PM Li, Pan2 <[email protected]> wrote: > > > > > > > > Thanks Richard for comments. > > > > > > > > > I'm a bit confused as to this for and the explicit widen_mult case in > > > > > the 2nd > > > > > pattern below. In fact, the patterns look similar enough to me and I > > > > > wonder > > > > > whether the consumer can handle the outer conversion and the > > > > > conversion > > > > > of the leafs of the multiplication? The conversion of the leafs > > > > > doesn't have > > > > > any constraints, nor is it obvious why there's an outer conversion > > > > > around the > > > > > IOR. > > > > > > > > The two pattern looks similar up to a point. I separate them into 2 > > > > mostly comes from > > > > the inner convert about mult, aka (mult (convert@4 @0) (convert@5 @1)). > > > > Some diff > > > > type combination need to convert while others are not. > > > > > > > > I tried to leverage something like (mult (convert?@4 @0) (convert?@5 > > > > @1)) to put > > > > them together. But it may has additional sematics because add '?' after > > > > convert will > > > > cover (mult (convert@4 @0) @1) which is not expected. > > > > > > So why is matching the conversion necessary at all, that is, why is (mult > > > @0 @1) > > > not sufficient here? You are not looking at the types of @0 or @1 at > > > all, AFAICS > > > one could be 'char' and one could be 'short'? > > > > > > > Another problem about add '?' after convert is capture @4/@5, if there > > > > is no need > > > > to convert, I am not sure how to take care of the @4 from the “with“” > > > > scope because it > > > > is optional. The code acts on capture @4/@5 is not correct when there > > > > is no convert. > > > > > > Captures on optional nodes are "merged" to the operand, so for > > > (convert?@4 @0) > > > when there is no conversion @4 == @0. That is it behaves as if there > > > were two > > > captures at the place of @0. > > > > > > > I may have another try to merge them into one, but is there any > > > > suggestion here? > > > > > > > > > outer conversion around the > > > > > IOR. > > > > > > > > Some type(s) may need to do a convert to the final type, like > > > > sat_u_mul-8-u8-from-u32, > > > > The seq may look like blow. > > > > > > > > 25 │ _6 = _4 | _5; > > > > 26 │ _11 = (uint8_t) _6; > > > > 27 │ return _11; > > > > > > But isn't _4 | _5 still a sat_u_mul-8-u8 but with a conversion to the > > > desired type? > > > That is, I wonder if even when the conversion is missing, the > > > saturated operation > > > can be matched by using (original-type)SAT_UXYZ (...), implying the > > > SAT_UXYZ > > > produces a uint8_t (in this case)? Not sure I phrased this in an > > > understandable way ;) > > > > > > I'll note that the outer conversion is poorly constrained given the > > > conversion > > > around the multiplication/inside the negate is what determines the > > > semantics of it > > > and that's not constrained either? > > > > > > Richard. > > > > > > > Pan > > > > > > > > -----Original Message----- > > > > From: Richard Biener <[email protected]> > > > > Sent: Wednesday, October 22, 2025 5:00 PM > > > > To: Li, Pan2 <[email protected]> > > > > Cc: [email protected]; [email protected]; > > > > [email protected]; [email protected]; [email protected]; Chen, > > > > Ken <[email protected]>; Liu, Hongtao <[email protected]>; > > > > [email protected] > > > > Subject: Re: [PATCH v1 1/2] Match: Add unsigned SAT_MUL for form 7 > > > > > > > > On Mon, Oct 20, 2025 at 3:10 PM <[email protected]> wrote: > > > > > > > > > > From: Pan Li <[email protected]> > > > > > > > > > > This patch would like to try to match the the unsigned > > > > > SAT_MUL form 7, aka below: > > > > > > > > > > #define DEF_SAT_U_MUL_FMT_7(NT, WT) \ > > > > > NT __attribute__((noinline)) \ > > > > > sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \ > > > > > { \ > > > > > WT x = (WT)a * (WT)b; \ > > > > > NT max = -1; \ > > > > > bool overflow_p = x > (WT)(max); \ > > > > > return -(NT)(overflow_p) | (NT)x; \ > > > > > } > > > > > > > > > > while WT is uint128_t, uint64_t, uint32_t and uint16_t, and > > > > > NT is uint64_t, uint32_t, uint16_t or uint8_t. > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * match.pd: Add pattern for SAT_MUL form 5 include > > > > > mul and widen_mul. > > > > > > > > > > Signed-off-by: Pan Li <[email protected]> > > > > > --- > > > > > gcc/match.pd | 35 +++++++++++++++++++++++++++++++++++ > > > > > 1 file changed, 35 insertions(+) > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > index bfc51e6579a..0f55a82d989 100644 > > > > > --- a/gcc/match.pd > > > > > +++ b/gcc/match.pd > > > > > @@ -3749,6 +3749,41 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > > > > bool widen_mult_p = prec * 2 == widen_prec; > > > > > } > > > > > (if (c2_is_type_precision_p && widen_mult_p))))) > > > > > + (for mult_op (mult widen_mult) > > > > > > > > I'm a bit confused as to this for and the explicit widen_mult case in > > > > the 2nd > > > > pattern below. In fact, the patterns look similar enough to me and I > > > > wonder > > > > whether the consumer can handle the outer conversion and the conversion > > > > of the leafs of the multiplication? The conversion of the leafs > > > > doesn't have > > > > any constraints, nor is it obvious why there's an outer conversion > > > > around the > > > > IOR. > > > > > > > > > + (match (unsigned_integer_sat_mul @0 @1) > > > > > + (convert? > > > > > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2))) > > > > > + (convert (mult_op:c@3 (convert@4 @0) (convert@5 @1))))) > > > > > + (if (types_match (type, @0, @1)) > > > > > + (with > > > > > + { > > > > > + unsigned prec = TYPE_PRECISION (type); > > > > > + unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3)); > > > > > + unsigned cvt4_prec = TYPE_PRECISION (TREE_TYPE (@4)); > > > > > + unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5)); > > > > > + > > > > > + wide_int max = wi::mask (prec, false, widen_prec); > > > > > + bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max); > > > > > + bool widen_mult_p = mult_op == WIDEN_MULT_EXPR && cvt4_prec == > > > > > cvt5_prec > > > > > + && widen_prec == cvt5_prec * 2; > > > > > + bool mult_p = mult_op == MULT_EXPR && cvt4_prec == cvt5_prec > > > > > + && cvt4_prec == widen_prec && widen_prec > prec; > > > > > + } > > > > > + (if (c2_is_max_p && (mult_p || widen_mult_p))))))) > > > > > + (match (unsigned_integer_sat_mul @0 @1) > > > > > + (bit_ior (negate (convert (gt @3 INTEGER_CST@2))) > > > > > + (convert (widen_mult:c@3 @0 @1))) > > > > > + (if (types_match (type, @0, @1)) > > > > > + (with > > > > > + { > > > > > + unsigned prec = TYPE_PRECISION (type); > > > > > + unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3)); > > > > > + > > > > > + wide_int max = wi::mask (prec, false, widen_prec); > > > > > + bool c2_is_max_p = wi::eq_p (wi::to_wide (@2), max); > > > > > + bool widen_mult_p = prec * 2 == widen_prec; > > > > > + } > > > > > + (if (c2_is_max_p && widen_mult_p))))) > > > > > ) > > > > > > > > > > /* The boundary condition for case 10: IMM = 1: > > > > > -- > > > > > 2.43.0 > > > > >
