https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 29 Jan 2018, ktkachov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067
>
> --- Comment #6 from ktkachov at gcc dot gnu.org ---
> (In reply to rguent...@suse.de from comment #5)
> > On Mon, 29 Jan 2018, ktkachov at gcc dot gnu.org wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067
> > >
> > > --- Comment #3 from ktkachov at gcc dot gnu.org ---
> > > (In reply to Richard Biener from comment #2)
> > > > So any hint on whether the code after r257077 is better or worse than
> > > > before?
> > >
> > > Looks worse unfortunately:
> > > For aarch64 at -O2 it generates:
> > > foo:
> > >         mov     w3, 44
> > >         mov     w2, 40
> > >         mov     w5, 1
> > >         mov     w4, 2
> > >         smull   x3, w1, w3
> > >         smull   x2, w1, w2
> > >         str     w5, [x0, x3]
> > >         add     x2, x2, 400
> > >         add     x1, x2, x1, sxtw 2
> > >         str     w4, [x0, x1]
> > >         ret
> > >
> > > whereas with r257077 it generates the shorter:
> > > foo:
> > >         mov     w3, 40
> > >         sxtw    x2, w1
> > >         mov     w4, 1
> > >         smaddl  x0, w1, w3, x0
> > >         mov     w3, 2
> > >         add     x1, x0, x2, lsl 2
> > >         str     w4, [x0, x2, lsl 2]
> > >         str     w3, [x1, 400]
> > >         ret
> >
> > So shorter is worse?  Might be because I don't understand the
> > difference between the 'lsl 2' and the 'sxtw 2' or the cost
> > of the [x1, 400] addressing.
>
> Sorry, I messed up the writeup. Let me try again.
> The shorter sequence (with the smaddl) is the good one and is produced
> *without* r257077. After r257077 we generate the longer and worse sequence
> with two smull.

I see the shorter sequence with TOT, r257077 included.
The testcase explicitly checks for no widen-mult-plus but we now have two:

  <bb 2> [local count: 1073741825]:
  _17 = Idx_6(D) w* 44;
  _13 = Arr_7(D) + _17;
  MEM[(int[10] *)_13] = 1;
  _4 = WIDEN_MULT_PLUS_EXPR <Idx_6(D), 40, 400>;
  _18 = WIDEN_MULT_PLUS_EXPR <Idx_6(D), 4, _4>;
  _16 = Arr_7(D) + _18;
  MEM[(int[10] *)_16] = 2;
  return;

Note the "shorter" sequence I see is

foo:
        mov     x4, 400
        mov     w3, 40
        mov     w2, 44
        mov     w5, 1
        smaddl  x3, w1, w3, x4
        mov     w4, 2
        smull   x2, w1, w2
        add     x1, x3, x1, sxtw 2
        str     w5, [x0, x2]
        str     w4, [x0, x1]
        ret

which doesn't 1:1 match either of yours.
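
For readers checking the arithmetic in the dump, here is a minimal C sketch of what the two offset computations amount to. It is inferred from the constants in the GIMPLE above (44, 40, 400, 4 and the int[10] row type), not copied from the testsuite: the access pattern (a store to row Idx, column Idx, and a store to row Idx + 10, column Idx), the array `rows`, and all function names are assumptions made for illustration. It also assumes a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the array in the dump: rows of int[10].
   32 rows keep idx + 10 in bounds for idx in 0..9. */
static int rows[32][10];

_Static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");

/* Byte offset of rows[idx][idx] as plain C addressing computes it. */
ptrdiff_t offset_first(int idx)
{
    return (char *)&rows[idx][idx] - (char *)rows;
}

/* Byte offset of rows[idx + 10][idx] as plain C addressing computes it. */
ptrdiff_t offset_second(int idx)
{
    return (char *)&rows[idx + 10][idx] - (char *)rows;
}

/* _17 = Idx w* 44: one widening multiply. */
ptrdiff_t gimple_first(int idx)
{
    return (ptrdiff_t)idx * 44;
}

/* _4  = WIDEN_MULT_PLUS_EXPR <Idx, 40, 400>
   _18 = WIDEN_MULT_PLUS_EXPR <Idx, 4, _4>
   i.e. two widening multiply-accumulates chained together. */
ptrdiff_t gimple_second(int idx)
{
    ptrdiff_t t4 = (ptrdiff_t)idx * 40 + 400;
    return (ptrdiff_t)idx * 4 + t4;
}

/* Compare both formulations for every in-range index. */
void check_offsets(void)
{
    for (int idx = 0; idx < 10; idx++) {
        assert(offset_first(idx) == gimple_first(idx));
        assert(offset_second(idx) == gimple_second(idx));
    }
}
```

Calling check_offsets() from a small driver confirms that, under these assumptions, the two chained WIDEN_MULT_PLUS_EXPRs compute the same offset as Idx * 44 + 400. That is consistent with the smaddl/smull pair in the assembly, but it says nothing about which sequence is cheaper.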