https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 29 Jan 2018, ktkachov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067
>
> --- Comment #3 from ktkachov at gcc dot gnu.org ---
> (In reply to Richard Biener from comment #2)
> > So any hint on whether the code after r257077 is better or worse than
> > before?
>
> Looks worse unfortunately:
> For aarch64 at -O2 it generates:
> foo:
>         mov     w3, 44
>         mov     w2, 40
>         mov     w5, 1
>         mov     w4, 2
>         smull   x3, w1, w3
>         smull   x2, w1, w2
>         str     w5, [x0, x3]
>         add     x2, x2, 400
>         add     x1, x2, x1, sxtw 2
>         str     w4, [x0, x1]
>         ret
>
> whereas with r257077 it generates the shorter:
> foo:
>         mov     w3, 40
>         sxtw    x2, w1
>         mov     w4, 1
>         smaddl  x0, w1, w3, x0
>         mov     w3, 2
>         add     x1, x0, x2, lsl 2
>         str     w4, [x0, x2, lsl 2]
>         str     w3, [x1, 400]
>         ret

So shorter is worse?  Might be because I don't understand the difference
between the 'lsl 2' and the 'sxtw 2' or the cost of the [x1, 400]
addressing.
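On the 'lsl 2' versus 'sxtw 2' question: in the A64 extended-register form, `add x1, x2, x1, sxtw 2` sign-extends the 32-bit w1 to 64 bits and then shifts it left by 2. The shorter listing performs the sign extension separately (`sxtw x2, w1`) and can then use a plain `lsl 2` on the already 64-bit x2. The sketch below models both listings in C to check that they compute the same two store offsets; it says nothing about which sequence is cheaper. The helper names, the `offsets` struct and the `grid` array are all hypothetical, and the guessed source `foo (int (*a)[10], int i) { a[i][i] = 1; a[i + 10][i] = 2; }` is only an assumption inferred from the offsets (44*i and 44*i + 400 bytes with 4-byte int), not something stated in the report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the two quoted AArch64 sequences.  Registers
   are modelled as uint64_t so that wrap-around matches the hardware and
   shifting a negative value is well defined. */

typedef struct { uint64_t first, second; } offsets;  /* byte offsets from the incoming x0 */

/* sxtw Xd, Wn: sign-extend a 32-bit value to 64 bits. */
static uint64_t sxtw(int32_t w) { return (uint64_t)(int64_t)w; }

/* smull Xd, Wn, Wm: signed 32 x 32 -> 64-bit multiply. */
static uint64_t smull(int32_t a, int32_t b) { return (uint64_t)((int64_t)a * b); }

/* Longer listing (the one reported as "worse"). */
static offsets listing_long(int32_t w1) {
    uint64_t x3 = smull(w1, 44);            /* smull x3, w1, w3   (w3 = 44) */
    uint64_t x2 = smull(w1, 40);            /* smull x2, w1, w2   (w2 = 40) */
    x2 += 400;                              /* add   x2, x2, 400 */
    uint64_t x1 = x2 + (sxtw(w1) << 2);     /* add   x1, x2, x1, sxtw 2 */
    return (offsets){ x3, x1 };             /* str w5, [x0, x3]; str w4, [x0, x1] */
}

/* Shorter listing.  The incoming base address is left out, so x0 here
   holds only the 40 * i that smaddl adds to it. */
static offsets listing_short(int32_t w1) {
    uint64_t x2 = sxtw(w1);                 /* sxtw   x2, w1 */
    uint64_t x0 = smull(w1, 40);            /* smaddl x0, w1, w3, x0   (w3 = 40) */
    uint64_t x1 = x0 + (x2 << 2);           /* add    x1, x0, x2, lsl 2 */
    return (offsets){ x0 + (x2 << 2),       /* str w4, [x0, x2, lsl 2] */
                      x1 + 400 };           /* str w3, [x1, 400] */
}

/* Hypothetical array used to compare against the guessed source
   a[i][i] = 1; a[i + 10][i] = 2; (an assumption, not from the report). */
static int grid[20][10];

/* Returns 1 if both listings agree for every i in [lo, hi] and, for
   i in [0, 9], also match the byte offsets of grid[i][i] and
   grid[i + 10][i]; returns 0 otherwise.  hi must be below INT32_MAX. */
static int listings_agree(int32_t lo, int32_t hi) {
    for (int32_t i = lo; i <= hi; i++) {
        offsets a = listing_long(i), b = listing_short(i);
        if (a.first != b.first || a.second != b.second)
            return 0;
        if (i >= 0 && i <= 9) {
            uint64_t f = (uint64_t)((char *)&grid[i][i] - (char *)grid);
            uint64_t s = (uint64_t)((char *)&grid[i + 10][i] - (char *)grid);
            if (a.first != f || a.second != s)
                return 0;
        }
    }
    return 1;
}
```

Calling `listings_agree(-1000, 1000)` from a `main` should return 1, since both sequences reduce to 44*i for the first store and 44*i + 400 for the second. Any difference between them is therefore in instruction count and addressing-mode cost, not in the addresses computed.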