https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935
--- Comment #5 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> /* Default RTX cost initializer.  */
> ...
>     int_mult_si (COSTS_N_INSNS (1)),
>     int_mult_di (COSTS_N_INSNS (1)),
>
>
> That seems wrong.
> I suspect you will get other improvements when you touch this.
>
> E.g.
> ```
> int f(int t)
> {
>   return t * 17;
> }
> ```
> Should really be:
> shift followed by an add.
> But currently is just a mult.
>
> What is interesting is I think -Os cost is the opposite from the -O2 cost ...
>
> That is -Os produces the better code generation due to the cost for mult
> being set to 4:
> /* RTX costs to use when optimizing for size.  */
> ...
>     .int_mult_si_ (4)
>     .int_mult_di_ (4)

4 is just COSTS_N_INSNS (1), so at -Os we are making all instructions cost
the same.  This should be correct because at -Os we should minimize the
number of instructions.

In loongarch_rtx_costs though we have:

    case MULT:
      if (float_mode_p)
        *total = loongarch_fp_mult_cost (mode);
      else if (mode == DImode && !TARGET_64BIT)
        *total = (speed
                  ? loongarch_cost->int_mult_si * 3 + 6
                  : COSTS_N_INSNS (7));
      else if (!speed)
        *total = COSTS_N_INSNS (1) + 1;
      else if (mode == DImode)
        *total = loongarch_cost->int_mult_di;
      else
        *total = loongarch_cost->int_mult_si;
      return false;

so we still slightly penalize multiplication.  To me we should code
COSTS_N_INSNS (1) + 1 into loongarch_rtx_cost_optimize_size instead of
special-casing it in loongarch_rtx_costs.

For the default value (used at -O2) I'll do some micro-benchmarks...
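
To make the arithmetic above concrete, here is a minimal standalone sketch. It is not GCC code. COSTS_N_INSNS is redefined locally with the same definition GCC uses in rtl.h, and the helper names (size_mult_cost, mul17_shift_add, mul17_mult) are made up for illustration. It only shows the cost values being discussed and the shift-plus-add form of t * 17 from comment #4; it makes no claim about which form the compiler actually picks.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the rtl.h definition: one instruction is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* The literal 4 in the size-cost table is the cost of one instruction.  */
static_assert (COSTS_N_INSNS (1) == 4, "one insn costs 4 units");

/* Hypothetical helper: the !speed MULT cost special-cased in
   loongarch_rtx_costs, i.e. one unit more than a single instruction.  */
static inline int
size_mult_cost (void)
{
  return COSTS_N_INSNS (1) + 1;
}

/* t * 17 written as a shift followed by an add.  Unsigned arithmetic is
   used so that wrap-around is well defined.  */
static inline uint32_t
mul17_shift_add (uint32_t t)
{
  return (t << 4) + t;
}

/* The plain multiply, for comparison.  */
static inline uint32_t
mul17_mult (uint32_t t)
{
  return t * 17u;
}
```

With these definitions size_mult_cost () is 5, while a two-instruction sequence such as the shift plus add would be COSTS_N_INSNS (2) = 8 if each instruction is costed at COSTS_N_INSNS (1).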