[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

xry111 at gcc dot gnu.org via Gcc-bugs Fri, 08 Dec 2023 22:56:16 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935


--- Comment #5 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
> /* Default RTX cost initializer.  */
> ...
>     int_mult_si (COSTS_N_INSNS (1)),
>     int_mult_di (COSTS_N_INSNS (1)),
> 
> 
> That seems wrong.
> I suspect you will get other improvements when you touch this.
> 
> E.g.
> ```
> int f(int t)
> {
>         return t * 17;
> }
> ```
> Should really be:
> shift followed by an add.
> But currently is just a mult.
> 
> What is interesting is I think -Os cost is the opposite from the -O2 cost ...
> 
> That is -Os produces the better code generation due to the cost for mult
> being set to 4:
> /* RTX costs to use when optimizing for size.  */
> ...
>     .int_mult_si_ (4)
>     .int_mult_di_ (4)

4 is just COSTS_N_INSNS (1), so at -Os we make all instructions cost the
same.  This should be correct because at -Os we should minimize the number of
instructions.  In loongarch_rtx_costs, though, we have:

    case MULT: 
      if (float_mode_p)
        *total = loongarch_fp_mult_cost (mode);
      else if (mode == DImode && !TARGET_64BIT)
        *total = (speed
                  ? loongarch_cost->int_mult_si * 3 + 6 
                  : COSTS_N_INSNS (7)); 
      else if (!speed)
        *total = COSTS_N_INSNS (1) + 1;
      else if (mode == DImode)
        *total = loongarch_cost->int_mult_di;
      else  
        *total = loongarch_cost->int_mult_si;
      return false;

so we still slightly penalize multiplication at -Os.  I think we should put
COSTS_N_INSNS (1) + 1 into loongarch_rtx_cost_optimize_size instead of
special-casing it in loongarch_rtx_costs.
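
To make the numbers concrete, here is a minimal standalone sketch of the
arithmetic.  COSTS_N_INSNS is copied from GCC's rtl.h; the two helper
functions are hypothetical names used only for illustration and do not exist
in the LoongArch backend.

```c
/* Same definition as in GCC's rtl.h: one instruction is 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: the cost the !speed branch above assigns to an
   integer multiplication, i.e. one unit more than a single instruction.  */
int
size_cost_int_mult (void)
{
  return COSTS_N_INSNS (1) + 1;
}

/* Hypothetical helper: the !speed cost of a DImode multiplication on a
   32-bit target, taken from the second branch above.  */
int
size_cost_dimode_mult_32bit (void)
{
  return COSTS_N_INSNS (7);
}
```

So the value 4 in the size table equals COSTS_N_INSNS (1), while the
special case in loongarch_rtx_costs yields 5 for an integer multiplication.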

For the default values (used at -O2) I'll run some micro-benchmarks...
