https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125387

            Bug ID: 125387
           Summary: riscv: smuldi3_highpart cost too high
           Product: gcc
           Version: 17.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: anton at ozlabs dot org
  Target Milestone: ---

I was debugging a case where an integer divide by a constant wasn't being
converted into the usual trick: multiply by a reciprocal scaled up by a power
of two, then shift the result right. An example:

#include <stdint.h>

int64_t foo(int64_t x)
{
        return x / 30;
}

# gcc -O2 -march=rv64gcv -mtune=tt-ascalon-d8 div.c -S -dp

foo:
        li      a5,30           # 6     [c=4 l=4]  *movdi_64bit/1
        div     a0,a0,a5        # 12    [c=52 l=4]  divdi3
        ret             # 25    [c=0 l=4]  simple_return

If I remove the Ascalon tune, we see the expected behaviour:

# gcc -O2 -march=rv64gcv div.c -S -dp

foo:
        li      a4,-2004316160          # 32    [c=4 l=4]  *movdi_64bit/1
        addi    a4,a4,-1911     # 33    [c=4 l=4]  *adddi3/1
        slli    a5,a4,32        # 23    [c=4 l=4]  ashldi3
        add     a5,a5,a4        # 24    [c=4 l=4]  *adddi3/0
        mulh    a5,a0,a5        # 8     [c=88 l=4]  smuldi3_highpart
        srai    a4,a0,63        # 11    [c=4 l=4]  ashrdi3
        add     a5,a5,a0        # 9     [c=4 l=4]  *adddi3/0
        srai    a5,a5,4 # 10    [c=4 l=4]  ashrdi3
        sub     a0,a5,a4        # 17    [c=4 l=4]  subdi3
        ret             # 36    [c=0 l=4]  simple_return

The Ascalon integer divide cost is lower than the default tune's, but it is
still high enough that the sequence above should be judged quicker. The
smuldi3_highpart ends up as a single mulh, yet notice that its cost is very
high (88).
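For reference, the default-tune sequence can be read back as C.

The constant M is ceil(2^68 / 30) = 0x8888888888888889. That value does not fit in a signed 64-bit integer, so the code uses it as M - 2^64 and adds x afterwards to compensate. After the shift right by 4, the result is floor(x * M / 2^68). Subtracting x >> 63 (which is 0 or -1) then turns that floor into C's round-toward-zero division.

This is a sketch under two assumptions:
- the compiler supports __int128 (a GCC/Clang extension);
- >> on a negative value is an arithmetic shift (implementation-defined in C, but what GCC does).

The names mulh64, div30 and check_div30 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed high 64 bits of a 64x64->128-bit product, i.e. what RISC-V mulh
   computes. Relies on __int128 (GCC/Clang extension) and on >> of a
   negative value being an arithmetic shift, as it is in GCC. */
static int64_t mulh64(int64_t a, int64_t b)
{
        return (int64_t)(((__int128)a * b) >> 64);
}

/* x / 30, step by step as in the default-tune output above. */
static int64_t div30(int64_t x)
{
        /* ceil(2^68 / 30) - 2^64, i.e. 0x8888888888888889 as int64_t;
           the asm builds it with li/addi/slli/add. */
        const int64_t magic = -INT64_C(8608480567731124087);
        int64_t q = mulh64(x, magic);   /* mulh a5,a0,a5 */
        int64_t sign = x >> 63;         /* srai a4,a0,63: 0 or -1 */
        q += x;                         /* add  a5,a5,a0: undo the -2^64 */
        q >>= 4;                        /* srai a5,a5,4: floor(x*M / 2^68) */
        return q - sign;                /* sub  a0,a5,a4: round toward zero */
}

/* Cross-check against the compiler's own division. */
static void check_div30(int64_t x)
{
        assert(div30(x) == x / 30);
}
```

Calling check_div30 over a range of inputs, including INT64_MIN and INT64_MAX, exercises both the positive and negative paths of the sign fix-up.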

After stumbling around the RISC-V rtx cost code, I think we are adding up the
costs of several operations (shifts, a multiply, sign extensions, etc.). I think
this needs to be fixed in riscv_rtx_costs (the cost should just be that of one
integer multiply), but I'm fast getting out of my depth.

(insn 8 24 11 (set (reg:DI 15 a5 [137])
        (truncate:DI (lshiftrt:TI (mult:TI (sign_extend:TI (reg:DI 10 a0 [orig:143 shiftby ] [143]))
                    (sign_extend:TI (reg:DI 15 a5 [138])))
                (const_int 64 [0x40])))) "div.c":5:17 28 {smuldi3_highpart}
     (expr_list:REG_EQUAL (truncate:DI (lshiftrt:TI (mult:TI (sign_extend:TI (reg:DI 10 a0 [orig:143 shiftby ] [143]))
                    (const_int -8608480567731124087 [0x8888888888888889]))
                (const_int 64 [0x40])))
        (nil)))
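To illustrate the suspicion about the cost code, here is a toy model of the expression in insn 8. It is not GCC's riscv_rtx_costs.

It compares two ways of costing the same tree:
- a walker that charges every operator node separately;
- one that recognises the whole (truncate (lshiftrt (mult (sign_extend) (sign_extend)) 64)) shape as a single multiply.

All type names, functions and cost values (toy_rtx, toy_naive_cost, toy_pattern_cost, TOY_COST_*) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GCC's RTX codes. */
enum toy_code { T_REG, T_CONST_INT, T_SIGN_EXTEND, T_MULT, T_LSHIFTRT, T_TRUNCATE };

struct toy_rtx {
        enum toy_code code;
        const struct toy_rtx *op0, *op1;   /* NULL when absent */
        long value;                        /* used by T_CONST_INT only */
};

/* Illustrative costs, not taken from any real RISC-V tuning. */
enum { TOY_COST_SIMPLE = 4, TOY_COST_MUL = 16 };

/* Cost charged for one node, ignoring its operands. */
static int toy_node_cost(const struct toy_rtx *x)
{
        assert(x != NULL);
        switch (x->code) {
        case T_REG:
        case T_CONST_INT:
                return 0;
        case T_MULT:
                return TOY_COST_MUL;
        default:        /* shifts, extensions, truncation */
                return TOY_COST_SIMPLE;
        }
}

/* Naive walk: every operator in the tree is charged separately. */
static int toy_naive_cost(const struct toy_rtx *x)
{
        if (x == NULL)
                return 0;
        return toy_node_cost(x) + toy_naive_cost(x->op0) + toy_naive_cost(x->op1);
}

/* Does x have the smuldi3_highpart shape
   (truncate (lshiftrt (mult (sign_extend a) (sign_extend b)) 64))? */
static int toy_is_smul_highpart(const struct toy_rtx *x)
{
        const struct toy_rtx *shift, *mul;

        if (x->code != T_TRUNCATE || (shift = x->op0) == NULL)
                return 0;
        if (shift->code != T_LSHIFTRT || shift->op1 == NULL
            || shift->op1->code != T_CONST_INT || shift->op1->value != 64)
                return 0;
        mul = shift->op0;
        return mul != NULL && mul->code == T_MULT
               && mul->op0 != NULL && mul->op0->code == T_SIGN_EXTEND
               && mul->op1 != NULL && mul->op1->code == T_SIGN_EXTEND;
}

/* Pattern-aware walk: a whole highpart expression costs one multiply. */
static int toy_pattern_cost(const struct toy_rtx *x)
{
        if (x == NULL)
                return 0;
        if (toy_is_smul_highpart(x))
                return TOY_COST_MUL;
        return toy_node_cost(x) + toy_pattern_cost(x->op0) + toy_pattern_cost(x->op1);
}
```

Built for the tree in insn 8, the naive walk charges two sign extensions, a shift and a truncate on top of the multiply (32 with these made-up numbers). The pattern-aware walk charges only the multiply (16).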
