On Wed, Feb 12, 2020 at 9:56 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Wilco Dijkstra <wilco.dijks...@arm.com> writes: > > Hi Richard, > > > > Right, so this is an alternative approach using costs - Combine won't try to > > duplicate instructions if it increases costs, so increasing the ctz cost to > > 2 > > instructions (which is the correct cost for ctz anyway) > > ...agreed...
Yes I agree a better cost model for CTZ/CLZ is the right solution but I disagree with 2 ALU instruction as the cost. It should either be the same cost as a multiply or have its own cost entry. For an example on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4 cycles, the same as the cost as a multiple; on OcteonTX2 it is 5 cycles (again the same cost as a multiple). Thanks, Andrew Pinski > > > ensures we still get efficient code for this example: > > > > [AArch64] Set ctz rtx_cost (PR93565) > > > > Although GCC should understand the limited range of clz/ctz/cls results, > > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary > > sign extension. Avoid this by setting the cost for ctz to be higher than > > that of a simple ALU instruction. Deepsjeng performance improves by ~0.6%. > > > > Bootstrap OK. > > > > ChangeLog: > > 2020-02-12 Wilco Dijkstra <wdijk...@arm.com> > > > > PR rtl-optimization/93565 > > * config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs. > > > > * gcc.target/aarch64/pr93565.c: New test. > > OK, thanks. Could you remove the bit about combine behaving oddly when > you commit though? I think this was simply a target bug and combine > was being given duff information. > > Richard > > > > > -- > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > index > > e40750380cce202473da3cf572ebdbc28a4ecc06..7426629d6c973c06640f75d3de53a2815ff40f1b > > 100644 > > --- a/gcc/config/aarch64/aarch64.c > > +++ b/gcc/config/aarch64/aarch64.c > > @@ -11459,6 +11459,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int > > outer ATTRIBUTE_UNUSED, > > > > return false; > > > > + case CTZ: > > + *cost = COSTS_N_INSNS (2); > > + > > + if (speed) > > +*cost += extra_cost->alu.clz + extra_cost->alu.rev; > > + return false; > > + > > case COMPARE: > > op0 = XEXP (x, 0); > > op1 = XEXP (x, 1); > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c > > b/gcc/testsuite/gcc.target/aarch64/pr93565.c > > new file mode 100644 > > index > > 0000000000000000000000000000000000000000..7200f80d1bb161f6a058cc6591f61b6b75cf1749 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c > > @@ -0,0 +1,34 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2" } */ > > + > > +static const unsigned long long magic = 0x03f08c5392f756cdULL; > > + > > +static const char table[64] = { > > + 0, 1, 12, 2, 13, 22, 17, 3, > > + 14, 33, 23, 36, 18, 58, 28, 4, > > + 62, 15, 34, 26, 24, 48, 50, 37, > > + 19, 55, 59, 52, 29, 44, 39, 5, > > + 63, 11, 21, 16, 32, 35, 57, 27, > > + 61, 25, 47, 49, 54, 51, 43, 38, > > + 10, 20, 31, 56, 60, 46, 53, 42, > > + 9, 30, 45, 41, 8, 40, 7, 6, > > +}; > > + > > +static inline int ctz1 (unsigned long b) > > +{ > > + unsigned long lsb = b & -b; > > + return table[(lsb * magic) >> 58]; > > +} > > + > > +void f (unsigned long x, int *p) > > +{ > > + if (x != 0) > > + { > > + int a = ctz1 (x); > > + *p = a | p[a]; > > + } > > +} > > + > > +/* { dg-final { scan-assembler-times "rbit\t" 1 } } */ > > +/* { dg-final { scan-assembler-times "clz\t" 1 } } */ > > +