On Wed, Feb 12, 2020 at 9:56 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> > Hi Richard,
> >
> > Right, so this is an alternative approach using costs - Combine won't try to
> > duplicate instructions if it increases costs, so increasing the ctz cost to 
> > 2
> > instructions (which is the correct cost for ctz anyway)
>
> ...agreed...

Yes I agree a better cost model for CTZ/CLZ is the right solution but
I disagree with 2 ALU instruction as the cost.  It should either be
the same cost as a multiply or have its own cost entry.
For an example on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4
cycles, the same as the cost as a multiple; on OcteonTX2 it is 5
cycles (again the same cost as a multiple).

Thanks,
Andrew Pinski

>
> > ensures we still get efficient code for this example:
> >
> > [AArch64] Set ctz rtx_cost (PR93565)
> >
> > Although GCC should understand the limited range of clz/ctz/cls results,
> > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
> > sign extension.  Avoid this by setting the cost for ctz to be higher than
> > that of a simple ALU instruction.  Deepsjeng performance improves by ~0.6%.
> >
> > Bootstrap OK.
> >
> > ChangeLog:
> > 2020-02-12  Wilco Dijkstra  <wdijk...@arm.com>
> >
> > PR rtl-optimization/93565
> > * config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs.
> >
> > * gcc.target/aarch64/pr93565.c: New test.
>
> OK, thanks.  Could you remove the bit about combine behaving oddly when
> you commit though?  I think this was simply a target bug and combine
> was being given duff information.
>
> Richard
>
> >
> > --
> >
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 
> > e40750380cce202473da3cf572ebdbc28a4ecc06..7426629d6c973c06640f75d3de53a2815ff40f1b
> >  100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -11459,6 +11459,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int 
> > outer ATTRIBUTE_UNUSED,
> >
> >        return false;
> >
> > +    case CTZ:
> > +      *cost = COSTS_N_INSNS (2);
> > +
> > +      if (speed)
> > +*cost += extra_cost->alu.clz + extra_cost->alu.rev;
> > +      return false;
> > +
> >      case COMPARE:
> >        op0 = XEXP (x, 0);
> >        op1 = XEXP (x, 1);
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c 
> > b/gcc/testsuite/gcc.target/aarch64/pr93565.c
> > new file mode 100644
> > index 
> > 0000000000000000000000000000000000000000..7200f80d1bb161f6a058cc6591f61b6b75cf1749
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +static const unsigned long long magic = 0x03f08c5392f756cdULL;
> > +
> > +static const char table[64] = {
> > +     0,  1, 12,  2, 13, 22, 17,  3,
> > +    14, 33, 23, 36, 18, 58, 28,  4,
> > +    62, 15, 34, 26, 24, 48, 50, 37,
> > +    19, 55, 59, 52, 29, 44, 39,  5,
> > +    63, 11, 21, 16, 32, 35, 57, 27,
> > +    61, 25, 47, 49, 54, 51, 43, 38,
> > +    10, 20, 31, 56, 60, 46, 53, 42,
> > +     9, 30, 45, 41,  8, 40,  7,  6,
> > +};
> > +
> > +static inline int ctz1 (unsigned long  b)
> > +{
> > +  unsigned long lsb = b & -b;
> > +  return table[(lsb * magic) >> 58];
> > +}
> > +
> > +void f (unsigned long x, int *p)
> > +{
> > +  if (x != 0)
> > +    {
> > +      int a = ctz1 (x);
> > +      *p = a | p[a];
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "rbit\t" 1 } } */
> > +/* { dg-final { scan-assembler-times "clz\t" 1 } } */
> > +

Reply via email to