On 27/09/18 18:07, Wilco Dijkstra wrote:
> The popcount expansion uses SIMD instructions acting on 64-bit values.
> As a result a popcount of a 32-bit integer requires zero-extension before 
> moving the zero-extended value into an FP register.  This patch adds
> support for zero-extended int->FP moves to avoid the redundant uxtw.
> Similarly, add support for a 32-bit zero-extending load->FP register.
> Add a missing 'fp' arch attribute to the related 8/16-bit pattern.
> 
> int f (int a)
> {
>   return __builtin_popcount (a);
> }
> 
> Before:
>       uxtw    x0, w0
>       fmov    d0, x0
>       cnt     v0.8b, v0.8b
>       addv    b0, v0.8b
>       fmov    w0, s0
>       ret
> 
> After:
>       fmov    s0, w0
>       cnt     v0.8b, v0.8b
>       addv    b0, v0.8b
>       fmov    w0, s0
>       ret
> 
> Passes regress on AArch64, OK for commit?
> 
> ChangeLog:
> 2018-09-27  Wilco Dijkstra  <wdijk...@arm.com>
> 
> gcc/
>       * config/aarch64/aarch64.md (zero_extendsidi2_aarch64): Add w=r and
>       w=m zeroextend alternatives.

This is a bit too terse for my liking.  "Add alternatives to zero-extend
into a floating-point register." would be clearer.

>       (zero_extend<SHORT:mode><GPI:mode>2_aarch64): Add arch attribute.
> 
> testsuite/
>       * gcc.target/aarch64/popcnt.c: Test zero-extended popcount.
> 
> ---
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> ef2368706e88a551b9d0d2db2385860112bdbdde..c0b39ea876f76b54b31088d8d63d96a9cbcf8b88
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1588,13 +1588,16 @@
>  )
>  
>  (define_insn "*zero_extendsidi2_aarch64"
> -  [(set (match_operand:DI 0 "register_operand" "=r,r")
> -        (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "r,m")))]
> +  [(set (match_operand:DI 0 "register_operand" "=r,r,w,w")
> +        (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" 
> "r,m,r,m")))]
>    ""
>    "@
>     uxtw\t%0, %w1
> -   ldr\t%w0, %1"
> -  [(set_attr "type" "extend,load_4")]
> +   ldr\t%w0, %1
> +   fmov\t%s0, %w1
> +   ldr\t%s0, %1"
> +  [(set_attr "type" "extend,load_4,fmov,f_loads")
> +   (set_attr "arch" "*,*,fp,fp")]
>  )
>  
>  (define_insn "*load_pair_zero_extendsidi2_aarch64"
> @@ -1634,7 +1637,8 @@
>     and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
>     ldr<SHORT:size>\t%w0, %1
>     ldr\t%<SHORT:size>0, %1"
> -  [(set_attr "type" "logic_imm,load_4,load_4")]
> +  [(set_attr "type" "logic_imm,load_4,f_loads")

This change looks correct, but it's not part of your ChangeLog entry.

> +   (set_attr "arch" "*,*,fp")]
>  )
>  
>  (define_expand "<optab>qihi2"
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c 
> b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> index 
> 7e957966d8e81b8633a444bb42944d0da82ae5db..8fb8db82e052bd898aa8513710fd94f892d80b61
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/popcnt.c
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> @@ -19,5 +19,14 @@ foo2 (long long x)
>    return __builtin_popcountll (x);
>  }
>  
> -/* { dg-final { scan-assembler-not "popcount" } } */
> -/* { dg-final { scan-assembler-times "cnt\t" 3 } } */
> +int
> +foo3 (int *p)
> +{
> +  return __builtin_popcount (*p);
> +}
> +
> +/* { dg-final { scan-assembler-not {popcount} } } */
> +/* { dg-final { scan-assembler-times {cnt\t} 4 } } */
> +/* { dg-final { scan-assembler-times {fmov\ts} 1 } } */
> +/* { dg-final { scan-assembler-times {fmov\td} 2 } } */
> +/* { dg-final { scan-assembler-times {ldr\ts} 1 } } */
> 

OK with those changes.

R.

Reply via email to