Przemyslaw Wirkus <przemyslaw.wir...@arm.com> writes:
> Hi all,
>
> Vectorise __builtin_signbit (v2sf, v4sf) using the vector unsigned shift
> right instruction (USHR).
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Assembly output for:
> $ aarch64-elf-gcc -S -O3 signbitv2sf.c -dp
>
> Before patch:
>
> foo:
>       ldp     w2, w1, [x1]    // 37   [c=0 l=4]  *load_pair_zero_extendsidi2_aarch64/0
>       and     w2, w2, -2147483648     // 8    [c=4 l=4]  andsi3/1
>       and     w1, w1, -2147483648     // 12   [c=4 l=4]  andsi3/1
>       stp     w2, w1, [x0]    // 38   [c=0 l=4]  store_pair_sw_sisi/0
>       ret             // 32   [c=0 l=4]  *do_return
>
> After patch:
>
> foo:
>       ldr     d0, [x1]        // 7    [c=8 l=4]  *aarch64_simd_movv2sf/0
>       ushr    v0.2s, v0.2s, 31        // 8    [c=12 l=4]  aarch64_simd_lshrv2si
>       str     d0, [x0]        // 9    [c=4 l=4]  *aarch64_simd_movv2si/2
>       ret             // 28   [c=0 l=4]  *do_return
>
> Assembly output for:
> $ aarch64-elf-gcc -S -O3 signbitv4sf.c -dp
>
> Before patch:
>
> foo:
>       adrp    x3, in  // 38   [c=4 l=4]  *movdi_aarch64/12
>       adrp    x2, out // 41   [c=4 l=4]  *movdi_aarch64/12
>       add     x3, x3, :lo12:in        // 40   [c=4 l=4]  add_losym_di
>       add     x2, x2, :lo12:out       // 43   [c=4 l=4]  add_losym_di
>       mov     x0, 0   // 3    [c=4 l=4]  *movdi_aarch64/3
>       .p2align 3,,7
> .L2:
>       ldr     w1, [x3, x0]    // 10   [c=16 l=4]  *zero_extendsidi2_aarch64/1
>       and     w1, w1, -2147483648     // 11   [c=4 l=4]  andsi3/1
>       str     w1, [x2, x0]    // 16   [c=4 l=4]  *movsi_aarch64/8
>       add     x0, x0, 4       // 17   [c=4 l=4]  *adddi3_aarch64/0
>       cmp     x0, 4096        // 19   [c=4 l=4]  cmpdi/1
>       bne     .L2             // 20   [c=4 l=4]  condjump
>       ret             // 51   [c=0 l=4]  *do_return
>
> After patch:
>
> foo:
>       adrp    x2, in  // 37   [c=4 l=4]  *movdi_aarch64/12
>       adrp    x1, out // 40   [c=4 l=4]  *movdi_aarch64/12
>       add     x2, x2, :lo12:in        // 39   [c=4 l=4]  add_losym_di
>       add     x1, x1, :lo12:out       // 42   [c=4 l=4]  add_losym_di
>       mov     x0, 0   // 3    [c=4 l=4]  *movdi_aarch64/3
>       .p2align 3,,7
> .L2:
>       ldr     q0, [x2, x0]    // 10   [c=8 l=4]  *aarch64_simd_movv4sf/0
>       ushr    v0.4s, v0.4s, 31        // 11   [c=12 l=4]  aarch64_simd_lshrv4si
>       str     q0, [x1, x0]    // 15   [c=4 l=4]  *aarch64_simd_movv4si/2
>       add     x0, x0, 16      // 16   [c=4 l=4]  *adddi3_aarch64/0
>       cmp     x0, 4096        // 18   [c=4 l=4]  cmpdi/1
>       bne     .L2             // 19   [c=4 l=4]  condjump
>       ret             // 50   [c=0 l=4]  *do_return
>
> OK for trunk?
>
> Thanks,
> Przemyslaw
>
> gcc/ChangeLog:
>
> 2019-05-13  Przemyslaw Wirkus  <przemyslaw.wir...@arm.com>
>
>       * internal-fn.def (SIGNBIT): New.
>       * config/aarch64/aarch64-simd.md (signbitv2sf2): New expand
>       defined.
>       (signbitv4sf2): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2019-05-13  Przemyslaw Wirkus  <przemyslaw.wir...@arm.com>
>
>       * gcc.target/aarch64/signbitv4sf.c: New test.
>       * gcc.target/aarch64/signbitv2sf.c: New test.

Thanks, applied as r271149.

Richard
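
The test sources (signbitv2sf.c, signbitv4sf.c) are not quoted in this
thread, so the following is only a rough sketch of the kind of loop the
v4sf output above could come from, not the actual test. The names `N`,
`signbit_by_shift` and the body of `foo` are assumptions for illustration;
`in` and `out` are taken from the symbols in the assembly, and 1024
elements is inferred from the `cmp x0, 4096` loop bound. The helper spells
out what one USHR lane computes: a logical shift right by 31 of the raw
32-bit float, which leaves just the sign bit as 0 or 1.

```c
#include <stdint.h>
#include <string.h>

#define N 1024  /* assumed: 1024 floats = 4096 bytes, as in "cmp x0, 4096" */

float in[N];
int out[N];

/* Scalar equivalent of one USHR #31 lane: reinterpret the float's bits
   and shift the sign bit down to bit 0.  Assumes a 32-bit IEEE-754 float. */
static inline int signbit_by_shift(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined type pun */
    return (int)(bits >> 31);        /* 1 for negative (incl. -0.0), else 0 */
}

/* Hypothetical stand-in for the test loop; the real test presumably
   calls __builtin_signbit on each element instead. */
void foo(void)
{
    for (int i = 0; i < N; i++)
        out[i] = signbit_by_shift(in[i]);
}
```

Note that the shift yields exactly 0 or 1, whereas the before-patch scalar
code masks with -2147483648 and so leaves the sign in bit 31; both are
nonzero exactly when the sign bit is set, which is all that
__builtin_signbit promises.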
