Hi all,

This patch reimplements the vshrn_n* intrinsics to use RTL builtins.
These perform a narrowing right shift.
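
For reference, a rough scalar model of what these intrinsics compute
(illustrative only; the helper name and buffers below are mine, not part of
the patch):

#include <arm_neon.h>

/* Each 16-bit lane of A is shifted right by N and truncated to 8 bits.  */
uint8x8_t
vshrn_n_u16_ref (uint16x8_t a, int n)
{
  uint16_t in[8];
  uint8_t out[8];
  vst1q_u16 (in, a);
  for (int i = 0; i < 8; i++)
    out[i] = (uint8_t) (in[i] >> n);
  return vld1_u8 (out);
}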

Although the intrinsic produces a result in the half-width mode (e.g.
V8HI -> V8QI), the new pattern generates a full 128-bit mode (V8HI -> V16QI)
by representing the fill-with-zeroes semantics of the SHRN instruction. The
narrower (V8QI) result is then extracted with a lowpart subreg. I found that
this allows the RTL optimisers to do a better job of optimising away
redundant moves in frequently-occurring SHRN+SHRN2 pairs, like in:
uint8x16_t
foo (uint16x8_t in1, uint16x8_t in2)
{
  uint8x8_t tmp = vshrn_n_u16 (in2, 7);
  uint8x16_t tmp2 = vshrn_high_n_u16 (tmp, in1, 4);
  return tmp2;
}
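
Conceptually, the new little-endian pattern models the SHRN result as the
narrowed vector concatenated with zeroes, and the intrinsic then takes the
low half. A sketch of the modelled value in intrinsic terms (not the actual
expansion):

#include <arm_neon.h>

/* What the full-width pattern represents: narrowed data in the low 64 bits,
   zeroes in the high 64 bits.  The intrinsic's V8QI result corresponds to
   the lowpart of this V16QI value.  */
uint8x16_t
shrn_full_width_model (uint16x8_t a)
{
  uint8x8_t lo = vshrn_n_u16 (a, 7);        /* truncate (lshiftrt (a, 7)) */
  return vcombine_u8 (lo, vdup_n_u8 (0));   /* vec_concat with zeroes */
}

(The separate aarch64_shrn<mode>_insn_be pattern presumably handles the
reversed concatenation order on big endian.)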

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

        * config/aarch64/aarch64-simd-builtins.def (shrn): Define builtin.
        * config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Define.
        (aarch64_shrn<mode>_insn_be): Likewise.
        (aarch64_shrn<mode>): Likewise.
        * config/aarch64/arm_neon.h (vshrn_n_s16): Reimplement using builtins.
        (vshrn_n_s32): Likewise.
        (vshrn_n_s64): Likewise.
        (vshrn_n_u16): Likewise.
        (vshrn_n_u32): Likewise.
        (vshrn_n_u64): Likewise.
        * config/aarch64/iterators.md (vn_mode): New mode attribute.

Attachment: vshrn.patch