Hi all,

In certain intrinsics use cases GCC leaves SETs of a bottom-element
vec_select lying around:
        (set (reg:DI ...)
            (vec_select:DI (reg:V2DI 34 v2 [orig:128 __o ] [128])
                (parallel [
                        (const_int 0 [0])
                    ])))
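
For context, here is a minimal reproducer sketch (my own illustration, not
taken from the original report) that leaves such a bottom-element extract
around:

#include <arm_neon.h>

/* Extracting lane 0 of a V2DI vector produces a vec_select:DI with a
   (const_int 0) parallel, as in the RTL above.  */
uint64_t
extract_low (uint64x2_t v)
{
  return vgetq_lane_u64 (v, 0);
}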

On aarch64 this can be treated as a simple move when done between SIMD
registers, for all the normal element widths.
These extracts go through the aarch64_get_lane pattern.
This patch adds a splitter there to simplify these extracts into a move that
can, perhaps, be optimised away.
Another benefit is that when the destination is memory we can use a simpler
STR instruction rather than a lane-indexed ST1.
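
To make both benefits concrete, a sketch (my own illustration, not the
attached testcase; the assembly in the comments is indicative only):

#include <arm_neon.h>

/* Bottom-element extract stored to memory: with the splitter this can
   use a plain STR, e.g. "str d0, [x0]", rather than a lane-indexed
   store such as "st1 {v0.d}[0], [x0]".  */
void
store_low (uint64_t *p, uint64x2_t v)
{
  *p = vgetq_lane_u64 (v, 0);
}

/* Bottom-element extract kept in a SIMD register: a plain FMOV (which
   later passes may propagate away entirely) rather than a DUP; this is
   the case the vdup_lane_2.c test now scans for.  */
float64_t
get_low (float64x2_t v)
{
  return vgetq_lane_f64 (v, 0);
}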

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.

Thanks,
Kyrill

gcc/

        * config/aarch64/aarch64-simd.md (aarch64_get_lane<mode>): Convert to
        define_insn_and_split.  Split into simple move when moving bottom
        element.

gcc/testsuite/

        * gcc.target/aarch64/vdup_lane_2.c: Scan for fmov rather than dup.

Attachment: vec-split.patch
