On Tue, May 15, 2018 at 10:20 AM Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:

> Hi all,

> This is a respin of James's patch from:
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> The original patch was approved and committed but was later reverted
> because of failures on big-endian.
> This tweaked version fixes the big-endian failures in
> aarch64_expand_vector_init by picking the right element of VALS to
> move into the low part of the vector register depending on endianness.
> The rest of the patch stays the same. I'm looking for approval on the
> aarch64 parts, as they are the ones that have changed since the last
> approved version of the patch.

> -----------------------------------------------------------------------

> In the testcase in this patch we create an SLP vector with only two
> elements. Our current vector initialisation code will first duplicate
> the first element to both lanes, then overwrite the top lane with a new
> value.

> This duplication can be clunky and wasteful.

> Better would be to simply use the fact that we will always be
> overwriting the remaining bits, and simply move the first element to
> the correct place (implicitly zeroing all other bits).

> This reduces the generated code for this case, and can allow more
> efficient addressing modes and other second-order benefits for AArch64
> code which has been vectorized to V2DI mode.

> Note that the change is generic enough to catch the case for any vector
> mode, but is expected to be most useful for 2x64-bit vectorization.
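[Editor's note: an illustrative sketch, not taken from the patch or from the
new gcc.target/aarch64/vect-slp-dup.c testcase. The function name make_v2di
and the use of GNU vector extensions are my own; this merely shows the kind
of two-element vector construction the paragraph above describes, where the
old expansion would duplicate A into both lanes and then overwrite the top
lane with B.]

```c
#include <stdint.h>

/* A 128-bit vector of two 64-bit lanes (GNU vector extension),
   corresponding to V2DI mode on AArch64.  */
typedef int64_t v2di __attribute__ ((vector_size (16)));

/* Build a V2DI vector from two scalars.  With the change described
   above, the first element is moved directly into the low lane
   (implicitly zeroing the rest) instead of being duplicated into
   both lanes first.  */
v2di
make_v2di (int64_t a, int64_t b)
{
  return (v2di) { a, b };
}
```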

> Unfortunately, on its own, this would cause failures in
> gcc.target/aarch64/load_v2vec_lanes_1.c and
> gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
> vec_merge and vec_duplicate for their simplifications to apply. To fix
> this, add a special case to the AArch64 code if we are loading from two
> memory addresses, and use the load_pair_lanes patterns directly.
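[Editor's note: another illustrative sketch, not from the patch. The
function name load_two is hypothetical; whether the backend actually emits
a paired load depends on it recognizing the two addresses, but this is the
shape of code the load_pair_lanes special case is aimed at: both vector
elements coming from memory loads rather than from register values.]

```c
#include <stdint.h>

typedef int64_t v2di __attribute__ ((vector_size (16)));

/* Build a V2DI vector where both elements are loaded from memory.
   The AArch64 special case described above lets the backend use the
   load_pair_lanes patterns here instead of a duplicate followed by a
   lane insert.  */
v2di
load_two (const int64_t *p, const int64_t *q)
{
  return (v2di) { *p, *q };
}
```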

> We also need a new pattern in
> simplify-rtx.c:simplify_ternary_operation, to catch:

>     (vec_merge:OUTER
>        (vec_duplicate:OUTER x:INNER)
>        (subreg:OUTER y:INNER 0)
>        (const_int N))

> And simplify it to:

>     (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)

> This is similar to the existing patterns which are tested in this
> function, without requiring the second operand to also be a
> vec_duplicate.
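[Editor's note: a toy C model of the RTL semantics involved, written by me
for illustration; it is not GCC's implementation. Lane 0 is the low part,
as on little-endian, and the undefined high lane of the paradoxical subreg
is modelled as 0. It shows why, when the merge mask selects the duplicated
value for the high lane and the subreg for the low lane, the whole
expression collapses to a vec_concat.]

```c
#include <stdint.h>

/* Two-lane vector of 64-bit values, lane 0 = low part.  */
typedef struct { uint64_t lane[2]; } vec2;

/* (vec_duplicate x): X in both lanes.  */
static vec2 vec_duplicate (uint64_t x)
{ return (vec2) { { x, x } }; }

/* (vec_concat x y): X in lane 0, Y in lane 1.  */
static vec2 vec_concat (uint64_t x, uint64_t y)
{ return (vec2) { { x, y } }; }

/* (subreg:OUTER y 0), paradoxical: Y lands in the low lane, the high
   lane is undefined (modelled as 0 here).  */
static vec2 lowpart_subreg (uint64_t y)
{ return (vec2) { { y, 0 } }; }

/* (vec_merge a b mask): lane I comes from A if bit I of MASK is set,
   otherwise from B.  */
static vec2 vec_merge (vec2 a, vec2 b, unsigned mask)
{
  vec2 r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = (mask & (1u << i)) ? a.lane[i] : b.lane[i];
  return r;
}
```

With mask 2 (high lane from the duplicate, low lane from the subreg),
vec_merge (vec_duplicate (x), lowpart_subreg (y), 2) yields { y, x },
i.e. vec_concat (y, x), matching the "(vec_concat y x)" case above.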

> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> aarch64-none-elf.

> Note that this requires
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> if we don't want to ICE creating broken vector zero extends.

> Are the non-AArch64 parts OK?

Is (vec_merge (subreg ..) (vec_duplicate)) canonicalized to the form
you handle?  I see the (vec_merge (vec_duplicate...) (vec_concat)) case
also doesn't handle the swapped operand case.

Otherwise the middle-end parts look OK.

Thanks,
Richard.

> Thanks,
> James

> ---
> 2018-05-15  James Greenhalgh  <james.greenha...@arm.com>
>               Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

>           * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
>           code generation for cases where splatting a value is not useful.
>           * simplify-rtx.c (simplify_ternary_operation): Simplify
>           vec_merge across a vec_duplicate and a paradoxical subreg
>           forming a vector mode to a vec_concat.

> 2018-05-15  James Greenhalgh  <james.greenha...@arm.com>

>           * gcc.target/aarch64/vect-slp-dup.c: New.
