On Tue, May 15, 2018 at 10:20 AM Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
> Hi all,
>
> This is a respin of James's patch from:
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> The original patch was approved and committed but was later reverted
> because of failures on big-endian.
> This tweaked version fixes the big-endian failures in
> aarch64_expand_vector_init by picking the right element of VALS to move
> into the low part of the vector register depending on endianness.
> The rest of the patch stays the same. I'm looking for approval on the
> aarch64 parts, as they are the ones that have changed since the last
> approved version of the patch.
>
> -----------------------------------------------------------------------
>
> In the testcase in this patch we create an SLP vector with only two
> elements. Our current vector initialisation code will first duplicate
> the first element to both lanes, then overwrite the top lane with a new
> value.
>
> This duplication can be clunky and wasteful.
>
> Better would be to simply use the fact that we will always be
> overwriting the remaining bits, and simply move the first element to the
> correct place (implicitly zeroing all other bits).
>
> This reduces the code generation for this case, and can allow more
> efficient addressing modes, and other second order benefits for AArch64
> code which has been vectorized to V2DI mode.
>
> Note that the change is generic enough to catch the case for any vector
> mode, but is expected to be most useful for 2x64-bit vectorization.
>
> Unfortunately, on its own, this would cause failures in
> gcc.target/aarch64/load_v2vec_lanes_1.c and
> gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
> vec_merge and vec_duplicate for their simplifications to apply. To fix
> this, add a special case to the AArch64 code if we are loading from two
> memory addresses, and use the load_pair_lanes patterns directly.
>
> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation,
> to catch:
>
>   (vec_merge:OUTER
>      (vec_duplicate:OUTER x:INNER)
>      (subreg:OUTER y:INNER 0)
>      (const_int N))
>
> And simplify it to:
>
>   (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
>
> This is similar to the existing patterns which are tested in this
> function, without requiring the second operand to also be a
> vec_duplicate.
>
> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> aarch64-none-elf.
>
> Note that this requires
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> if we don't want to ICE creating broken vector zero extends.
>
> Are the non-AArch64 parts OK?

Is (vec_merge (subreg ..) (vec_duplicate)) canonicalized to the form
you handle? I see the (vec_merge (vec_duplicate...) (vec_concat)) case
also doesn't handle the swapped operand case.

Otherwise the middle-end parts look OK.

Thanks,
Richard.

> Thanks,
> James
>
> ---
> 2018-05-15  James Greenhalgh  <james.greenha...@arm.com>
>             Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
>     code generation for cases where splatting a value is not useful.
>     * simplify-rtx.c (simplify_ternary_operation): Simplify
>     vec_merge across a vec_duplicate and a paradoxical subreg forming
>     a vector mode to a vec_concat.
>
> 2018-05-15  James Greenhalgh  <james.greenha...@arm.com>
>
>     * gcc.target/aarch64/vect-slp-dup.c: New.
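
For readers following the two-element initialisation change described in the quoted patch, the two strategies can be modelled in plain C. This is only an illustrative sketch, not the code in aarch64_expand_vector_init: the type v2di_model and all function names are hypothetical, and lane 0 is simply assumed to be the low lane, so the big-endian element selection that the respin fixes is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a V2DI register: two 64-bit lanes. */
typedef struct { int64_t lane[2]; } v2di_model;

/* Old strategy: duplicate the first element into both lanes,
   then overwrite the top lane with the second element. */
static v2di_model init_dup_then_insert(int64_t a, int64_t b)
{
    v2di_model v = { { a, a } };   /* splat a to both lanes */
    v.lane[1] = b;                 /* overwrite the top lane */
    return v;
}

/* New strategy: move the first element into the low lane, with the
   other lane implicitly zero, then overwrite the top lane. */
static v2di_model init_move_then_insert(int64_t a, int64_t b)
{
    v2di_model v = { { a, 0 } };   /* plain move, no duplication */
    v.lane[1] = b;                 /* overwrite the top lane */
    return v;
}

/* Both strategies must build the same vector, because the top lane is
   always overwritten; only the instructions needed would differ. */
static void check_strategies(void)
{
    v2di_model old_v = init_dup_then_insert(3, 4);
    v2di_model new_v = init_move_then_insert(3, 4);
    assert(old_v.lane[0] == new_v.lane[0]);
    assert(old_v.lane[1] == new_v.lane[1]);
}
```

The model only shows why the duplicate is redundant: whatever is written to the top lane in the first step is discarded by the second.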