On Wed, May 16, 2018 at 10:37 AM Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:


> On 15/05/18 10:58, Richard Biener wrote:
> > On Tue, May 15, 2018 at 10:20 AM Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
> >
> >> Hi all,
> >> This is a respin of James's patch from:
> >> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> >> The original patch was approved and committed but was later reverted
> >> because of failures on big-endian.
> >> This tweaked version fixes the big-endian failures in
> >> aarch64_expand_vector_init by picking the right element of VALS to
> >> move into the low part of the vector register depending on endianness.
> >> The rest of the patch stays the same. I'm looking for approval on the
> >> aarch64 parts, as they are the ones that have changed since the last
> >> approved version of the patch.
> >> -----------------------------------------------------------------------
> >> In the testcase in this patch we create an SLP vector with only two
> >> elements. Our current vector initialisation code will first duplicate
> >> the first element to both lanes, then overwrite the top lane with a new
> >> value.
> >> This duplication can be clunky and wasteful.
> >> Better would be to simply use the fact that we will always be
> >> overwriting the remaining bits, and simply move the first element to
> >> the correct place (implicitly zeroing all other bits).
> >> This reduces the code generation for this case, and can allow more
> >> efficient addressing modes, and other second order benefits for AArch64
> >> code which has been vectorized to V2DI mode.
> >> Note that the change is generic enough to catch the case for any vector
> >> mode, but is expected to be most useful for 2x64-bit vectorization.
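To make the two initialisation strategies described above concrete, here is a minimal C model. It is only a sketch: the `v2di` struct and the functions `init_by_dup` and `init_by_move` are hypothetical names invented for illustration and are not GCC code, and the model ignores the endianness-dependent choice of which element goes into the low part of the register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a V2DI register: two 64-bit lanes. */
typedef struct { int64_t lane[2]; } v2di;

/* Strategy described as current: duplicate the first element into
   both lanes, then overwrite the top lane with the second element. */
static v2di init_by_dup(int64_t a, int64_t b)
{
    v2di v = { { a, a } };  /* splat a into every lane */
    v.lane[1] = b;          /* overwrite the top lane */
    return v;
}

/* Strategy described as proposed: move the first element into place,
   implicitly zeroing the other lane, then insert the second element. */
static v2di init_by_move(int64_t a, int64_t b)
{
    v2di v = { { a, 0 } };  /* plain move, remaining bits zeroed */
    v.lane[1] = b;          /* the remaining lane is always overwritten */
    return v;
}

/* Lane-wise comparison helper. */
static int v2di_equal(v2di x, v2di y)
{
    return x.lane[0] == y.lane[0] && x.lane[1] == y.lane[1];
}

/* Both strategies must produce the same vector for any inputs. */
static void self_check(void)
{
    assert(v2di_equal(init_by_dup(3, 7), init_by_move(3, 7)));
}
```

Because the top lane is always overwritten, the value the first step leaves there is irrelevant, which is why the duplicate can be replaced by a cheaper move.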
> >> Unfortunately, on its own, this would cause failures in
> >> gcc.target/aarch64/load_v2vec_lanes_1.c and
> >> gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
> >> vec_merge and vec_duplicate for their simplifications to apply. To fix
> >> this, add a special case to the AArch64 code if we are loading from two
> >> memory addresses, and use the load_pair_lanes patterns directly.
> >> We also need a new pattern in simplify-rtx.c:simplify_ternary_operation,
> >> to catch:
> >>      (vec_merge:OUTER
> >>         (vec_duplicate:OUTER x:INNER)
> >>         (subreg:OUTER y:INNER 0)
> >>         (const_int N))
> >> And simplify it to:
> >>      (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
> >> This is similar to the existing patterns which are tested in this
> >> function,
> >> without requiring the second operand to also be a vec_duplicate.
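The simplification quoted above can be checked on a small C model of the RTL operations for a two-lane vector. This is a sketch under stated assumptions, not the simplify-rtx.c implementation: the functions below are hypothetical stand-ins for the RTL codes of the same names, lane 0 is assumed to be the low element (little-endian lane numbering), and the unspecified upper lane of the paradoxical subreg is modelled by an arbitrary `junk` value. In `vec_merge`, a set mask bit takes the lane from the first operand and a clear bit takes it from the second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane vector standing in for the OUTER mode. */
typedef struct { int64_t lane[2]; } v2di;

/* (vec_duplicate x): x copied into every lane. */
static v2di vec_duplicate(int64_t x)
{
    v2di v = { { x, x } };
    return v;
}

/* (vec_concat lo hi): lo in lane 0, hi in lane 1. */
static v2di vec_concat(int64_t lo, int64_t hi)
{
    v2di v = { { lo, hi } };
    return v;
}

/* (subreg:OUTER y 0): y in lane 0; lane 1 is unspecified, so the
   model lets the caller pass any junk value for it. */
static v2di paradoxical_subreg(int64_t y, int64_t junk)
{
    v2di v = { { y, junk } };
    return v;
}

/* (vec_merge a b mask): bit i set takes lane i from a, else from b. */
static v2di vec_merge(v2di a, v2di b, unsigned mask)
{
    v2di v;
    for (int i = 0; i < 2; i++)
        v.lane[i] = ((mask >> i) & 1u) ? a.lane[i] : b.lane[i];
    return v;
}

/* Lane-wise comparison helper. */
static int v2di_equal(v2di x, v2di y)
{
    return x.lane[0] == y.lane[0] && x.lane[1] == y.lane[1];
}

/* With mask 2, lane 1 comes from the duplicate of x and lane 0 from
   y, so the junk lane never reaches the result. */
static void self_check(void)
{
    v2di merged = vec_merge(vec_duplicate(5), paradoxical_subreg(9, -1), 2);
    assert(v2di_equal(merged, vec_concat(9, 5)));
}
```

The model only exercises the mask value whose result is independent of the unspecified lane, giving the `(vec_concat y x)` ordering.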
> >> Bootstrapped and tested on aarch64-none-linux-gnu and tested on
> >> aarch64-none-elf.
> >> Note that this requires
> >> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
> >> if we don't want to ICE creating broken vector zero extends.
> >> Are the non-AArch64 parts OK?
> > Is (vec_merge (subreg ..) (vec_duplicate)) canonicalized to the form
> > you handle?  I see the (vec_merge (vec_duplicate...) (vec_concat)) case
> > also doesn't handle the swapped operand case.
> >
> > Otherwise the middle-end parts look ok.

> I don't see any explicit canonicalisation code for it.
> I've updated the simplify-rtx part to handle the swapped operand case.
> Is the attached patch better in this regard? I couldn't think of a clean
> way to avoid duplicating some logic (beyond creating a new function away
> from the callsite).

Works for me.  Were you able to actually create such RTL from testcases?
Segher, do you know where canonicalization rules are documented?
IIRC we do not actively try to canonicalize in most cases.

Richard.

> Thanks,
> Kyrill

> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> James
> >> ---
> >> 2018-05-15  James Greenhalgh  <james.greenha...@arm.com>
> >>                Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> >>            * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
> >>            code generation for cases where splatting a value is not useful.
> >>            * simplify-rtx.c (simplify_ternary_operation): Simplify
> >>            vec_merge across a vec_duplicate and a paradoxical subreg
> >>            forming a vector mode to a vec_concat.
> >> 2018-05-15  James Greenhalgh  <james.greenha...@arm.com>
> >>            * gcc.target/aarch64/vect-slp-dup.c: New.
