On Fri, May 10, 2019 at 10:32:22AM +0100, Kyrill Tkachov wrote:
> Hi all,
>
> This patch fixes the failing gcc.dg/vect/slp-reduc-sad-2.c testcase on
> aarch64 by implementing a vec_init optab that can handle two half-width
> vectors producing a full-width one by concatenating them.
>
> In the gcc.dg/vect/slp-reduc-sad-2.c case it's a V8QI reg concatenated
> with a V8QI const_vector of zeroes.  This can be implemented efficiently
> using the aarch64_combinez pattern, which just loads a D-register to
> make use of the implicit zero-extending semantics of that load.
> Otherwise it concatenates the two vectors using aarch64_simd_combine.
>
> With this patch I'm seeing the effect from richi's original patch that
> added gcc.dg/vect/slp-reduc-sad-2.c on aarch64, and 525.x264_r improves
> by about 1.5%.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.  Also tested on
> aarch64_be-none-elf.
>
> Ok for trunk?
I have a question on the patch.  Otherwise, this is OK for trunk.

> 2019-10-05  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     PR tree-optimization/90332
>     * config/aarch64/aarch64.c (aarch64_expand_vector_init):
>     Handle VALS containing two vectors.
>     * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Rename
>     to...
>     (@aarch64_combinez<mode>): ... This.
>     (*aarch64_combinez_be<mode>): Rename to...
>     (@aarch64_combinez_be<mode>): ... This.
>     (vec_init<mode><Vhalf>): New define_expand.
>     * config/aarch64/iterators.md (Vhalf): Handle V8HF.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0c2c17ed8269923723d066b250974ee1ff423d26..52c933cfdac20c5c566c13ae2528f039efda4c46 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -15075,6 +15075,43 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>    rtx v0 = XVECEXP (vals, 0, 0);
>    bool all_same = true;
>
> +  /* This is a special vec_init<M><N> where N is not an element mode but a
> +     vector mode with half the elements of M.  We expect to find two entries
> +     of mode N in VALS and we must put their concatenation into TARGET.  */
> +  if (XVECLEN (vals, 0) == 2
> +      && VECTOR_MODE_P (GET_MODE (XVECEXP (vals, 0, 0))))

Should you validate that the two vector modes are actually half-size
vectors here, and not something unexpected?

Thanks,
James

> +    {
> +      rtx lo = XVECEXP (vals, 0, 0);
> +      rtx hi = XVECEXP (vals, 0, 1);
> +      machine_mode narrow_mode = GET_MODE (lo);
> +      gcc_assert (GET_MODE_INNER (narrow_mode) == inner_mode);
> +      gcc_assert (narrow_mode == GET_MODE (hi));
> +
> +      /* When we want to concatenate a half-width vector with zeroes we can
> +         use the aarch64_combinez[_be] patterns.  Just make sure that the
> +         zeroes are in the right half.  */
> +      if (BYTES_BIG_ENDIAN
> +          && aarch64_simd_imm_zero (lo, narrow_mode)
> +          && general_operand (hi, narrow_mode))
> +        emit_insn (gen_aarch64_combinez_be (narrow_mode, target, hi, lo));
> +      else if (!BYTES_BIG_ENDIAN
> +               && aarch64_simd_imm_zero (hi, narrow_mode)
> +               && general_operand (lo, narrow_mode))
> +        emit_insn (gen_aarch64_combinez (narrow_mode, target, lo, hi));
> +      else
> +        {
> +          /* Else create the two half-width registers and combine them.  */
> +          if (!REG_P (lo))
> +            lo = force_reg (GET_MODE (lo), lo);
> +          if (!REG_P (hi))
> +            hi = force_reg (GET_MODE (hi), hi);
> +
> +          if (BYTES_BIG_ENDIAN)
> +            std::swap (lo, hi);
> +          emit_insn (gen_aarch64_simd_combine (narrow_mode, target, lo, hi));
> +        }
> +      return;
> +    }
> +
>    /* Count the number of variable elements to initialise.  */
>    for (int i = 0; i < n_elts; ++i)
>      {