On 06/06/2017 02:35 AM, Kyrill Tkachov wrote:
> Hi all,
>
> Another vec_merge simplification that's missing is transforming:
> (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
> into
> (vec_concat x z) if N == 1 (0b01) or
> (vec_concat y x) if N == 2 (0b10)
>
> For the testcase in this patch on aarch64 this allows us to try matching
> during combine the pattern:
> (set (reg:V2DI 78 [ x ])
>     (vec_concat:V2DI
>         (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
>         (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>                 (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])))
>
> rather than the more complex:
> (set (reg:V2DI 78 [ x ])
>     (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>                     (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))
>         (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]))
>         (const_int 2 [0x2])))
>
> We don't actually have an aarch64 pattern for the simplified version
> above, but it's a simple enough form to add, so this patch adds such a
> pattern that performs a concatenated load of two 64-bit vectors in
> adjacent memory locations as a single Q-register LDR. The new aarch64
> pattern is needed to demonstrate the effectiveness of the simplify-rtx
> change, so I've kept them together as one patch.
>
> Now for the testcase in the patch we can generate:
> construct_lanedi:
>         ldr     q0, [x0]
>         ret
>
> construct_lanedf:
>         ldr     q0, [x0]
>         ret
>
> instead of:
> construct_lanedi:
>         ld1r    {v0.2d}, [x0]
>         ldr     x0, [x0, 8]
>         ins     v0.d[1], x0
>         ret
>
> construct_lanedf:
>         ld1r    {v0.2d}, [x0]
>         ldr     d1, [x0, 8]
>         ins     v0.d[1], v1.d[0]
>         ret
>
> The new memory constraint Utq is needed because we need to allow only
> the Q-register addressing modes but the MEM expressions in the RTL
> pattern have 64-bit vector modes, and if we don't constrain them they
> will allow the D-register addressing modes during register
> allocation/address mode selection, which will produce invalid assembly.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
>     Simplify vec_merge of vec_duplicate and vec_concat.
>     * config/aarch64/constraints.md (Utq): New constraint.
>     * config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): New
>     define_insn.
>
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * gcc.target/aarch64/load_v2vec_lanes_1.c: New test.

OK for the simplify-rtx bits.
jeff
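
For anyone reading the thread later, here is a small standalone C model of the identity behind the simplify-rtx change. It is only a sketch and not the patch's code. It assumes the usual vec_merge reading (lane i is taken from the first operand when bit i of the mask is set, otherwise from the second) and restricts itself to two-lane vectors. The `vec2` type and the helpers `vec_equal`, `check_simplification` and `self_check` are made up for illustration; `vec_duplicate`, `vec_concat` and `vec_merge` merely borrow the RTL names.

```c
#include <assert.h>

/* Illustrative two-lane vector; a stand-in for a V2DI value, not a GCC type.  */
typedef struct { long long lane[2]; } vec2;

/* Model of (vec_duplicate x): every lane holds x.  */
vec2 vec_duplicate (long long x)
{
  vec2 v = { { x, x } };
  return v;
}

/* Model of (vec_concat lo hi): lane 0 is lo, lane 1 is hi.  */
vec2 vec_concat (long long lo, long long hi)
{
  vec2 v = { { lo, hi } };
  return v;
}

/* Model of (vec_merge a b mask): lane i comes from a when bit i of
   mask is set, otherwise from b.  */
vec2 vec_merge (vec2 a, vec2 b, unsigned mask)
{
  vec2 r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = ((mask >> i) & 1) ? a.lane[i] : b.lane[i];
  return r;
}

/* Hypothetical helper: lane-wise equality.  */
int vec_equal (vec2 a, vec2 b)
{
  return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}

/* Returns 1 when both rewrites from the patch description hold for
   the given scalars:
     N == 1: merge (dup x, concat (y, z)) == concat (x, z)
     N == 2: merge (dup x, concat (y, z)) == concat (y, x)  */
int check_simplification (long long x, long long y, long long z)
{
  vec2 dup = vec_duplicate (x);
  vec2 cat = vec_concat (y, z);
  return vec_equal (vec_merge (dup, cat, 1), vec_concat (x, z))
         && vec_equal (vec_merge (dup, cat, 2), vec_concat (y, x));
}

/* Hypothetical self-check that can be called from a test driver.  */
void self_check (void)
{
  assert (check_simplification (1, 2, 3));
  assert (check_simplification (-5, 0, 42));
}
```

Calling `self_check ()` from a small driver exercises both mask values. The same lane selection explains the combine dump in the quoted message: with mask 2, lane 0 comes from the second operand (the load from `y`) and lane 1 from the first (the load from `y + 8`), which is the vec_concat of two adjacent loads.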