On 15 November 2017 at 16:58, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > Hi Christophe, > > > On 15/11/17 15:31, Christophe Lyon wrote: >> >> Hi Kyrill, >> >> >> On 8 November 2017 at 19:34, Kyrill Tkachov >> <kyrylo.tkac...@foss.arm.com> wrote: >>> >>> On 06/06/17 14:17, James Greenhalgh wrote: >>>> >>>> On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote: >>>>> >>>>> Hi all, >>>>> >>>>> On top of the previous vec_merge simplifications [1] we can add this >>>>> pattern to perform >>>>> a store of a vec_concat of two 64-bit values in distinct registers as >>>>> an >>>>> STP. >>>>> This avoids constructing such a vector explicitly in a register and >>>>> storing it as >>>>> a Q register. >>>>> This way for the code in the testcase we can generate: >>>>> >>>>> construct_lane_1: >>>>> ldp d1, d0, [x0] >>>>> fmov d3, 1.0e+0 >>>>> fmov d2, 2.0e+0 >>>>> fadd d4, d1, d3 >>>>> fadd d5, d0, d2 >>>>> stp d4, d5, [x1, 32] >>>>> ret >>>>> >>>>> construct_lane_2: >>>>> ldp x2, x0, [x0] >>>>> add x3, x2, 1 >>>>> add x4, x0, 2 >>>>> stp x3, x4, [x1, 32] >>>>> ret >>>>> >>>>> instead of the current: >>>>> construct_lane_1: >>>>> ldp d0, d1, [x0] >>>>> fmov d3, 1.0e+0 >>>>> fmov d2, 2.0e+0 >>>>> fadd d0, d0, d3 >>>>> fadd d1, d1, d2 >>>>> dup v0.2d, v0.d[0] >>>>> ins v0.d[1], v1.d[0] >>>>> str q0, [x1, 32] >>>>> ret >>>>> >>>>> construct_lane_2: >>>>> ldp x2, x3, [x0] >>>>> add x0, x2, 1 >>>>> add x2, x3, 2 >>>>> dup v0.2d, x0 >>>>> ins v0.d[1], x2 >>>>> str q0, [x1, 32] >>>>> ret >>>>> >>>>> Bootstrapped and tested on aarch64-none-linux-gnu. >>>>> Ok for GCC 8? >>>> >>>> OK. >>> >>> >>> Thanks, I've committed this and the other patches in this series after >>> rebasing and rebootstrapping and testing on aarch64-none-linux-gnu. >>> The only conflict from updating the patch was that I had to use the >>> store_16 >>> attribute rather than >>> the old store2 for the new define_insn. This is what I've committed with >>> r254551. >>> >>> Sorry for the delay in committing. >>> >> I've noticed that the new tests fail when testing with -mabi=ilp32: >> FAIL: gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-not ins\t >> FAIL: gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times >> stp\td[0-9]+, d[0-9]+ 1 (found 0 times) >> FAIL: gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times >> stp\tx[0-9]+, x[0-9]+ 1 (found 0 times) >> >> Sorry if this has been reported before. > > > Thank you for reporting this, I was not aware of it. > My patch does indeed fail to generate the optimised sequence for > -mabi=ilp32. > During combine it fails to match: > Failed to match this instruction: > (set (mem:V2DF (plus:DI (reg/v/f:DI 79 [ z ]) > (const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16 A128]) > (vec_concat:V2DF (reg:DF 81 [ y0 ]) > (reg:DF 84 [ y1 ]))) > > > but without the -mabi=ilp32 it does successfully match the equivalent > > (set (mem:V2DF (plus:DI (reg:DI 1 x1 [ z ]) > (const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16 A128]) > (vec_concat:V2DF (reg:DF 81 [ y0 ]) > (reg:DF 84 [ y1 ]))) > > The only difference is the index register being the hard reg x1. > There's probably some subtlety in aarch64_classify_address that I'll need to > dig into. > In any case, can you please open a bug report for this so we can track it?
Sure, that's: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009 > To be clear, the failure is just suboptimal codegen for the -mabi=ilp32 > case, not a wrong-code or ICE > (though it should still be fixed, of course). > > Thanks again, > Kyrill > > >> Christophe >> >>> Kyrill >>> >>> >>>> Thanks, >>>> James >>>> >>>>> 2017-06-06 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>>>> >>>>> * config/aarch64/aarch64-simd.md (store_pair_lanes<mode>): >>>>> New pattern. >>>>> * config/aarch64/constraints.md (Uml): New constraint. >>>>> * config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): >>>>> New >>>>> predicate. >>>>> >>>>> 2017-06-06 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>>>> >>>>> * gcc.target/aarch64/store_v2vec_lanes.c: New test. >>>> >>>> >