Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

Christophe Lyon Wed, 15 Nov 2017 08:29:15 -0800

On 15 November 2017 at 16:58, Kyrill  Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:
> Hi Christophe,
>
>
> On 15/11/17 15:31, Christophe Lyon wrote:
>>
>> Hi Kyrill,
>>
>>
>> On 8 November 2017 at 19:34, Kyrill  Tkachov
>> <kyrylo.tkac...@foss.arm.com> wrote:
>>>
>>> On 06/06/17 14:17, James Greenhalgh wrote:
>>>>
>>>> On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> On top of the previous vec_merge simplifications [1] we can add this
>>>>> pattern to perform
>>>>> a store of a vec_concat of two 64-bit values in distinct registers as
>>>>> an
>>>>> STP.
>>>>> This avoids constructing such a vector explicitly in a register and
>>>>> storing it as
>>>>> a Q register.
>>>>> This way for the code in the testcase we can generate:
>>>>>
>>>>> construct_lane_1:
>>>>>           ldp     d1, d0, [x0]
>>>>>           fmov    d3, 1.0e+0
>>>>>           fmov    d2, 2.0e+0
>>>>>           fadd    d4, d1, d3
>>>>>           fadd    d5, d0, d2
>>>>>           stp     d4, d5, [x1, 32]
>>>>>           ret
>>>>>
>>>>> construct_lane_2:
>>>>>           ldp     x2, x0, [x0]
>>>>>           add     x3, x2, 1
>>>>>           add     x4, x0, 2
>>>>>           stp     x3, x4, [x1, 32]
>>>>>           ret
>>>>>
>>>>> instead of the current:
>>>>> construct_lane_1:
>>>>>           ldp     d0, d1, [x0]
>>>>>           fmov    d3, 1.0e+0
>>>>>           fmov    d2, 2.0e+0
>>>>>           fadd    d0, d0, d3
>>>>>           fadd    d1, d1, d2
>>>>>           dup     v0.2d, v0.d[0]
>>>>>           ins     v0.d[1], v1.d[0]
>>>>>           str     q0, [x1, 32]
>>>>>           ret
>>>>>
>>>>> construct_lane_2:
>>>>>           ldp     x2, x3, [x0]
>>>>>           add     x0, x2, 1
>>>>>           add     x2, x3, 2
>>>>>           dup     v0.2d, x0
>>>>>           ins     v0.d[1], x2
>>>>>           str     q0, [x1, 32]
>>>>>           ret
>>>>>
>>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>> Ok for GCC 8?
>>>>
>>>> OK.
>>>
>>>
>>> Thanks, I've committed this and the other patches in this series after
>>> rebasing and rebootstrapping and testing on aarch64-none-linux-gnu.
>>> The only conflict from updating the patch was that I had to use the
>>> store_16
>>> attribute rather than
>>> the old store2 for the new define_insn. This is what I've committed with
>>> r254551.
>>>
>>> Sorry for the delay in committing.
>>>
>> I've noticed that the new tests fail when testing with -mabi=ilp32:
>> FAIL:    gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-not ins\t
>> FAIL:    gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
>> stp\td[0-9]+, d[0-9]+ 1 (found 0 times)
>> FAIL:    gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
>> stp\tx[0-9]+, x[0-9]+ 1 (found 0 times)
>>
>> Sorry if this has been reported before.
>
>
> Thank you for reporting this, I was not aware of it.
> My patch does indeed fail to generate the optimised sequence for
> -mabi=ilp32.
> During combine it fails to match:
> Failed to match this instruction:
> (set (mem:V2DF (plus:DI (reg/v/f:DI 79 [ z ])
>             (const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16 A128])
>     (vec_concat:V2DF (reg:DF 81 [ y0 ])
>         (reg:DF 84 [ y1 ])))
>
>
> but without the -mabi=ilp32 it does successfully match the equivalent
>
> (set (mem:V2DF (plus:DI (reg:DI 1 x1 [ z ])
>             (const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16 A128])
>     (vec_concat:V2DF (reg:DF 81 [ y0 ])
>         (reg:DF 84 [ y1 ])))
>
> The only difference is the index register being the hard reg x1.
> There's probably some subtlety in aarch64_classify_address that I'll need to
> dig into.
> In any case, can you please open a bug report for this so we can track it?


Sure, that's: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009


> To be clear, the failure is just suboptimal codegen for the -mabi=ilp32
> case, not a wrong-code or ICE
> (though it should still be fixed, of course).
>
> Thanks again,
> Kyrill
>
>
>> Christophe
>>
>>> Kyrill
>>>
>>>
>>>> Thanks,
>>>> James
>>>>
>>>>> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>>>
>>>>>       * config/aarch64/aarch64-simd.md (store_pair_lanes<mode>):
>>>>>       New pattern.
>>>>>       * config/aarch64/constraints.md (Uml): New constraint.
>>>>>       * config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand):
>>>>> New
>>>>>       predicate.
>>>>>
>>>>> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>>>
>>>>>       * gcc.target/aarch64/store_v2vec_lanes.c: New test.
>>>>
>>>>
>

Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

Reply via email to