Hi Richard,

> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: 25 September 2020 10:35
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw <richard.earns...@arm.com>;
> Ramana Radhakrishnan <ramana.radhakrish...@arm.com>; Kyrylo
> Tkachov <kyrylo.tkac...@arm.com>
> Subject: [PATCH] arm: Fix fp16 move patterns for base MVE
>
> This patch fixes ICEs in gcc.dg/torture/float16-basic.c for
> -march=armv8.1-m.main+mve -mfloat-abi=hard.  The problem was
> that an fp16 argument was (rightly) being passed in FPRs,
> but the fp16 move patterns only handled GPRs.  LRA then cycled,
> trying to find a way of handling the FPR.
>
> It looks like there are three related problems here:
>
> (1) We're using the wrong fp16 move pattern for base MVE.
>     *mov<mode>_vfp_<mode>16 (the pattern we use for +mve.fp)
>     works for base MVE too.
>
> (2) The fp16 MVE load and store patterns are separate from the
>     main move patterns.  The loads and stores should instead be
>     alternatives of the main move patterns, so that LRA knows
>     what to do with pseudo registers that become stack slots.
>
> (3) The range restrictions for the loads and stores were wrong
>     for fp16: we were enforcing a multiple of 4 in [-255*4, 255*4]
>     instead of a multiple of 2 in [-255*2, 255*2].
>
> (2) came from a patch to prevent writeback from being used for MVE.
> That patch also added a Uj constraint to enforce the correct
> memory types for MVE.  I think the simplest fix is therefore to merge
> the loads and stores back into the main pattern and extend the Uj
> constraint so that it acts like Um for non-MVE.
>
> The testcase for that patch was mve-vldstr16-no-writeback.c, whose
> main function is:
>
> void
> fn1 (__fp16 *pSrc)
> {
>   __fp16 high;
>   __fp16 *pDst = 0;
>   unsigned i;
>   for (i = 0;; i++)
>     if (pSrc[i])
>       pDst[i] = high;
> }
>
> Fixing (2) causes the store part to fail, not because we're using
> writeback, but because we decide to use GPRs to store "high" (which
> is uninitialised, and so gets replaced with zero).  This patch
> therefore adds some scan-assembler-nots instead.  (I wondered about
> changing the testcase to initialise "high", but that seemed like a
> bad idea for a regression test.)
>
> For (3): MVE seems to be the only thing that uses
> arm_coproc_mem_operand_wb (and its various interfaces) for 16-bit
> scalars: the Neon patterns only use it for 32-bit scalars.
>
> I've added new tests to try the various FPR alternatives of the
> move patterns.  The range of offsets that GCC uses for FPR loads
> and stores is the intersection of the range allowed for GPRs and
> FPRs, so the tests include GPR<->memory tests as well.
>
> The fp32 and fp64 tests already pass; they're just there for
> completeness.
>
> Tested on arm-eabi (MVE configuration), armeb-eabi (generic
> configuration) and arm-linux-gnueabihf.  OK to install?

Ok.  Thanks for analysing these and fixing them.
Kyrill
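As a minimal illustration of the failure mode described above (a
hypothetical reduction, not the actual torture test), even a trivial
pass-through function exercises the FPR-capable moves that base MVE
previously lacked:

/* With -march=armv8.1-m.main+mve -mfloat-abi=hard, "x" is passed and
   returned in s0, so every copy or spill of "x" must go through a
   move pattern with FPR (and FPR<->memory) alternatives.  Before the
   fix, base MVE only had GPR alternatives and LRA cycled.  */
_Float16
fp16_identity (_Float16 x)
{
  return x;
}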

> Richard
>
>
> gcc/
> 	* config/arm/arm-protos.h (arm_mve_mode_and_operands_type_check):
> 	Delete.
> 	* config/arm/arm.c (arm_coproc_mem_operand_wb): Use a scale factor
> 	of 2 rather than 4 for 16-bit modes.
> 	(arm_mve_mode_and_operands_type_check): Delete.
> 	* config/arm/constraints.md (Uj): Allow writeback for Neon,
> 	but continue to disallow it for MVE.
> 	* config/arm/arm.md (*arm32_mov<HFBF:mode>): Add !TARGET_HAVE_MVE.
> 	* config/arm/vfp.md (*mov_load_vfp_hf16, *mov_store_vfp_hf16): Fold
> 	back into...
> 	(*mov<mode>_vfp_<mode>16): ...here but use Uj for the FPR memory
> 	constraints.  Use for base MVE too.
>
> gcc/testsuite/
> 	* gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: Allow
> 	the store to use GPRs instead of FPRs.  Add scan-assembler-nots
> 	for writeback.
> 	* gcc.target/arm/armv8_1m-fp16-move-1.c: New test.
> 	* gcc.target/arm/armv8_1m-fp32-move-1.c: Likewise.
> 	* gcc.target/arm/armv8_1m-fp64-move-1.c: Likewise.
> ---
>  gcc/config/arm/arm-protos.h                   |   1 -
>  gcc/config/arm/arm.c                          |  25 +-
>  gcc/config/arm/arm.md                         |   4 +-
>  gcc/config/arm/constraints.md                 |   9 +-
>  gcc/config/arm/vfp.md                         |  32 +-
>  .../gcc.target/arm/armv8_1m-fp16-move-1.c     | 418 +++++++++++++++++
>  .../gcc.target/arm/armv8_1m-fp32-move-1.c     | 420 +++++++++++++++++
>  .../gcc.target/arm/armv8_1m-fp64-move-1.c     | 426 ++++++++++++++++++
>  .../intrinsics/mve-vldstr16-no-writeback.c    |   5 +-
>  9 files changed, 1295 insertions(+), 45 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp16-move-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp32-move-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 0cc0ae78400..9bb9c61967b 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -120,7 +120,6 @@ extern int arm_coproc_mem_operand_no_writeback (rtx);
>  extern int arm_coproc_mem_operand_wb (rtx, int);
>  extern int neon_vector_mem_operand (rtx, int, bool);
>  extern int mve_vector_mem_operand (machine_mode, rtx, bool);
> -bool arm_mve_mode_and_operands_type_check (machine_mode, rtx, rtx);
>  extern int neon_struct_mem_operand (rtx);
>
>  extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 022ef6c3f1d..8105b39e7a4 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -13277,14 +13277,18 @@ arm_coproc_mem_operand_wb (rtx op, int wb_level)
>
>    /* Match:
>       (plus (reg)
> -          (const)).  */
> +          (const))
> +
> +     The encoded immediate for 16-bit modes is multiplied by 2,
> +     while the encoded immediate for 32-bit and 64-bit modes is
> +     multiplied by 4.  */
> +  int factor = MIN (GET_MODE_SIZE (GET_MODE (op)), 4);
>    if (GET_CODE (ind) == PLUS
>        && REG_P (XEXP (ind, 0))
>        && REG_MODE_OK_FOR_BASE_P (XEXP (ind, 0), VOIDmode)
>        && CONST_INT_P (XEXP (ind, 1))
> -      && INTVAL (XEXP (ind, 1)) > -1024
> -      && INTVAL (XEXP (ind, 1)) < 1024
> -      && (INTVAL (XEXP (ind, 1)) & 3) == 0)
> +      && IN_RANGE (INTVAL (XEXP (ind, 1)), -255 * factor, 255 * factor)
> +      && (INTVAL (XEXP (ind, 1)) & (factor - 1)) == 0)
>      return TRUE;
>
>    return FALSE;
> @@ -33578,17 +33582,4 @@ arm_mode_base_reg_class (machine_mode mode)
>
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
> -bool
> -arm_mve_mode_and_operands_type_check (machine_mode mode, rtx op0, rtx op1)
> -{
> -  if (!(TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT))
> -    return true;
> -  else if (mode == E_BFmode)
> -    return false;
> -  else if ((s_register_operand (op0, mode) && MEM_P (op1))
> -	   || (s_register_operand (op1, mode) && MEM_P (op0)))
> -    return false;
> -  return true;
> -}
> -
>  #include "gt-arm.h"
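As a standalone paraphrase of the new offset test (plain C with a
made-up helper name, not part of the GCC sources):

#include <stdbool.h>

/* Mirror of the check above: the offset must be a multiple of the
   access size (capped at 4) within +/-255 encoded units.  */
static bool
valid_vldr_offset (int mode_size, long offset)
{
  int factor = mode_size < 4 ? mode_size : 4;  /* 2 for fp16, else 4 */
  return offset >= -255 * factor
         && offset <= 255 * factor
         && (offset & (factor - 1)) == 0;
}

/* valid_vldr_offset (2, -254) is now true: a multiple of 2 in
   [-510, 510].  The old code required a multiple of 4 in
   (-1024, 1024), so it wrongly rejected -254 for fp16.  */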
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index c4fa116ab77..147c4a50c72 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -7289,7 +7289,9 @@ (define_expand "mov<mode>"
>  (define_insn "*arm32_mov<mode>"
>    [(set (match_operand:HFBF 0 "nonimmediate_operand" "=r,m,r,r")
>  	(match_operand:HFBF 1 "general_operand" " m,r,r,F"))]
> -  "TARGET_32BIT && !TARGET_HARD_FLOAT
> +  "TARGET_32BIT
> +   && !TARGET_HARD_FLOAT
> +   && !TARGET_HAVE_MVE
>     && (	  s_register_operand (operands[0], <MODE>mode)
>         || s_register_operand (operands[1], <MODE>mode))"
>    "*
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index ff229aa9845..789e3332abb 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -454,10 +454,13 @@ (define_memory_constraint "Uv"
>
>  (define_memory_constraint "Uj"
>   "@internal
> -  In ARM/Thumb-2 state an VFP load/store address which does not support
> -  writeback at all (eg vldr.16)."
> +  In ARM/Thumb-2 state a VFP load/store address that supports writeback
> +  for Neon but not for MVE"
>   (and (match_code "mem")
> -      (match_test "TARGET_32BIT && arm_coproc_mem_operand_no_writeback (op)")))
> +      (match_test "TARGET_32BIT")
> +      (match_test "TARGET_HAVE_MVE
> +		   ? arm_coproc_mem_operand_no_writeback (op)
> +		   : neon_vector_mem_operand (op, 2, true)")))
>
>  (define_memory_constraint "Uy"
>   "@internal
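For what the Uj change means in practice, here is an untested sketch
(not one of the new testcases): under Neon, fp16 copies like the loop
below may use post-increment (writeback) addressing with
vld1.16/vst1.16, while under MVE the constraint still only accepts
plain [reg, #imm] addresses for vldr.16/vstr.16.

void
copy_fp16 (_Float16 *dst, const _Float16 *src, int n)
{
  while (n-- > 0)
    *dst++ = *src++;  /* Neon: writeback OK; MVE: offset form only */
}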
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 6a2bc5a789f..72707c17929 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -387,31 +387,15 @@ (define_insn "*movdi_vfp"
>     (set_attr "arch" "t2,any,any,any,a,t2,any,any,any,any,any,any")]
>  )
>
> -(define_insn "*mov_load_vfp_hf16"
> -  [(set (match_operand:HF 0 "s_register_operand" "=t")
> -	(match_operand:HF 1 "memory_operand" "Uj"))]
> -  "TARGET_HAVE_MVE_FLOAT"
> -  "vldr.16\\t%0, %E1"
> -)
> -
> -(define_insn "*mov_store_vfp_hf16"
> -  [(set (match_operand:HF 0 "memory_operand" "=Uj")
> -	(match_operand:HF 1 "s_register_operand" "t"))]
> -  "TARGET_HAVE_MVE_FLOAT"
> -  "vstr.16\\t%1, %E0"
> -)
> -
>  ;; HFmode and BFmode moves
>
>  (define_insn "*mov<mode>_vfp_<mode>16"
>    [(set (match_operand:HFBF 0 "nonimmediate_operand"
> -	  "= ?r,?m,t,r,t,r,t, t, Um,r")
> +	  "= ?r,?m,t,r,t,r,t, t, Uj,r")
>  	(match_operand:HFBF 1 "general_operand"
> -	  "  m,r,t,r,r,t,Dv,Um,t, F"))]
> +	  "  m,r,t,r,r,t,Dv,Uj,t, F"))]
>    "TARGET_32BIT
> -   && TARGET_VFP_FP16INST
> -   && arm_mve_mode_and_operands_type_check (<MODE>mode, operands[0],
> -					    operands[1])
> +   && (TARGET_VFP_FP16INST || TARGET_HAVE_MVE)
>     && (s_register_operand (operands[0], <MODE>mode)
>         || s_register_operand (operands[1], <MODE>mode))"
>  {
> @@ -430,9 +414,15 @@ (define_insn "*mov<mode>_vfp_<mode>16"
>      case 6: /* S register from immediate.  */
>        return \"vmov.f16\\t%0, %1\t%@ __<fporbf>\";
>      case 7: /* S register from memory.  */
> -      return \"vld1.16\\t{%z0}, %A1\";
> +      if (TARGET_HAVE_MVE)
> +	return \"vldr.16\\t%0, %1\";
> +      else
> +	return \"vld1.16\\t{%z0}, %A1\";
>      case 8: /* Memory from S register.  */
> -      return \"vst1.16\\t{%z1}, %A0\";
> +      if (TARGET_HAVE_MVE)
> +	return \"vstr.16\\t%1, %0\";
> +      else
> +	return \"vst1.16\\t{%z1}, %A0\";
>      case 9: /* ARM register from constant.  */
>        {
>  	long bits;
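To see the merged alternatives in action, a sketch in the style of the
new tests (untested; the expected instructions are assumptions based
on the templates above): under base MVE, alternative 7 should now emit
vldr.16 and alternative 8 vstr.16, while non-MVE fp16 targets keep
vld1.16/vst1.16.

void
move_fp16 (_Float16 *src, _Float16 *dst)
{
  register _Float16 s0 asm ("s0");
  s0 = *src;                      /* expected: vldr.16 s0, [r0] on base MVE */
  asm volatile ("" : "+w" (s0));
  *dst = s0;                      /* expected: vstr.16 s0, [r1] */
}

Because these are alternatives of the move pattern itself, LRA can use
them directly when an fp16 pseudo ends up in a stack slot, instead of
cycling as before.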
> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-fp16-move-1.c
> new file mode 100644
> index 00000000000..67a9f416adf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-fp16-move-1.c
> @@ -0,0 +1,418 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -mfloat-abi=hard -mfp16-format=ieee" } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** r_w:
> +**	vmov.f16	r0, s0	@ __fp16
> +**	bx	lr
> +*/
> +void
> +r_w (_Float16 s0)
> +{
> +  register _Float16 r0 asm ("r0");
> +  r0 = s0;
> +  asm volatile ("" :: "r" (r0));
> +}
> +
> +/*
> +** w_r:
> +**	vmov.f16	s0, r0	@ __fp16
> +**	bx	lr
> +*/
> +_Float16
> +w_r ()
> +{
> +  register _Float16 r0 asm ("r0");
> +  asm volatile ("" : "=r" (r0));
> +  return r0;
> +}
> +
> +/*
> +** w_w:
> +**	vmov	s1, s0	@ __fp16
> +**	bx	lr
> +*/
> +void
> +w_w (_Float16 s0)
> +{
> +  register _Float16 s1 asm ("s1");
> +  s1 = s0;
> +  asm volatile ("" :: "w" (s1));
> +}
> +
> +/*
> +** r_m_m128:
> +**	sub	(r[0-9]+), r0, #256
> +**	ldrh	r1, \[\1\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +r_m_m128 (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  r1 = r0[-128];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_m127:
> +**	ldrh	r1, \[r0, #-254\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +r_m_m127 (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  r1 = r0[-127];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_m1:
> +**	ldrh	r1, \[r0, #-2\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +r_m_m1 (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  r1 = r0[-1];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_0:
> +**	ldrh	r1, \[r0\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +r_m_0 (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  r1 = r0[0];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_1:
> +**	ldrh	r1, \[r0, #2\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +r_m_1 (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  r1 = r0[1];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_255:
> +**	ldrh	r1, \[r0, #510\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +r_m_255 (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  r1 = r0[255];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_256:
> +**	ldrh	r1, \[r0, #512\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +r_m_256 (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  r1 = r0[256];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/* ??? This could be done in one instruction, but without mve.fp,
> +   it makes more sense for memory_operand to enforce the GPR range.  */
> +/*
> +** w_m_m128:
> +**	sub	(r[0-9]+), r0, #256
> +**	vldr.16	s0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +w_m_m128 (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  s0 = r0[-128];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_m127:
> +**	vldr.16	s0, \[r0, #-254\]
> +**	bx	lr
> +*/
> +void
> +w_m_m127 (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  s0 = r0[-127];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_m1:
> +**	vldr.16	s0, \[r0, #-2\]
> +**	bx	lr
> +*/
> +void
> +w_m_m1 (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  s0 = r0[-1];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_0:
> +**	vldr.16	s0, \[r0\]
> +**	bx	lr
> +*/
> +void
> +w_m_0 (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  s0 = r0[0];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_1:
> +**	vldr.16	s0, \[r0, #2\]
> +**	bx	lr
> +*/
> +void
> +w_m_1 (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  s0 = r0[1];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_255:
> +**	vldr.16	s0, \[r0, #510\]
> +**	bx	lr
> +*/
> +void
> +w_m_255 (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  s0 = r0[255];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_256:
> +**	add	(r[0-9]+), r0, #512
> +**	vldr.16	s0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +w_m_256 (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  s0 = r0[256];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** m_m128_r:
> +**	sub	(r[0-9]+), r0, #256
> +**	strh	r1, \[\1\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +m_m128_r (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[-128] = r1;
> +}
> +
> +/*
> +** m_m127_r:
> +**	strh	r1, \[r0, #-254\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +m_m127_r (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[-127] = r1;
> +}
> +
> +/*
> +** m_m1_r:
> +**	strh	r1, \[r0, #-2\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +m_m1_r (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[-1] = r1;
> +}
> +
> +/*
> +** m_0_r:
> +**	strh	r1, \[r0\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +m_0_r (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[0] = r1;
> +}
> +
> +/*
> +** m_1_r:
> +**	strh	r1, \[r0, #2\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +m_1_r (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[1] = r1;
> +}
> +
> +/*
> +** m_255_r:
> +**	strh	r1, \[r0, #510\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +m_255_r (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[255] = r1;
> +}
> +
> +/*
> +** m_256_r:
> +**	strh	r1, \[r0, #512\]	@ __fp16
> +**	bx	lr
> +*/
> +void
> +m_256_r (_Float16 *r0)
> +{
> +  register _Float16 r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[256] = r1;
> +}
> +
> +/* ??? This could be done in one instruction, but without mve.fp,
> +   it makes more sense for memory_operand to enforce the GPR range.  */
> +/*
> +** m_m128_w:
> +**	sub	(r[0-9]+), r0, #256
> +**	vstr.16	s0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +m_m128_w (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[-128] = s0;
> +}
> +
> +/*
> +** m_m127_w:
> +**	vstr.16	s0, \[r0, #-254\]
> +**	bx	lr
> +*/
> +void
> +m_m127_w (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[-127] = s0;
> +}
> +
> +/*
> +** m_m1_w:
> +**	vstr.16	s0, \[r0, #-2\]
> +**	bx	lr
> +*/
> +void
> +m_m1_w (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[-1] = s0;
> +}
> +
> +/*
> +** m_0_w:
> +**	vstr.16	s0, \[r0\]
> +**	bx	lr
> +*/
> +void
> +m_0_w (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[0] = s0;
> +}
> +
> +/*
> +** m_1_w:
> +**	vstr.16	s0, \[r0, #2\]
> +**	bx	lr
> +*/
> +void
> +m_1_w (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[1] = s0;
> +}
> +
> +/*
> +** m_255_w:
> +**	vstr.16	s0, \[r0, #510\]
> +**	bx	lr
> +*/
> +void
> +m_255_w (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[255] = s0;
> +}
> +
> +/*
> +** m_256_w:
> +**	add	(r[0-9]+), r0, #512
> +**	vstr.16	s0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +m_256_w (_Float16 *r0)
> +{
> +  register _Float16 s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[256] = s0;
> +}
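Worked boundaries for the fp16 test above: an index of +/-k on a
_Float16 pointer is a byte offset of +/-2k; vldr.16/vstr.16 accept
multiples of 2 in [-510, 510] (255 * 2); and the GPR forms
(ldrh/strh) only accept small negative immediates, down to -255.
So r0[-127] (-254 bytes) is a single instruction for either register
class, r0[-128] (-256 bytes) needs the separate sub, and r0[256]
(+512 bytes) still fits ldrh/strh but is just past the 510-byte
vldr.16 limit, hence the add in w_m_256.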
> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-fp32-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-fp32-move-1.c
> new file mode 100644
> index 00000000000..1ecb839bfe7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-fp32-move-1.c
> @@ -0,0 +1,420 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -mfloat-abi=hard" } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** r_w:
> +**	vmov	r0, s0
> +**	bx	lr
> +*/
> +void
> +r_w (float s0)
> +{
> +  register float r0 asm ("r0");
> +  r0 = s0;
> +  asm volatile ("" :: "r" (r0));
> +}
> +
> +/*
> +** w_r:
> +**	vmov	s0, r0
> +**	bx	lr
> +*/
> +float
> +w_r ()
> +{
> +  register float r0 asm ("r0");
> +  asm volatile ("" : "=r" (r0));
> +  return r0;
> +}
> +
> +/*
> +** w_w:
> +**	vmov.f32	s1, s0
> +**	bx	lr
> +*/
> +void
> +w_w (float s0)
> +{
> +  register float s1 asm ("s1");
> +  s1 = s0;
> +  asm volatile ("" :: "w" (s1));
> +}
> +
> +/*
> +** r_m_m64:
> +**	sub	(r[0-9]+), r0, #256
> +**	ldr	r1, \[\1\]	@ float
> +**	bx	lr
> +*/
> +void
> +r_m_m64 (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  r1 = r0[-64];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_m63:
> +**	ldr	r1, \[r0, #-252\]	@ float
> +**	bx	lr
> +*/
> +void
> +r_m_m63 (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  r1 = r0[-63];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_m1:
> +**	ldr	r1, \[r0, #-4\]	@ float
> +**	bx	lr
> +*/
> +void
> +r_m_m1 (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  r1 = r0[-1];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_0:
> +**	ldr	r1, \[r0\]	@ float
> +**	bx	lr
> +*/
> +void
> +r_m_0 (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  r1 = r0[0];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_1:
> +**	ldr	r1, \[r0, #4\]	@ float
> +**	bx	lr
> +*/
> +void
> +r_m_1 (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  r1 = r0[1];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_255:
> +**	ldr	r1, \[r0, #1020\]	@ float
> +**	bx	lr
> +*/
> +void
> +r_m_255 (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  r1 = r0[255];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/*
> +** r_m_256:
> +**	add	(r[0-9]+), r0, #1024
> +**	ldr	r1, \[r0\]	@ float
> +**	bx	lr
> +*/
> +void
> +r_m_256 (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  r1 = r0[256];
> +  asm volatile ("" :: "r" (r1));
> +}
> +
> +/* ??? This could be done in one instruction, but without mve.fp,
> +   it makes more sense for memory_operand to enforce the GPR range.  */
> +/*
> +** w_m_m64:
> +**	sub	(r[0-9]+), r0, #256
> +**	vldr.32	s0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +w_m_m64 (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  s0 = r0[-64];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_m63:
> +**	vldr.32	s0, \[r0, #-252\]
> +**	bx	lr
> +*/
> +void
> +w_m_m63 (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  s0 = r0[-63];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_m1:
> +**	vldr.32	s0, \[r0, #-4\]
> +**	bx	lr
> +*/
> +void
> +w_m_m1 (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  s0 = r0[-1];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_0:
> +**	vldr.32	s0, \[r0\]
> +**	bx	lr
> +*/
> +void
> +w_m_0 (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  s0 = r0[0];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_1:
> +**	vldr.32	s0, \[r0, #4\]
> +**	bx	lr
> +*/
> +void
> +w_m_1 (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  s0 = r0[1];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_255:
> +**	vldr.32	s0, \[r0, #1020\]
> +**	bx	lr
> +*/
> +void
> +w_m_255 (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  s0 = r0[255];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** w_m_256:
> +**	add	(r[0-9]+), r0, #1024
> +**	vldr.32	s0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +w_m_256 (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  s0 = r0[256];
> +  asm volatile ("" :: "w" (s0));
> +}
> +
> +/*
> +** m_m64_r:
> +**	sub	(r[0-9]+), r0, #256
> +**	str	r1, \[\1\]	@ float
> +**	bx	lr
> +*/
> +void
> +m_m64_r (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[-64] = r1;
> +}
> +
> +/*
> +** m_m63_r:
> +**	str	r1, \[r0, #-252\]	@ float
> +**	bx	lr
> +*/
> +void
> +m_m63_r (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[-63] = r1;
> +}
> +
> +/*
> +** m_m1_r:
> +**	str	r1, \[r0, #-4\]	@ float
> +**	bx	lr
> +*/
> +void
> +m_m1_r (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[-1] = r1;
> +}
> +
> +/*
> +** m_0_r:
> +**	str	r1, \[r0\]	@ float
> +**	bx	lr
> +*/
> +void
> +m_0_r (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[0] = r1;
> +}
> +
> +/*
> +** m_1_r:
> +**	str	r1, \[r0, #4\]	@ float
> +**	bx	lr
> +*/
> +void
> +m_1_r (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[1] = r1;
> +}
> +
> +/*
> +** m_255_r:
> +**	str	r1, \[r0, #1020\]	@ float
> +**	bx	lr
> +*/
> +void
> +m_255_r (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[255] = r1;
> +}
> +
> +/*
> +** m_256_r:
> +**	add	(r[0-9]+), r0, #1024
> +**	str	r1, \[r0\]	@ float
> +**	bx	lr
> +*/
> +void
> +m_256_r (float *r0)
> +{
> +  register float r1 asm ("r1");
> +  asm volatile ("" : "=r" (r1));
> +  r0[256] = r1;
> +}
> +
> +/* ??? This could be done in one instruction, but without mve.fp,
> +   it makes more sense for memory_operand to enforce the GPR range.  */
> +/*
> +** m_m64_w:
> +**	sub	(r[0-9]+), r0, #256
> +**	vstr.32	s0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +m_m64_w (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[-64] = s0;
> +}
> +
> +/*
> +** m_m63_w:
> +**	vstr.32	s0, \[r0, #-252\]
> +**	bx	lr
> +*/
> +void
> +m_m63_w (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[-63] = s0;
> +}
> +
> +/*
> +** m_m1_w:
> +**	vstr.32	s0, \[r0, #-4\]
> +**	bx	lr
> +*/
> +void
> +m_m1_w (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[-1] = s0;
> +}
> +
> +/*
> +** m_0_w:
> +**	vstr.32	s0, \[r0\]
> +**	bx	lr
> +*/
> +void
> +m_0_w (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[0] = s0;
> +}
> +
> +/*
> +** m_1_w:
> +**	vstr.32	s0, \[r0, #4\]
> +**	bx	lr
> +*/
> +void
> +m_1_w (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[1] = s0;
> +}
> +
> +/*
> +** m_255_w:
> +**	vstr.32	s0, \[r0, #1020\]
> +**	bx	lr
> +*/
> +void
> +m_255_w (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[255] = s0;
> +}
> +
> +/*
> +** m_256_w:
> +**	add	(r[0-9]+), r0, #1024
> +**	vstr.32	s0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +m_256_w (float *r0)
> +{
> +  register float s0 asm ("s0");
> +  asm volatile ("" : "=w" (s0));
> +  r0[256] = s0;
> +}
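The same arithmetic at fp32 scale: index +/-k is byte offset +/-4k, so
r0[-63] (-252) and r0[255] (+1020 = 255 * 4, the vldr.32 maximum) are
single instructions, while r0[-64] (-256) and r0[256] (+1024) each
need a separate address computation.  Note that even the GPR version
r_m_256 splits at +1024, matching the patch description's point that
the enforced range is the intersection of the GPR and FPR ranges.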
> diff --git a/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
> new file mode 100644
> index 00000000000..3f81350697a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c
> @@ -0,0 +1,426 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -mfloat-abi=hard" } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** r_w:
> +**	vmov	r0, r1, d0
> +**	bx	lr
> +*/
> +void
> +r_w (double d0)
> +{
> +  register double r0 asm ("r0");
> +  r0 = d0;
> +  asm volatile ("" :: "r" (r0));
> +}
> +
> +/*
> +** w_r:
> +**	vmov	d0, r0, r1
> +**	bx	lr
> +*/
> +double
> +w_r ()
> +{
> +  register double r0 asm ("r0");
> +  asm volatile ("" : "=r" (r0));
> +  return r0;
> +}
> +
> +/*
> +** w_w:
> +** (
> +**	vmov.f32	s2, s0
> +**	vmov.f32	s3, s1
> +** |
> +**	vmov.f32	s3, s1
> +**	vmov.f32	s2, s0
> +** )
> +**	bx	lr
> +*/
> +void
> +w_w (double d0)
> +{
> +  register double d1 asm ("d1");
> +  d1 = d0;
> +  asm volatile ("" :: "w" (d1));
> +}
> +
> +/*
> +** r_m_m32:
> +**	sub	(r[0-9]+), r0, #256
> +**	ldrd	r2, \[\1\]
> +**	bx	lr
> +*/
> +void
> +r_m_m32 (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  r2 = r0[-32];
> +  asm volatile ("" :: "r" (r2));
> +}
> +
> +/*
> +** r_m_m31:
> +**	ldrd	r2, \[r0, #-248\]
> +**	bx	lr
> +*/
> +void
> +r_m_m31 (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  r2 = r0[-31];
> +  asm volatile ("" :: "r" (r2));
> +}
> +
> +/*
> +** r_m_m1:
> +**	ldrd	r2, \[r0, #-8\]
> +**	bx	lr
> +*/
> +void
> +r_m_m1 (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  r2 = r0[-1];
> +  asm volatile ("" :: "r" (r2));
> +}
> +
> +/*
> +** r_m_0:
> +**	ldrd	r2, \[r0\]
> +**	bx	lr
> +*/
> +void
> +r_m_0 (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  r2 = r0[0];
> +  asm volatile ("" :: "r" (r2));
> +}
> +
> +/*
> +** r_m_1:
> +**	ldrd	r2, \[r0, #8\]
> +**	bx	lr
> +*/
> +void
> +r_m_1 (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  r2 = r0[1];
> +  asm volatile ("" :: "r" (r2));
> +}
> +
> +/*
> +** r_m_127:
> +**	ldrd	r2, \[r0, #1016\]
> +**	bx	lr
> +*/
> +void
> +r_m_127 (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  r2 = r0[127];
> +  asm volatile ("" :: "r" (r2));
> +}
> +
> +/*
> +** r_m_128:
> +**	add	(r[0-9]+), r0, #1024
> +**	ldrd	r2, \[r0\]
> +**	bx	lr
> +*/
> +void
> +r_m_128 (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  r2 = r0[128];
> +  asm volatile ("" :: "r" (r2));
> +}
> +
> +/* ??? This could be done in one instruction, but without mve.fp,
> +   it makes more sense for memory_operand to enforce the GPR range.  */
> +/*
> +** w_m_m32:
> +**	sub	(r[0-9]+), r0, #256
> +**	vldr.64	d0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +w_m_m32 (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  d0 = r0[-32];
> +  asm volatile ("" :: "w" (d0));
> +}
> +
> +/*
> +** w_m_m31:
> +**	vldr.64	d0, \[r0, #-248\]
> +**	bx	lr
> +*/
> +void
> +w_m_m31 (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  d0 = r0[-31];
> +  asm volatile ("" :: "w" (d0));
> +}
> +
> +/*
> +** w_m_m1:
> +**	vldr.64	d0, \[r0, #-8\]
> +**	bx	lr
> +*/
> +void
> +w_m_m1 (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  d0 = r0[-1];
> +  asm volatile ("" :: "w" (d0));
> +}
> +
> +/*
> +** w_m_0:
> +**	vldr.64	d0, \[r0\]
> +**	bx	lr
> +*/
> +void
> +w_m_0 (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  d0 = r0[0];
> +  asm volatile ("" :: "w" (d0));
> +}
> +
> +/*
> +** w_m_1:
> +**	vldr.64	d0, \[r0, #8\]
> +**	bx	lr
> +*/
> +void
> +w_m_1 (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  d0 = r0[1];
> +  asm volatile ("" :: "w" (d0));
> +}
> +
> +/*
> +** w_m_127:
> +**	vldr.64	d0, \[r0, #1016\]
> +**	bx	lr
> +*/
> +void
> +w_m_127 (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  d0 = r0[127];
> +  asm volatile ("" :: "w" (d0));
> +}
> +
> +/*
> +** w_m_128:
> +**	add	(r[0-9]+), r0, #1024
> +**	vldr.64	d0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +w_m_128 (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  d0 = r0[128];
> +  asm volatile ("" :: "w" (d0));
> +}
> +
> +/*
> +** m_m32_r:
> +**	sub	(r[0-9]+), r0, #256
> +**	strd	r2, \[\1\]
> +**	bx	lr
> +*/
> +void
> +m_m32_r (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  asm volatile ("" : "=r" (r2));
> +  r0[-32] = r2;
> +}
> +
> +/*
> +** m_m31_r:
> +**	strd	r2, \[r0, #-248\]
> +**	bx	lr
> +*/
> +void
> +m_m31_r (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  asm volatile ("" : "=r" (r2));
> +  r0[-31] = r2;
> +}
> +
> +/*
> +** m_m1_r:
> +**	strd	r2, \[r0, #-8\]
> +**	bx	lr
> +*/
> +void
> +m_m1_r (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  asm volatile ("" : "=r" (r2));
> +  r0[-1] = r2;
> +}
> +
> +/*
> +** m_0_r:
> +**	strd	r2, \[r0\]
> +**	bx	lr
> +*/
> +void
> +m_0_r (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  asm volatile ("" : "=r" (r2));
> +  r0[0] = r2;
> +}
> +
> +/*
> +** m_1_r:
> +**	strd	r2, \[r0, #8\]
> +**	bx	lr
> +*/
> +void
> +m_1_r (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  asm volatile ("" : "=r" (r2));
> +  r0[1] = r2;
> +}
> +
> +/*
> +** m_127_r:
> +**	strd	r2, \[r0, #1016\]
> +**	bx	lr
> +*/
> +void
> +m_127_r (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  asm volatile ("" : "=r" (r2));
> +  r0[127] = r2;
> +}
> +
> +/*
> +** m_128_r:
> +**	add	(r[0-9]+), r0, #1024
> +**	strd	r2, \[r0\]
> +**	bx	lr
> +*/
> +void
> +m_128_r (double *r0)
> +{
> +  register double r2 asm ("r2");
> +  asm volatile ("" : "=r" (r2));
> +  r0[128] = r2;
> +}
> +
> +/* ??? This could be done in one instruction, but without mve.fp,
> +   it makes more sense for memory_operand to enforce the GPR range.  */
> +/*
> +** m_m32_w:
> +**	sub	(r[0-9]+), r0, #256
> +**	vstr.64	d0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +m_m32_w (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  asm volatile ("" : "=w" (d0));
> +  r0[-32] = d0;
> +}
> +
> +/*
> +** m_m31_w:
> +**	vstr.64	d0, \[r0, #-248\]
> +**	bx	lr
> +*/
> +void
> +m_m31_w (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  asm volatile ("" : "=w" (d0));
> +  r0[-31] = d0;
> +}
> +
> +/*
> +** m_m1_w:
> +**	vstr.64	d0, \[r0, #-8\]
> +**	bx	lr
> +*/
> +void
> +m_m1_w (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  asm volatile ("" : "=w" (d0));
> +  r0[-1] = d0;
> +}
> +
> +/*
> +** m_0_w:
> +**	vstr.64	d0, \[r0\]
> +**	bx	lr
> +*/
> +void
> +m_0_w (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  asm volatile ("" : "=w" (d0));
> +  r0[0] = d0;
> +}
> +
> +/*
> +** m_1_w:
> +**	vstr.64	d0, \[r0, #8\]
> +**	bx	lr
> +*/
> +void
> +m_1_w (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  asm volatile ("" : "=w" (d0));
> +  r0[1] = d0;
> +}
> +
> +/*
> +** m_127_w:
> +**	vstr.64	d0, \[r0, #1016\]
> +**	bx	lr
> +*/
> +void
> +m_127_w (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  asm volatile ("" : "=w" (d0));
> +  r0[127] = d0;
> +}
> +
> +/*
> +** m_128_w:
> +**	add	(r[0-9]+), r0, #1024
> +**	vstr.64	d0, \[\1\]
> +**	bx	lr
> +*/
> +void
> +m_128_w (double *r0)
> +{
> +  register double d0 asm ("d0");
> +  asm volatile ("" : "=w" (d0));
> +  r0[128] = d0;
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c
> index 0a69aced8b4..50b195300d8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c
> @@ -13,5 +13,6 @@ fn1 (__fp16 *pSrc)
>        pDst[i] = high;
>  }
>
> -/* { dg-final { scan-assembler {vldr\.16\ts[0-9]+, \[r[0-9]+\]\n} } } */
> -/* { dg-final { scan-assembler {vstr\.16\ts[0-9]+, \[r[0-9]+\]\n} } } */
> +/* { dg-final { scan-assembler {vldr\.16\ts[0-9]+, \[r[0-9]+(, #-?[0-9]+)?\]\n} } } */
> +/* { dg-final { scan-assembler-not {vldr\.16\t[^\n]*\]!} } } */
> +/* { dg-final { scan-assembler-not {vstr\.16\t[^\n]*\]!} } } */
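For fp64 the byte offsets scale by 8: r0[-31] (-248) and r0[127]
(+1016, the largest multiple of 8 within the +/-1020 vldr.64 range)
stay single instructions, while r0[-32] (-256) and r0[128] (+1024)
split.  Finally, the updated scans for mve-vldstr16-no-writeback.c
now accept an optional immediate offset on the vldr.16 and instead
reject any writeback form, i.e. an address ending in "]!", such as
the illustrative "vldr.16 s0, [r0, #2]!".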