On 20/01/17 14:08, Ramana Radhakrishnan wrote:
> On Wed, Dec 28, 2016 at 9:58 AM, Andre Vieira (lists)
> <andre.simoesdiasvie...@arm.com> wrote:
>> On 29/11/16 09:45, Andre Vieira (lists) wrote:
>>> On 17/11/16 10:00, Ramana Radhakrishnan wrote:
>>>> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists)
>>>> <andre.simoesdiasvie...@arm.com> wrote:
>>>>> Hello,
>>>>>
>>>>> This patch tackles the issue reported in PR71607. This patch takes a
>>>>> different approach for disabling the creation of literal pools. Instead
>>>>> of disabling the patterns that would normally transform the rtl into
>>>>> actual literal pools, it disables the creation of this literal pool rtl
>>>>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
>>>>> arm_disable_literal_pool is true. I added patterns to split floating
>>>>> point constants for both SF and DFmode. A pattern to handle the
>>>>> addressing of label_refs had to be included as well since all
>>>>> "memory_operand" patterns are disabled when
>>>>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
>>>>> splitting 32-bit immediates had to be changed, it was not accepting
>>>>> unsigned 32-bit unsigned integers with the MSB set. I believe
>>>>> const_int_operand expects the mode of the operand to be set to VOIDmode
>>>>> and not SImode. I have only changed it in the patterns that were
>>>>> affecting this code, though I suggest looking into changing it in the
>>>>> rest of the ARM backend.
>>>>>
>>>>> I added more test cases. No regressions for arm-none-eabi with
>>>>> Cortex-M0, Cortex-M3 and Cortex-M7.
>>>>>
>>>>> Is this OK for trunk?
>>>>
>>>> Including -mslow-flash-data in your multilib flags ? If no regressions
>>>> with that ok .
>>>>
>>>>
>>>> regards
>>>> Ramana
>>>>
>>>>>
>>>
>>> Hello,
>>>
>>> I found some new ICE's with the -mslow-flash-data testing so I had to
>>> rework this patch. I took the opportunity to rebase it as well.
>>>
>>> The problem was with the way the old version of the patch handled label
>>> references.  After some digging I found I wasn't using the right target
>>> hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for
>>> ARM.  This target hook determines whether a literal pool ends up in an
>>> 'object_block' structure. So I reverted the changes made in the old
>>> version of the patch to the ARM implementation of the
>>> 'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on
>>> 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM
>>> implementation for this hook that returns false if
>>> 'arm_disable_literal_pool' is set to true and true otherwise.
>>>
>>> This version of the patch also reverts back to using the check for
>>> 'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the
>>> last version, this code is required to place the label references in
>>> rodata sections.
>>>
>>> Another thing this patch does is revert the changes made to the 32-bit
>>> constant split in arm.md. The reason this was needed before was because
>>> 'real_to_target' returns a long array and does not sign-extend values in
>>> it, which would make sense on hosts with 64-bit longs. To fix this the
>>> value is now casted to 'int' first.  It would probably be a good idea to
>>> change the 'real_to_target' function to return an array with
>>> 'HOST_WIDE_INT' elements instead and either use all 64-bits or
>>> sign-extend them.  Something for the future?
>>>
>>> I added more test cases in this patch and reran regression tests for:
>>> Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a
>>> bootstrap+regressions on arm-none-linux-gnueabihf.
>>>
>>> Is this OK for trunk?
>>>
>>> Cheers,
>>> Andre
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvie...@arm.com>
>>>
>>>     PR target/71607
>>>     * config/arm/arm.md (use_literal_pool): Removes.
>>>     (64-bit immediate split): No longer takes cost into consideration
>>>     if 'arm_disable_literal_pool' is enabled.
>>>     * config/arm/arm.c (arm_use_blocks_for_constant_p): New.
>>>     (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
>>>     (arm_max_const_double_inline_cost): Remove use of
>>> arm_disable_literal_pool.
>>>     * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>>>     (no_literal_pool_sf_immediate): New.
>>>
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvie...@arm.com>
>>>             Thomas Preud'homme  <thomas.preudho...@arm.com>
>>>
>>>     PR target/71607
>>>     * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
>>>     * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>>>     * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>>>     * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>>>     * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
>>>     * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
>>>
>> Hello,
>>
>> As I have mentioned in my commit message for the fix on embedded-6  (see
>> https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01304.html) I found an
>> issue with this code when testing its backport on the embedded-6-branch.
>>
>> I misread the definition of the target hook
>> TARGET_USE_BLOCKS_FOR_CONSTANT_P and it seems the way I implemented it
>> before only changed code generation if the -mslow-flash-data option
>> wasn't passed. I.e. I don't need to implement it to solve this issue.
>> The other changes in this patch are sufficient...
>>
>> I reran regressions for arm-none-eabi-gcc with the following targets:
>> Cortex-M0, Cortex-M3, Cortex-M7 and ARMv8-M Baseline. I also ran
>> bootstraps and full regressions for arm-none-linux-gnueabihf both in ARM
>> and THUMB mode. All green!
> 
> And presumably with -mslow-flash-data as a multilib again ?
> 
>>
>> Is this OK for trunk?
> 
> We should also be adding to arm_reorg
> 
> 
> if (arm_disable_literal_pool)
>   return ;
> 
> after we split all the insns.
> 
> 
> After all if the splitters added had done their job, there's no reason
> to run the minipool placement code at all - is there  :) ?
> 
> Furthermore is it guaranteed that we will not have any references to
> the literal pool post reload - IIRC , reload in its older days could
> easily just push any old constant into a standard constant pool entry
> and make things work . Is LRA likely not to produce any constant pool
> entries in this form ? The bad thing here would be producing literal
> pool entries when the user had asked the compiler to do so - thus
> having more belts and braces for this would be good (thus , why is a
> change to arm_cannot_force_const_mem to return false in the face of
> arm_disable_literal_pool not appropriate ? ) Nick also had a couple of
> good points on making the test more usable across all testers in
> private conversation - so please make those changes.
> 
> 
> 
> 
> regards
> Ramana
> 
> 
>>
>> Cheers,
>> Andre
>>
>> gcc/ChangeLog:
>>
>> 2016-12-28  Andre Vieira  <andre.simoesdiasvie...@arm.com>
>>
>>     PR target/71607
>>     * config/arm/arm.md (use_literal_pool): Removes.
>>     (64-bit immediate split): No longer takes cost into consideration
>>     if 'arm_disable_literal_pool' is enabled.
>>     * config/arm/arm.c (arm_max_const_double_inline_cost): Remove use
>>     of arm_disable_literal_pool.
>>     * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>>     (no_literal_pool_sf_immediate): New.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-12-28  Andre Vieira  <andre.simoesdiasvie...@arm.com>
>>             Thomas Preud'homme  <thomas.preudho...@arm.com>
>>
>>     PR target/71607
>>     * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
>>     * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>>     * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>>     * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>>     * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
>>     * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
Hello,

I forgot to mention I ran the full regression tests for -mcpu=cortex-m3
with the -mslow-flash-data option, this showed up clean.

After I disabled the creation of literal pools in arm_reorg if
'arm_disable_literal' pool is true the same test showed regressions with
TLS tests. The compiler generates assembly that tries to load directly
from a gott table symbol which is invalid assembly.  We do not have the
required AArch32 TLS relocations for instructions.

However, it might be worth mentioning that the GNU ARM Embedded
Toolchain is configured to use TLS emulation and this code is thus not
generated for toolchains configured with --disable-threads and
--disable-tls.  This wrong code generation would thus only occur in the
"niche" case of configuring a toolchain with threads and TLS, for a
M-profile device whilst compiling code containing TLS variables using
-mslow-flash-data.

As for your question whether a change to 'arm_cannot_force_const_mem' to
return false in the face of 'arm_disable_literal_pool'.  I think you
meant to say 'true' here.  I had tried this before, but this leads to
arm_legitimate_constant_p always returning false, which prevents the use
of constants altogether, even when synthesized into movw and movt's.

The way we currently use the target hooks around literal pools should
perhaps be reviewed, but that is a much bigger task. For instance, we
should perhaps be using 'TARGET_USE_BLOCKS_FOR_CONSTANT_P', which reads:
"This hook should return true if pool entries for constant x can be
placed in an object_block structure.". We currently use the default
implementation for this hook which returns false.

I reran regressions for arm-none-eabi-gcc with the following targets:
Cortex-M3, Cortex-M4 and Cortex-M7. I did not reran bootstraps as the
change in this new version can only affect M-profile targets and could
never change code-generation for other targets. Some of these tests are
still running, I will not commit anything before they are done. Though I
don't expect anything to change, since the -mslow-flash-data regression
testing, see bellow, worked fine.

I also reran the -mslow-flash-data for arm-none-eabi with Cortex-M3, and
as I mentioned earlier depending on how GCC is configured it could
regress for TLS tests.

Is this OK for trunk?

Cheers,
Andre
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
bbf10f23987e58a1e066715e2168772da0245ff1..ab4a9059c000417037d0a9c70aebc6987ddae235
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16373,10 +16373,6 @@ push_minipool_fix (rtx_insn *insn, HOST_WIDE_INT 
address, rtx *loc,
 int
 arm_max_const_double_inline_cost ()
 {
-  /* Let the value get synthesized to avoid the use of literal pools.  */
-  if (arm_disable_literal_pool)
-    return 99;
-
   return ((optimize_size || arm_ld_sched) ? 3 : 4);
 }
 
@@ -17323,6 +17319,11 @@ arm_reorg (void)
   if (!optimize)
     split_all_insns_noflow ();
 
+  /* Make sure we do not attempt to create a literal pool even though it should
+     no longer be necessary to create any.  */
+  if (arm_disable_literal_pool)
+    return ;
+
   minipool_fix_head = minipool_fix_tail = NULL;
 
   /* The first insn must always be a note, or the code below won't
@@ -30887,5 +30888,4 @@ arm_expand_divmod_libfunc (rtx libfunc, machine_mode 
mode,
   *quot_p = quotient;
   *rem_p = remainder;
 }
-
 #include "gt-arm.h"
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
ff1f565b850d1b9e7378461c5a808e7d486936bc..7d16e412765ccff4c527871b6b5d6142aa088b56
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -233,10 +233,6 @@
               (match_test "arm_restrict_it"))
          (const_string "no")
 
-         (and (eq_attr "use_literal_pool" "yes")
-              (match_test "arm_disable_literal_pool"))
-         (const_string "no")
-
          (eq_attr "arch_enabled" "no")
          (const_string "no")]
         (const_string "yes")))
@@ -5871,8 +5867,9 @@
        (match_operand:ANY64 1 "immediate_operand" ""))]
   "TARGET_32BIT
    && reload_completed
-   && (arm_const_double_inline_cost (operands[1])
-       <= arm_max_const_double_inline_cost ())"
+   && (arm_disable_literal_pool
+       || (arm_const_double_inline_cost (operands[1])
+          <= arm_max_const_double_inline_cost ()))"
   [(const_int 0)]
   "
   arm_split_constant (SET, SImode, curr_insn,
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
f83dc9b130e4288faa49e40ee8285a0cd7d325b1..b8db34a607e4fac75789fa6a7c09b3b6104f45d3
 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -2079,3 +2079,40 @@
 ;; fmdhr et al (VFPv1)
 ;; Support for xD (single precision only) variants.
 ;; fmrrs, fmsrr
+
+;; Split an immediate DF move to two immediate SI moves.
+(define_insn_and_split "no_literal_pool_df_immediate"
+  [(set (match_operand 0 "s_register_operand" "")
+       (match_operand:DF 1 "const_double_operand" ""))]
+  "TARGET_THUMB2 && arm_disable_literal_pool
+  && !(TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE
+       && vfp3_const_double_rtx (operands[1]))"
+  "#"
+  "&& !reload_completed"
+  [(set (subreg:SI (match_dup 1) 0) (match_dup 2))
+   (set (subreg:SI (match_dup 1) 4) (match_dup 3))
+   (set (match_dup 0) (match_dup 1))]
+  "
+  long buf[2];
+  real_to_target (buf, CONST_DOUBLE_REAL_VALUE (operands[1]), DFmode);
+  operands[2] = GEN_INT ((int) buf[0]);
+  operands[3] = GEN_INT ((int) buf[1]);
+  operands[1] = gen_reg_rtx (DFmode);
+  ")
+
+;; Split an immediate SF move to one immediate SI move.
+(define_insn_and_split "no_literal_pool_sf_immediate"
+  [(set (match_operand 0 "s_register_operand" "")
+       (match_operand:SF 1 "const_double_operand" ""))]
+  "TARGET_THUMB2 && arm_disable_literal_pool
+  && !(TARGET_HARD_FLOAT && vfp3_const_double_rtx (operands[1]))"
+  "#"
+  "&& !reload_completed"
+  [(set (subreg:SI (match_dup 1) 0) (match_dup 2))
+   (set (match_dup 0) (match_dup 1))]
+  "
+  long buf;
+  real_to_target (&buf, CONST_DOUBLE_REAL_VALUE (operands[1]), SFmode);
+  operands[2] = GEN_INT ((int) buf);
+  operands[1] = gen_reg_rtx (SFmode);
+  ")
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-2.c 
b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-2.c
new file mode 100644
index 
0000000000000000000000000000000000000000..8caf090e20a144b65ac50eea65c665c264c908f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" 
} { "-mcpu=cortex-m4" "-mcpu=cortex-m7" } } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } { 
"-mfloat-abi=hard" } } */
+/* { dg-options "-march=armv7e-m -mfloat-abi=hard -O2 -mthumb 
-mslow-flash-data" } */
+
+float f (float);
+
+const float max = 0.01f;
+
+int
+g (float in)
+{
+  if (f (in) + f (in) < max)
+    return 0;
+  return 1;
+}
+
+double foo (void)
+{
+  return 0xF1EC7A5239123AF;
+}
+
+double bar (void)
+{
+  return 0.0f;
+}
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c 
b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
new file mode 100644
index 
0000000000000000000000000000000000000000..54392866c55f5df25d54b8ad6ee3a8a46f22855f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" 
} { "-mcpu=cortex-m4" "-mcpu=cortex-m7" } } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } { 
"-mfloat-abi=hard" } } */
+/* { dg-options "-march=armv7e-m -mfloat-abi=hard -mthumb -mslow-flash-data" } 
*/
+
+/* From PR71607 */
+
+float b;
+void fn1 ();
+
+float
+fn2 ()
+{
+  return 1.1f;
+}
+
+void
+fn3 ()
+{
+  float a[2];
+  a[1] = b;
+  fn1 (a);
+}
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-4.c 
b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-4.c
new file mode 100644
index 
0000000000000000000000000000000000000000..4a2b8f7949c9ef14a977e929b6555b9e983543c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-4.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" 
} { "-mcpu=cortex-m4" "-mcpu=cortex-m7" } } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } { 
"-mfloat-abi=hard" } } */
+/* { dg-options "-march=armv7e-m -mfloat-abi=hard -O2 -mthumb 
-mslow-flash-data" } */
+
+double __attribute__ ((target ("fpu=fpv5-d16")))
+foo (void)
+{
+  return 1.0f;
+}
+
+float __attribute__ ((target ("fpu=fpv5-d16")))
+bar (void)
+{
+  return 1.0f;
+}
+
+float __attribute__ ((target ("fpu=fpv5-sp-d16")))
+baz (void)
+{
+  return 1.0f;
+}
+
+/* { dg-final { scan-assembler-times "#1\\.0e\\+0" 3 } } */
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-5.c 
b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-5.c
new file mode 100644
index 
0000000000000000000000000000000000000000..96ac45791c77622fc9071b551287a810b5bea75a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" 
} { "-mcpu=cortex-m4" "-mcpu=cortex-m7" } } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } { 
"-mfloat-abi=hard" } } */
+/* { dg-options "-march=armv7e-m -mfloat-abi=hard -O2 -mthumb 
-mslow-flash-data" } */
+
+double __attribute__ ((target ("fpu=fpv5-sp-d16")))
+foo (void)
+{
+  return 1.0f;
+}
+
+/* { dg-final { scan-assembler-not "#1\\.0e\\+0" } } */
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c 
b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c
rename to gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-1.c

Reply via email to