On 27/01/17 12:13, Ramana Radhakrishnan wrote: > On Thu, Jan 26, 2017 at 3:56 PM, Andre Vieira (lists) > <andre.simoesdiasvie...@arm.com> wrote: >> On 20/01/17 14:08, Ramana Radhakrishnan wrote: >>> On Wed, Dec 28, 2016 at 9:58 AM, Andre Vieira (lists) >>> <andre.simoesdiasvie...@arm.com> wrote: >>>> On 29/11/16 09:45, Andre Vieira (lists) wrote: >>>>> On 17/11/16 10:00, Ramana Radhakrishnan wrote: >>>>>> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists) >>>>>> <andre.simoesdiasvie...@arm.com> wrote: >>>>>>> Hello, >>>>>>> >>>>>>> This patch tackles the issue reported in PR71607. This patch takes a >>>>>>> different approach for disabling the creation of literal pools. Instead >>>>>>> of disabling the patterns that would normally transform the rtl into >>>>>>> actual literal pools, it disables the creation of this literal pool rtl >>>>>>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if >>>>>>> arm_disable_literal_pool is true. I added patterns to split floating >>>>>>> point constants for both SF and DFmode. A pattern to handle the >>>>>>> addressing of label_refs had to be included as well since all >>>>>>> "memory_operand" patterns are disabled when >>>>>>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for >>>>>>> splitting 32-bit immediates had to be changed, it was not accepting >>>>>>> unsigned 32-bit unsigned integers with the MSB set. I believe >>>>>>> const_int_operand expects the mode of the operand to be set to VOIDmode >>>>>>> and not SImode. I have only changed it in the patterns that were >>>>>>> affecting this code, though I suggest looking into changing it in the >>>>>>> rest of the ARM backend. >>>>>>> >>>>>>> I added more test cases. No regressions for arm-none-eabi with >>>>>>> Cortex-M0, Cortex-M3 and Cortex-M7. >>>>>>> >>>>>>> Is this OK for trunk? >>>>>> >>>>>> Including -mslow-flash-data in your multilib flags ? If no regressions >>>>>> with that ok . >>>>>> >>>>>> >>>>>> regards >>>>>> Ramana >>>>>> >>>>>>> >>>>> >>>>> Hello, >>>>> >>>>> I found some new ICE's with the -mslow-flash-data testing so I had to >>>>> rework this patch. I took the opportunity to rebase it as well. >>>>> >>>>> The problem was with the way the old version of the patch handled label >>>>> references. After some digging I found I wasn't using the right target >>>>> hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for >>>>> ARM. This target hook determines whether a literal pool ends up in an >>>>> 'object_block' structure. So I reverted the changes made in the old >>>>> version of the patch to the ARM implementation of the >>>>> 'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on >>>>> 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM >>>>> implementation for this hook that returns false if >>>>> 'arm_disable_literal_pool' is set to true and true otherwise. >>>>> >>>>> This version of the patch also reverts back to using the check for >>>>> 'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the >>>>> last version, this code is required to place the label references in >>>>> rodata sections. >>>>> >>>>> Another thing this patch does is revert the changes made to the 32-bit >>>>> constant split in arm.md. The reason this was needed before was because >>>>> 'real_to_target' returns a long array and does not sign-extend values in >>>>> it, which would make sense on hosts with 64-bit longs. To fix this the >>>>> value is now casted to 'int' first. It would probably be a good idea to >>>>> change the 'real_to_target' function to return an array with >>>>> 'HOST_WIDE_INT' elements instead and either use all 64-bits or >>>>> sign-extend them. Something for the future? >>>>> >>>>> I added more test cases in this patch and reran regression tests for: >>>>> Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a >>>>> bootstrap+regressions on arm-none-linux-gnueabihf. >>>>> >>>>> Is this OK for trunk? >>>>> >>>>> Cheers, >>>>> Andre >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> 2016-11-29 Andre Vieira <andre.simoesdiasvie...@arm.com> >>>>> >>>>> PR target/71607 >>>>> * config/arm/arm.md (use_literal_pool): Removes. >>>>> (64-bit immediate split): No longer takes cost into consideration >>>>> if 'arm_disable_literal_pool' is enabled. >>>>> * config/arm/arm.c (arm_use_blocks_for_constant_p): New. >>>>> (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define. >>>>> (arm_max_const_double_inline_cost): Remove use of >>>>> arm_disable_literal_pool. >>>>> * config/arm/vfp.md (no_literal_pool_df_immediate): New. >>>>> (no_literal_pool_sf_immediate): New. >>>>> >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> 2016-11-29 Andre Vieira <andre.simoesdiasvie...@arm.com> >>>>> Thomas Preud'homme <thomas.preudho...@arm.com> >>>>> >>>>> PR target/71607 >>>>> * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ... >>>>> * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this. >>>>> * gcc.target/arm/thumb2-slow-flash-data-2.c: New. >>>>> * gcc.target/arm/thumb2-slow-flash-data-3.c: New. >>>>> * gcc.target/arm/thumb2-slow-flash-data-4.c: New. >>>>> * gcc.target/arm/thumb2-slow-flash-data-5.c: New. >>>>> >>>> Hello, >>>> >>>> As I have mentioned in my commit message for the fix on embedded-6 (see >>>> https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01304.html) I found an >>>> issue with this code when testing its backport on the embedded-6-branch. >>>> >>>> I misread the definition of the target hook >>>> TARGET_USE_BLOCKS_FOR_CONSTANT_P and it seems the way I implemented it >>>> before only changed code generation if the -mslow-flash-data option >>>> wasn't passed. I.e. I don't need to implement it to solve this issue. >>>> The other changes in this patch are sufficient... >>>> >>>> I reran regressions for arm-none-eabi-gcc with the following targets: >>>> Cortex-M0, Cortex-M3, Cortex-M7 and ARMv8-M Baseline. I also ran >>>> bootstraps and full regressions for arm-none-linux-gnueabihf both in ARM >>>> and THUMB mode. All green! >>> >>> And presumably with -mslow-flash-data as a multilib again ? >>> >>>> >>>> Is this OK for trunk? >>> >>> We should also be adding to arm_reorg >>> >>> >>> if (arm_disable_literal_pool) >>> return ; >>> >>> after we split all the insns. >>> >>> >>> After all if the splitters added had done their job, there's no reason >>> to run the minipool placement code at all - is there :) ? >>> >>> Furthermore is it guaranteed that we will not have any references to >>> the literal pool post reload - IIRC , reload in its older days could >>> easily just push any old constant into a standard constant pool entry >>> and make things work . Is LRA likely not to produce any constant pool >>> entries in this form ? The bad thing here would be producing literal >>> pool entries when the user had asked the compiler to do so - thus >>> having more belts and braces for this would be good (thus , why is a >>> change to arm_cannot_force_const_mem to return false in the face of >>> arm_disable_literal_pool not appropriate ? ) Nick also had a couple of >>> good points on making the test more usable across all testers in >>> private conversation - so please make those changes. >>> >>> >>> >>> >>> regards >>> Ramana >>> >>> >> Hello, >> >> I forgot to mention I ran the full regression tests for -mcpu=cortex-m3 >> with the -mslow-flash-data option, this showed up clean. >> >> After I disabled the creation of literal pools in arm_reorg if >> 'arm_disable_literal' pool is true the same test showed regressions with >> TLS tests. The compiler generates assembly that tries to load directly >> from a gott table symbol which is invalid assembly. We do not have the >> required AArch32 TLS relocations for instructions. > > Ok. That sounds reasonable. > > >> >> However, it might be worth mentioning that the GNU ARM Embedded >> Toolchain is configured to use TLS emulation and this code is thus not >> generated for toolchains configured with --disable-threads and >> --disable-tls. This wrong code generation would thus only occur in the >> "niche" case of configuring a toolchain with threads and TLS, for a >> M-profile device whilst compiling code containing TLS variables using >> -mslow-flash-data. > > Agreed - > > > I have seen other requests for a similar feature even on A profile, > thus documenting this in the form of comments in the source > would be very welcome.
I will add a comment to the arm_reorg change with: /* FIXME: This will cause the generation of invalid assembly for code using TLS '__thread' variables. ARM currently does not provide relocations to encode these into AArch32 instructions, only data, so there is no way to currently implement these if a literal pool is disabled. */ > >> >> As for your question whether a change to 'arm_cannot_force_const_mem' to >> return false in the face of 'arm_disable_literal_pool'. I think you >> meant to say 'true' here. I had tried this before, but this leads to >> arm_legitimate_constant_p always returning false, which prevents the use >> of constants altogether, even when synthesized into movw and movt's. >> > > Yes I meant to say true , jetlag ... > > Did you try refactoring the existing logic into a helper function that > does what it does today and only return true for > target_slow_flash_data in the arm_cannot_force_const_mem interface ? > > i.e. > > arm_cannot_force_const_mem => arm_cannot_force_const_mem_1 > > arm_legitimate_constant_p calls arm_cannot_force_const_mem_1 .. > and > > New definition of arm_cannot_force_const_mem returns true for > constants for target_flash_slow_data. > > and then handles arm_cannot_force_const_mem_1 as it exists today > > Can you experiment with that please ? Ah I had sort of... and I remembered there was something with label's and this exposes it. If I do this, I end up with the following ICE for thumb2-slow-flash-data-1.c: error: unrecognizable insn: } ^ (insn 7 4 8 2 (set (reg/f:SI 114) (label_ref:SI 45)) "t.c":55 -1 (insn_list:REG_LABEL_OPERAND 45 (nil))) I believe this is because 'force_const_mem' in varasm.c. It is called from the 'emit_move_insn' expander and it will now keep the 'label_ref' rather than transform it into something in the following format: (mem/u/c:SI (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [5 S4 A32]) >From reading the comments in emit_move_insn I suspect this might be a matter of a missing expander for ARM? The generation of the memory wrapper symbol_ref does seem to happen in force_const_mem, so I'm not entirely sure duplicating this is a good idea... If I do not make the change to arm_cannot_foce_const_mem, the current code will generate a .rodata section containing the label address, which is pointed to by .LC1 (the one inside the symbol_ref). > > I'm happy with that as a follow-up patch. I'm really keen on making > sure that the compiler does *not* produce any literal pools at all for > slow-flash-data and we fail hard in case we can't handle it. The > requirement is no literal pools and the compiler should enforce that > to the best of its ability and not paper over problems. > > > >> The way we currently use the target hooks around literal pools should >> perhaps be reviewed, but that is a much bigger task. For instance, we >> should perhaps be using 'TARGET_USE_BLOCKS_FOR_CONSTANT_P', which reads: >> "This hook should return true if pool entries for constant x can be >> placed in an object_block structure.". We currently use the default >> implementation for this hook which returns false. > > I'm not sure if that is appropriate here as this tries to put things > into constant pool blocks that are shared probably between many > functions which implies code is able to access the constant pool > anywhere. There is some tie in to section anchors if I remember > correctly. I don't think we can use this in the ARM backend because we > really need the minipool placement to work for constant pools. I think > that is orthogonal to this. > >> >> I reran regressions for arm-none-eabi-gcc with the following targets: >> Cortex-M3, Cortex-M4 and Cortex-M7. I did not reran bootstraps as the >> change in this new version can only affect M-profile targets and could >> never change code-generation for other targets. Some of these tests are >> still running, I will not commit anything before they are done. Though I >> don't expect anything to change, since the -mslow-flash-data regression >> testing, see bellow, worked fine. >> >> I also reran the -mslow-flash-data for arm-none-eabi with Cortex-M3, and >> as I mentioned earlier depending on how GCC is configured it could >> regress for TLS tests. > > > Given that maybe we should just disable the use of TLS in > -mslow-flash-data and give an error ? How can I check for the use of TLS given -mslow-flash-data? Best I could think of would be to disable the patterns that handle the unspecs for PIC and TLS. Though AFAICT its the thumb2_movsi_insn pattern that's generating the wrong assembly... > > Ok with all the caveats as listed above. > > > Ramana > > >> >> Is this OK for trunk? > > >> >> Cheers, >> Andre