On 27/01/17 12:13, Ramana Radhakrishnan wrote:
> On Thu, Jan 26, 2017 at 3:56 PM, Andre Vieira (lists)
> <andre.simoesdiasvie...@arm.com> wrote:
>> On 20/01/17 14:08, Ramana Radhakrishnan wrote:
>>> On Wed, Dec 28, 2016 at 9:58 AM, Andre Vieira (lists)
>>> <andre.simoesdiasvie...@arm.com> wrote:
>>>> On 29/11/16 09:45, Andre Vieira (lists) wrote:
>>>>> On 17/11/16 10:00, Ramana Radhakrishnan wrote:
>>>>>> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists)
>>>>>> <andre.simoesdiasvie...@arm.com> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> This patch tackles the issue reported in PR71607. This patch takes a
>>>>>>> different approach for disabling the creation of literal pools. Instead
>>>>>>> of disabling the patterns that would normally transform the rtl into
>>>>>>> actual literal pools, it disables the creation of this literal pool rtl
>>>>>>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
>>>>>>> arm_disable_literal_pool is true. I added patterns to split floating
>>>>>>> point constants for both SF and DFmode. A pattern to handle the
>>>>>>> addressing of label_refs had to be included as well since all
>>>>>>> "memory_operand" patterns are disabled when
>>>>>>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
>>>>>>> splitting 32-bit immediates had to be changed, it was not accepting
>>>>>>> unsigned 32-bit unsigned integers with the MSB set. I believe
>>>>>>> const_int_operand expects the mode of the operand to be set to VOIDmode
>>>>>>> and not SImode. I have only changed it in the patterns that were
>>>>>>> affecting this code, though I suggest looking into changing it in the
>>>>>>> rest of the ARM backend.
>>>>>>>
>>>>>>> I added more test cases. No regressions for arm-none-eabi with
>>>>>>> Cortex-M0, Cortex-M3 and Cortex-M7.
>>>>>>>
>>>>>>> Is this OK for trunk?
>>>>>>
>>>>>> Including -mslow-flash-data in your multilib flags ? If no regressions
>>>>>> with that ok .
>>>>>>
>>>>>>
>>>>>> regards
>>>>>> Ramana
>>>>>>
>>>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> I found some new ICE's with the -mslow-flash-data testing so I had to
>>>>> rework this patch. I took the opportunity to rebase it as well.
>>>>>
>>>>> The problem was with the way the old version of the patch handled label
>>>>> references.  After some digging I found I wasn't using the right target
>>>>> hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for
>>>>> ARM.  This target hook determines whether a literal pool ends up in an
>>>>> 'object_block' structure. So I reverted the changes made in the old
>>>>> version of the patch to the ARM implementation of the
>>>>> 'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on
>>>>> 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM
>>>>> implementation for this hook that returns false if
>>>>> 'arm_disable_literal_pool' is set to true and true otherwise.
>>>>>
>>>>> This version of the patch also reverts back to using the check for
>>>>> 'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the
>>>>> last version, this code is required to place the label references in
>>>>> rodata sections.
>>>>>
>>>>> Another thing this patch does is revert the changes made to the 32-bit
>>>>> constant split in arm.md. The reason this was needed before was because
>>>>> 'real_to_target' returns a long array and does not sign-extend values in
>>>>> it, which would make sense on hosts with 64-bit longs. To fix this the
>>>>> value is now casted to 'int' first.  It would probably be a good idea to
>>>>> change the 'real_to_target' function to return an array with
>>>>> 'HOST_WIDE_INT' elements instead and either use all 64-bits or
>>>>> sign-extend them.  Something for the future?
>>>>>
>>>>> I added more test cases in this patch and reran regression tests for:
>>>>> Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a
>>>>> bootstrap+regressions on arm-none-linux-gnueabihf.
>>>>>
>>>>> Is this OK for trunk?
>>>>>
>>>>> Cheers,
>>>>> Andre
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvie...@arm.com>
>>>>>
>>>>>     PR target/71607
>>>>>     * config/arm/arm.md (use_literal_pool): Removes.
>>>>>     (64-bit immediate split): No longer takes cost into consideration
>>>>>     if 'arm_disable_literal_pool' is enabled.
>>>>>     * config/arm/arm.c (arm_use_blocks_for_constant_p): New.
>>>>>     (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
>>>>>     (arm_max_const_double_inline_cost): Remove use of
>>>>> arm_disable_literal_pool.
>>>>>     * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>>>>>     (no_literal_pool_sf_immediate): New.
>>>>>
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 2016-11-29  Andre Vieira  <andre.simoesdiasvie...@arm.com>
>>>>>             Thomas Preud'homme  <thomas.preudho...@arm.com>
>>>>>
>>>>>     PR target/71607
>>>>>     * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
>>>>>     * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
>>>>>
>>>> Hello,
>>>>
>>>> As I have mentioned in my commit message for the fix on embedded-6  (see
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01304.html) I found an
>>>> issue with this code when testing its backport on the embedded-6-branch.
>>>>
>>>> I misread the definition of the target hook
>>>> TARGET_USE_BLOCKS_FOR_CONSTANT_P and it seems the way I implemented it
>>>> before only changed code generation if the -mslow-flash-data option
>>>> wasn't passed. I.e. I don't need to implement it to solve this issue.
>>>> The other changes in this patch are sufficient...
>>>>
>>>> I reran regressions for arm-none-eabi-gcc with the following targets:
>>>> Cortex-M0, Cortex-M3, Cortex-M7 and ARMv8-M Baseline. I also ran
>>>> bootstraps and full regressions for arm-none-linux-gnueabihf both in ARM
>>>> and THUMB mode. All green!
>>>
>>> And presumably with -mslow-flash-data as a multilib again ?
>>>
>>>>
>>>> Is this OK for trunk?
>>>
>>> We should also be adding to arm_reorg
>>>
>>>
>>> if (arm_disable_literal_pool)
>>>   return ;
>>>
>>> after we split all the insns.
>>>
>>>
>>> After all if the splitters added had done their job, there's no reason
>>> to run the minipool placement code at all - is there  :) ?
>>>
>>> Furthermore is it guaranteed that we will not have any references to
>>> the literal pool post reload - IIRC , reload in its older days could
>>> easily just push any old constant into a standard constant pool entry
>>> and make things work . Is LRA likely not to produce any constant pool
>>> entries in this form ? The bad thing here would be producing literal
>>> pool entries when the user had asked the compiler to do so - thus
>>> having more belts and braces for this would be good (thus , why is a
>>> change to arm_cannot_force_const_mem to return false in the face of
>>> arm_disable_literal_pool not appropriate ? ) Nick also had a couple of
>>> good points on making the test more usable across all testers in
>>> private conversation - so please make those changes.
>>>
>>>
>>>
>>>
>>> regards
>>> Ramana
>>>
>>>
>> Hello,
>>
>> I forgot to mention I ran the full regression tests for -mcpu=cortex-m3
>> with the -mslow-flash-data option, this showed up clean.
>>
>> After I disabled the creation of literal pools in arm_reorg if
>> 'arm_disable_literal' pool is true the same test showed regressions with
>> TLS tests. The compiler generates assembly that tries to load directly
>> from a gott table symbol which is invalid assembly.  We do not have the
>> required AArch32 TLS relocations for instructions.
> 
> Ok. That sounds reasonable.
> 
> 
>>
>> However, it might be worth mentioning that the GNU ARM Embedded
>> Toolchain is configured to use TLS emulation and this code is thus not
>> generated for toolchains configured with --disable-threads and
>> --disable-tls.  This wrong code generation would thus only occur in the
>> "niche" case of configuring a toolchain with threads and TLS, for a
>> M-profile device whilst compiling code containing TLS variables using
>> -mslow-flash-data.
> 
> Agreed -
> 
> 
> I have seen other requests for a similar feature even on A profile,
> thus documenting this in the form of comments in the source
> would  be very welcome.

I will add a comment to the arm_reorg change with:
  /* FIXME: This will cause the generation of invalid assembly for code
using TLS '__thread' variables.  ARM currently does not provide
relocations to encode these into AArch32 instructions, only data, so
there is no way to currently implement these if a literal pool is
disabled.  */


> 
>>
>> As for your question whether a change to 'arm_cannot_force_const_mem' to
>> return false in the face of 'arm_disable_literal_pool'.  I think you
>> meant to say 'true' here.  I had tried this before, but this leads to
>> arm_legitimate_constant_p always returning false, which prevents the use
>> of constants altogether, even when synthesized into movw and movt's.
>>
> 
> Yes I meant to say true  , jetlag ...
> 
> Did you try refactoring the existing logic into a helper function that
> does what it does today and only return true for
> target_slow_flash_data in the arm_cannot_force_const_mem interface ?
> 
> i.e.
> 
> arm_cannot_force_const_mem => arm_cannot_force_const_mem_1
> 
> arm_legitimate_constant_p calls arm_cannot_force_const_mem_1 ..
> and
> 
> New definition of arm_cannot_force_const_mem returns true for
> constants for target_flash_slow_data.
> 
> and then handles arm_cannot_force_const_mem_1 as it exists today
> 
> Can you experiment with that please ?

Ah I had sort of... and I remembered there was something with label's
and this exposes it.  If I do this, I end up with the following ICE for
thumb2-slow-flash-data-1.c:
error: unrecognizable insn:
 }
 ^
(insn 7 4 8 2 (set (reg/f:SI 114)
        (label_ref:SI 45)) "t.c":55 -1
     (insn_list:REG_LABEL_OPERAND 45 (nil)))

I believe this is because 'force_const_mem' in varasm.c. It is called
from the 'emit_move_insn' expander and it will now keep the 'label_ref'
rather than transform it into something in the following format:
(mem/u/c:SI (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [5  S4 A32])

>From reading the comments in emit_move_insn I suspect this might be a
matter of a missing expander for ARM?  The generation of the memory
wrapper symbol_ref does seem to happen in force_const_mem, so I'm not
entirely sure duplicating this is a good idea...

If I do not make the change to arm_cannot_foce_const_mem, the current
code will generate a .rodata section containing the label address, which
is pointed to by .LC1 (the one inside the symbol_ref).

> 
> I'm happy with that as a follow-up patch.  I'm really keen on making
> sure that the compiler does *not* produce any literal pools at all for
> slow-flash-data and we fail hard in case we can't handle it. The
> requirement is no literal pools and the compiler should enforce that
> to the best of its ability and not paper over problems.
> 
> 
> 
>> The way we currently use the target hooks around literal pools should
>> perhaps be reviewed, but that is a much bigger task. For instance, we
>> should perhaps be using 'TARGET_USE_BLOCKS_FOR_CONSTANT_P', which reads:
>> "This hook should return true if pool entries for constant x can be
>> placed in an object_block structure.". We currently use the default
>> implementation for this hook which returns false.
> 
> I'm not sure if that is appropriate here as this tries to put things
> into constant pool blocks that are shared probably between many
> functions which implies code is able to access the constant pool
> anywhere. There is some tie in to section anchors if I remember
> correctly. I don't think we can use this in the ARM backend because we
> really need the minipool placement to work for constant pools. I think
> that is orthogonal to this.
> 
>>
>> I reran regressions for arm-none-eabi-gcc with the following targets:
>> Cortex-M3, Cortex-M4 and Cortex-M7. I did not reran bootstraps as the
>> change in this new version can only affect M-profile targets and could
>> never change code-generation for other targets. Some of these tests are
>> still running, I will not commit anything before they are done. Though I
>> don't expect anything to change, since the -mslow-flash-data regression
>> testing, see bellow, worked fine.
>>
>> I also reran the -mslow-flash-data for arm-none-eabi with Cortex-M3, and
>> as I mentioned earlier depending on how GCC is configured it could
>> regress for TLS tests.
> 
> 
> Given that maybe we should just disable the use of TLS in
> -mslow-flash-data and give an error ?

How can I check for the use of TLS given -mslow-flash-data? Best I could
think of would be to disable the patterns that handle the unspecs for
PIC and TLS. Though AFAICT its the thumb2_movsi_insn pattern that's
generating the wrong assembly...
> 
> Ok with all the caveats as listed above.
> 
> 
> Ramana
> 
> 
>>
>> Is this OK for trunk?
> 
> 
>>
>> Cheers,
>> Andre

Reply via email to