Re: [committed v4 5/5] aarch64: Add function multiversioning support

2023-12-15 Thread Ramana Radhakrishnan
On Sat, Dec 16, 2023 at 6:18 AM Andrew Carlotti  wrote:
>
> This adds initial support for function multiversioning on aarch64 using
> the target_version and target_clones attributes.  This loosely follows
> the Beta specification in the ACLE [1], although with some differences
> that still need to be resolved (possibly as follow-up patches).
>
> Existing function multiversioning implementations are broken in various
> ways when used across translation units.  This includes placing
> resolvers in the wrong translation units, and using symbol mangling that
> causes callers to unintentionally bypass the resolver in some circumstances.
> Fixing these issues for aarch64 will require modifications to our ACLE
> specification.  It will also require further adjustments to existing
> middle end code, to facilitate different mangling and resolver
> placement while preserving existing target behaviours.
>
> The list of function multiversioning features specified in the ACLE is
> also inconsistent with the list of features supported in target option
> extensions.  I intend to resolve some or all of these inconsistencies at
> a later stage.
>
> The target_version attribute is currently only supported in C++, since
> this is the only frontend with existing support for multiversioning
> using the target attribute.  On the other hand, this patch happens to
> enable multiversioning with the target_clones attribute in Ada and D, as
> well as the entire C family, using their existing frontend support.
>
> This patch also does not support the following aspects of the Beta
> specification:
>
> - The target_clones attribute should allow an implicit unlisted
>   "default" version.
> - There should be an option to disable function multiversioning at
>   compile time.
> - Unrecognised target names in a target_clones attribute should be
>   ignored (with an optional warning).  This current patch raises an
>   error instead.
>
> [1] 
> https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
>
> Committed as approved with the coding convention fix, plus some adjustments to
> aarch64-option-extensions.def to accommodate recent changes on master. The
> series passed regression testing as a whole post-rebase on aarch64.

Pretty neat, very nice to see this work land - I would consider this
for the NEWS page for GCC-14.

Ramana

>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-feature-deps.h (fmv_deps_):
> Define aarch64_feature_flags mask for each FMV feature.
> * config/aarch64/aarch64-option-extensions.def: Use new macros
> to define FMV feature extensions.
> * config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
> Check for target_version attribute after processing target
> attribute.
> (aarch64_fmv_feature_data): New.
> (aarch64_parse_fmv_features): New.
> (aarch64_process_target_version_attr): New.
> (aarch64_option_valid_version_attribute_p): New.
> (get_feature_mask_for_version): New.
> (compare_feature_masks): New.
> (aarch64_compare_version_priority): New.
> (build_ifunc_arg_type): New.
> (make_resolver_func): New.
> (add_condition_to_bb): New.
> (dispatch_function_versions): New.
> (aarch64_generate_version_dispatcher_body): New.
> (aarch64_get_function_versions_dispatcher): New.
> (aarch64_common_function_versions): New.
> (aarch64_mangle_decl_assembler_name): New.
> (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
> (TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
> (TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
> (TARGET_COMPARE_VERSION_PRIORITY): New implementation.
> (TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
> (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
> (TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
> * config/aarch64/aarch64.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE):
> Set target macro.
> * config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
> new value to report duplicate FMV feature.
> * common/config/aarch64/cpuinfo.h: New file.
>
> libgcc/ChangeLog:
>
> * config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
> copy in gcc/common.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/options_set_17.c: Reorder expected flags.
> * gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
> * gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
> * 

Re: [PATCH 1/6] aarch64: Sync system register information with Binutils

2023-10-08 Thread Ramana Radhakrishnan


> On 5 Oct 2023, at 14:04, Victor Do Nascimento  
> wrote:
> 
> 
> 
> On 10/5/23 12:42, Richard Earnshaw wrote:
>> 
>> 
>> On 03/10/2023 16:18, Victor Do Nascimento wrote:
>>> This patch adds the `aarch64-sys-regs.def' file to GCC, teaching
>>> the compiler about system registers known to the assembler and how
>>> these can be used.
>>> 
>>> The macros used to hold system register information reflect those in
>>> use by binutils, a design choice made to facilitate the sharing of data
>>> between different parts of the toolchain.
>>> 
>>> By aligning the representation of data common to different parts of
>>> the toolchain we can greatly reduce the duplication of work,
>>> facilitating the maintenance of the aarch64 back-end across different
>>> parts of the toolchain; any `SYSREG (...)' that is added in one
>>> project can just as easily be added to its counterpart.
>>> 
>>> GCC does not implement the full range of ISA flags present in
>>> Binutils.  Where this is the case, aliases must be added to aarch64.h
>>> with the unknown architectural extension being mapped to its
>>> associated base architecture, such that any flag present in Binutils
>>> and used in system register definitions is understood in GCC.  Again,
>>> this is done such that flags can be used interchangeably between
>>> projects making use of the aarch64-system-regs.def file.  This is done
>>> in the next patch in the series.
>>> 
>>> `.arch' directives missing from the emitted assembly files as a
>>> consequence of this aliasing are accounted for by the compiler using
>>> the S encoding of system registers when
>>> issuing mrs/msr instructions.  This design choice ensures the
>>> assembler will accept anything that was deemed acceptable by the
>>> compiler.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>* gcc/config/aarch64/aarch64-system-regs.def: New.
>>> ---
>>>  gcc/config/aarch64/aarch64-sys-regs.def | 1059 +++
>>>  1 file changed, 1059 insertions(+)
>>>  create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def
>> 
>> This file is supposed to be /identical/ to the one in GNU Binutils,
>> right?
> 
> You're right Richard.
> 
> We want the same file to be compatible with both parts of the toolchain
> and, consequently, there is no compelling reason as to why the copy of
> the file found in GCC should in any way diverge from its Binutils
> counterpart.
> 
>> If so, I think it needs to continue to say that it is part of
>> GNU Binutils, not part of GCC.  Ramana, has this happened before?  If
>> not, does the SC have a position here?
>> 

I’ve not had the time to delve into the patch, apologies.


Is the intention here to keep a copy of the file with the main copy being in 
binutils i.e. modifications are made in binutils and then sync’d with GCC at 
the same time ?


In which case the comments in the file should make the mechanics of updates 
abundantly clear.

Is there any reason why, if the two versions were different, you’d have
problems between gcc and binutils?

If so, what kinds of problems would they be? i.e. would they be no more
than gas not knowing about a system register that GCC claimed to know,
because binutils and gcc were built with different versions of the
system register file?

Speaking for myself, I do not see this request being any different from the 
requests for imports from other repositories into the GCC repository.



>> R.
> 
> This does raise a very interesting question on the intellectual property
> front and one that is well beyond my competence to opine about.
> 
> Nonetheless, this is a question which may arise again if we abstract
> away more target description data into such .def files, as has been
> discussed for architectural feature flags (for example).
> 
> So what might be nice (but not necessarily tenable) is if we had
> appropriate provisions in place for where files were shared across
> different parts of the toolchain.
> 
> Something like "This file is a shared resource of GCC and Binutils."



This model of an additional shared repository with a build dependency would
transfer the “copy in every dependent repository” problem into a
“suitable hash in every dependent repository” problem, which is certainly
something to consider.

And then the question arises for the GCC project of how many such
dependencies on different repositories it tracks :) ? Perhaps git submodules
could be considered, however I am not sure how much that has been looked at
after the git conversion.



regards
Ramana




> 
> Anyway, that's my two cents on the matter :).
> 
> Let's see what Ramana has to say on the matter.
> 
> V.
> 
> 
>>> diff --git a/gcc/config/aarch64/aarch64-sys-regs.def
>>> b/gcc/config/aarch64/aarch64-sys-regs.def
>>> new file mode 100644
>>> index 000..d77fee1d5e3
>>> --- /dev/null
>>> +++ b/gcc/config/aarch64/aarch64-sys-regs.def
>>> @@ -0,0 +1,1059 @@
>>> +/* Copyright (C) 2023 Free Software 

Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-09-30 Thread Ramana Radhakrishnan
+ linaro-toolchain as I don't understand the CI issues on patchwork.

On Wed, Sep 27, 2023 at 8:40 PM Wilco Dijkstra  wrote:
>
> Hi Ramana,
>
> > Hope this helps.
>
> Yes definitely!
>
> >> Passes regress/bootstrap, OK for commit?
> >
> > Target ? armhf ? --with-arch , --with-fpu , --with-float parameters ?
> > Please be specific.
>
> I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
> --build=arm-none-linux-gnueabihf --with-float=hard. However it seems that
> the default armhf settings are incorrect. I shouldn't need the
> --with-float=hard since that is obviously implied by armhf, and they should
> also imply armv7-a with vfpv3 according to documentation. It seems to get
> confused and skip some tests. I tried using --with-fpu=auto, but that
> doesn't work at all, so in the end I forced it like: --with-arch=armv8-a
> --with-fpu=neon-fp-armv8. With this it runs a few more tests.

Yeah that's a wart that I don't like.

armhf just implies the hard float ABI and came into being to help
distinguish from the Base PCS for some of the distros at the time
(2010s). However we didn't want to set a baseline arch at that time
given the imminent arrival of v8-a and thus the specification of
--with-arch , --with-fpu and --with-float became second nature to many
of us working on it at that time.

I can see how it can be confusing.

With the advent of fpu=auto , I thought this could be made better but
that is a matter for another patch and another discussion.

However until then do remember to baseline to --with-arch , --with-fpu
and --with-float . Unfortunate but needed. If there is a bug with
--with-fpu=auto , good to articulate it as I understand that to be the
state of the art.

Thank you for working through it.
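[Editorial note: for reference, a fully baselined set of configure flags along the lines discussed above. The triplet and flag values are taken from this thread; adjust them for your own environment.]

```shell
#!/bin/sh
# Baseline the arch, FPU and float ABI explicitly rather than relying on
# the armhf triplet to imply them (flag values taken from this thread).
TARGET=arm-none-linux-gnueabihf
CONFIGURE_FLAGS="--target=$TARGET --with-arch=armv8-a --with-fpu=neon-fp-armv8 --with-float=hard"
echo "$CONFIGURE_FLAGS"
```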

>
> > Since these patterns touch armv8m.baseline can you find all the
> > testcases in the testsuite and ensure no change in code for
> > armv8m.baseline as that's unpredicated already and this patch brings
> > this in line with the same ? Does the testsuite already cover these
> > arch variants and are you satisfied that the tests in the testsuite
> > can catch / don't make any additional code changes to the other
> > architectures affected by this ?
>
> There are various v8-m(.base/.main) tests and they all pass. The generated
> code is generally unchanged if there was no conditional execution. I made
> the new UNSPEC_LDR/STR patterns support offsets so there is no difference
> in generated code for relaxed loads/stores (since they used to use a plain
> load/store which has an immediate offset).
>
> >> * onfig/arm/sync.md (arm_atomic_load): Add new pattern.
> >
> > Nit: s/onfig/config
>
> Fixed.

Thanks

>
> >> (atomic_load): Always expand atomic loads explicitly.
> >> (atomic_store): Always expand atomic stores explicitly.
> >
> > Nit: Change message to :
> > Switch patterns to define_expand.
>
> Fixed.

Thanks.

>
> > Largely looks ok though I cannot work out tonight if we need more v8-a
> > or v8m-baseline specific tests for scan-assembler patterns.
> >
> > Clearly our testsuite doesn't catch it , so perhaps the OP could help
> > validate this patch with their formal models to see if this fixes
> > these set of issues and creates no new regressions ? Is that feasible
> > to do ?
>
> Disabling conditional execution avoids the issue. It's trivial to verify that
> atomics can no longer be conditionally executed (no "%?"). When this is
> committed, we can run the random testing again to confirm the issue
> is no longer present.

Ok, thanks for promising to do so - I trust you to get it done. Please
try out various combinations of -march=armv7ve, armv7-a and armv8-a with
the tool, as each of them has slightly different rules. For instance, v7ve
allows LDREXD and STREXD to be single-copy atomic for 64-bit loads,
whereas v7-a did not.

>
> > -(define_insn "atomic_load"
> > -  [(set (match_operand:QHSI 0 "register_operand" "=r,r,l")
> > +(define_insn "arm_atomic_load"
> > +  [(set (match_operand:QHSI 0 "register_operand" "=r,l")
> >  (unspec_volatile:QHSI
> > -  [(match_operand:QHSI 1 "arm_sync_memory_operand" "Q,Q,Q")
> > -   (match_operand:SI 2 "const_int_operand" "n,Pf,n")]  ;; model
> > +  [(match_operand:QHSI 1 "memory_operand" "m,m")]
> >
> > Remind me again why is it safe to go from the Q constraint to the m
> > constraint here and everywhere else you've done this ?
>
> That's because the relaxed loads/stores use LDR/STR wrapped in an
> UNSPEC. To avoid regressions we have to use 'm' so that an immediate
> offset can be merged into the memory access.

This is because the instructions used here are ldr and str and the use
of the 'm' constraint is considered safe.


>
> >> -  VUNSPEC_LDA  ; Represent a store-register-acquire.
> >> +  VUNSPEC_LDR  ; Represent a load-register-relaxed.
> >> +  VUNSPEC_LDA  ; Represent a load-register-acquire.
> >
> > Nit: LDA before LDR ? Though I suspect this list can be alphabetically

Re: [PATCH]AArch64 Add movi for 0 moves for scalar types [PR109154]

2023-09-26 Thread Ramana Radhakrishnan
On Wed, Sep 27, 2023 at 1:51 AM Tamar Christina  wrote:
>
> Hi All,
>
> Following the Neoverse N/V and Cortex-A optimization guides SIMD 0 immediates
> should be created with a movi of 0.
>
> At the moment we generate an `fmov .., xzr` which is slower and requires a
> GP -> FP transfer.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/109154
> * config/aarch64/aarch64.md (*mov_aarch64, *movsi_aarch64,
> *movdi_aarch64): Add new w -> Z case.
> * config/aarch64/iterators.md (Vbtype): Add QI and HI.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/109154
> * gcc.target/aarch64/fneg-abs_2.c: Updated.
> * gcc.target/aarch64/fneg-abs_4.c: Updated.
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> b51f979dba12b726bff0c1109b75c6d2c7ae41ab..60c92213c75a2a4c18a6b59ae52fe45d1e872718
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1232,6 +1232,7 @@ (define_insn "*mov_aarch64"
>"(register_operand (operands[0], mode)
>  || aarch64_reg_or_zero (operands[1], mode))"
>{@ [cons: =0, 1; attrs: type, arch]
> + [w, Z; neon_move  , simd  ] movi\t%0., #0
>   [r, r; mov_reg, * ] mov\t%w0, %w1
>   [r, M; mov_imm, * ] mov\t%w0, %1
>   [w, D; neon_move  , simd  ] << 
> aarch64_output_scalar_simd_mov_immediate (operands[1], mode);
> @@ -1289,6 +1290,7 @@ (define_insn_and_split "*movsi_aarch64"
>"(register_operand (operands[0], SImode)
>  || aarch64_reg_or_zero (operands[1], SImode))"
>{@ [cons: =0, 1; attrs: type, arch, length]
> + [w  , Z  ; neon_move, simd, 4] movi\t%0.2d, #0
>   [r k, r  ; mov_reg  , *   , 4] mov\t%w0, %w1
>   [r  , k  ; mov_reg  , *   , 4] ^
>   [r  , M  ; mov_imm  , *   , 4] mov\t%w0, %1
> @@ -1322,6 +1324,7 @@ (define_insn_and_split "*movdi_aarch64"
>"(register_operand (operands[0], DImode)
>  || aarch64_reg_or_zero (operands[1], DImode))"
>{@ [cons: =0, 1; attrs: type, arch, length]
> + [w, Z  ; neon_move, simd, 4] movi\t%0.2d, #0
>   [r, r  ; mov_reg  , *   , 4] mov\t%x0, %x1
>   [k, r  ; mov_reg  , *   , 4] mov\t%0, %x1
>   [r, k  ; mov_reg  , *   , 4] mov\t%x0, %1
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 
> 2451d8c2cd8e2da6ac8339eed9bc975cf203fa4c..d17becc37e230684beaee3c69e2a0f0ce612eda5
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -1297,6 +1297,7 @@ (define_mode_attr Vbtype [(V8QI "8b")  (V16QI "16b")
>   (V4SF "16b") (V2DF  "16b")
>   (DI   "8b")  (DF"8b")
>   (SI   "8b")  (SF"8b")
> + (QI   "8b")  (HI"8b")
>   (V4BF "8b")  (V8BF  "16b")])
>
>  ;; Advanced SIMD vector structure to element modes.
> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c 
> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
> index 
> fb14ec3e2210e0feeff80f2410d777d3046a9f78..5e253d3059cfc9b93bd0865e6eaed1231eba19bd
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
> @@ -20,7 +20,7 @@ float32_t f1 (float32_t a)
>
>  /*
>  ** f2:
> -** fmovd[0-9]+, xzr
> +** moviv[0-9]+.2d, #0
>  ** fnegv[0-9]+.2d, v[0-9]+.2d
>  ** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>  ** ret
> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c 
> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> index 
> 4ea0105f6c0a9756070bcc60d34f142f53d8242c..c86fe3e032c9e5176467841ce1a679ea47bbd531
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> @@ -8,7 +8,7 @@
>
>  /*
>  ** negabs:
> -** fmovd[0-9]+, xzr
> +** moviv31.2d, #0
>  ** fnegv[0-9]+.2d, v[0-9]+.2d
>  ** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>  ** ret
>
>
>
>


LGTM.  I just clocked that the simd attribute is disabled with
-mgeneral-regs-only, which allows for this to work.  Neat.

I cannot approve, however.

Ramana
> --


Re: [PATCH]AArch64 Rewrite simd move immediate patterns to new syntax

2023-09-26 Thread Ramana Radhakrishnan
, *   , 4] stp\txzr, xzr, %0
> + [m  , w ; neon_store1_1reg, *   , 4] str\t%q1, %0
> + [w  , w ; neon_logic  , simd, 4] mov\t%0., %1.
> + [?r , w ; multiple   , *   , 8] #
> + [?w , r ; multiple   , *   , 8] #
> + [?r , r ; multiple   , *   , 8] #
> + [w  , Dn; neon_move   , simd, 4] << 
> aarch64_output_simd_mov_immediate (operands[1], 128);
> + [w  , Dz; fmov   , *   , 4] fmov\t%d0, xzr
> +  }
> +  "&& reload_completed
> +   && !(FP_REGNUM_P (REGNO (operands[0]))
> +   && FP_REGNUM_P (REGNO (operands[1])))"
> +  [(const_int 0)]
> +  {
> +if (GP_REGNUM_P (REGNO (operands[0]))
> +   && GP_REGNUM_P (REGNO (operands[1])))
> +  aarch64_simd_emit_reg_reg_move (operands, DImode, 2);
> +else
> +  aarch64_split_simd_move (operands[0], operands[1]);
> +DONE;
> +  }
>  )
>


Reads correctly at first glance. Perhaps a sanity check with the
aarch64 simd intrinsics suite, vect.exp or tsvc under a suitable
multilib to give some confidence as to no code changes?

Reviewed-by: Ramana Radhakrishnan

regards
Ramana





>  ;; When storing lane zero we can use the normal STR and its more permissive
> @@ -276,33 +279,6 @@ (define_insn "vec_store_pair"
>[(set_attr "type" "neon_stp_q")]
>  )
>
> -
> -(define_split
> -  [(set (match_operand:VQMOV 0 "register_operand" "")
> -   (match_operand:VQMOV 1 "register_operand" ""))]
> -  "TARGET_FLOAT
> -   && reload_completed
> -   && GP_REGNUM_P (REGNO (operands[0]))
> -   && GP_REGNUM_P (REGNO (operands[1]))"
> -  [(const_int 0)]
> -{
> -  aarch64_simd_emit_reg_reg_move (operands, DImode, 2);
> -  DONE;
> -})
> -
> -(define_split
> -  [(set (match_operand:VQMOV 0 "register_operand" "")
> -(match_operand:VQMOV 1 "register_operand" ""))]
> -  "TARGET_FLOAT
> -   && reload_completed
> -   && ((FP_REGNUM_P (REGNO (operands[0])) && GP_REGNUM_P (REGNO 
> (operands[1])))
> -   || (GP_REGNUM_P (REGNO (operands[0])) && FP_REGNUM_P (REGNO 
> (operands[1]"
> -  [(const_int 0)]
> -{
> -  aarch64_split_simd_move (operands[0], operands[1]);
> -  DONE;
> -})
> -
>  (define_expand "@aarch64_split_simd_mov"
>[(set (match_operand:VQMOV 0)
> (match_operand:VQMOV 1))]
>
>
>
>
> --


Re: [PATCH] ARM: Block predication on atomics [PR111235]

2023-09-26 Thread Ramana Radhakrishnan
Reviewed-by: Ramana Radhakrishnan 

A very initial review here. I think it largely looks ok based on the
description but I've spotted a few obvious nits and things that come
to mind on reviewing this. I've not done a very deep review but hope
it helps you move forward. I'm happy to work with you on landing this
if that helps. I'll try and find some time tomorrow to look at this
again.

Hope this helps.


On Thu, Sep 7, 2023 at 3:07 PM Wilco Dijkstra via Gcc-patches
 wrote:
>
> The v7 memory ordering model allows reordering of conditional atomic
> instructions.  To avoid this, make all atomic patterns unconditional.
> Expand atomic loads and stores for all architectures so the memory
> access can be wrapped into an UNSPEC.

>
> Passes regress/bootstrap, OK for commit?

Target ? armhf ? --with-arch , --with-fpu , --with-float parameters ?
Please be specific.


Since these patterns touch armv8m.baseline can you find all the
testcases in the testsuite and ensure no change in code for
armv8m.baseline as that's unpredicated already and this patch brings
this in line with the same ? Does the testsuite already cover these
arch variants and are you satisfied that the tests in the testsuite
can catch / don't make any additional code changes to the other
architectures affected by this ?


>
> gcc/ChangeLog/
> PR target/111235
> * config/arm/constraints.md: Remove Pf constraint.
> * onfig/arm/sync.md (arm_atomic_load): Add new pattern.

Nit: s/onfig/config

> (arm_atomic_load_acquire): Likewise.
> (arm_atomic_store): Likewise.
> (arm_atomic_store_release): Likewise.

Ok.

> (atomic_load): Always expand atomic loads explicitly.
> (atomic_store): Always expand atomic stores explicitly.

Nit: Change message to :

Switch patterns to define_expand.

> (arm_atomic_loaddi2_ldrd): Remove predication.
> (arm_load_exclusive): Likewise.
> (arm_load_acquire_exclusive): Likewise.
> (arm_load_exclusivesi): Likewise.
> (arm_load_acquire_exclusivesi: Likewise.
> (arm_load_exclusivedi): Likewise.
> (arm_load_acquire_exclusivedi): Likewise.
> (arm_store_exclusive): Likewise.
> (arm_store_release_exclusivedi): Likewise.
> (arm_store_release_exclusive): Likewise.
> * gcc/config/arm/unspecs.md: Add VUNSPEC_LDR and VUNSPEC_STR.
>
> gcc/testsuite/ChangeLog/
> PR target/111235
> * gcc.target/arm/pr111235.c: Add new test.
>

Largely looks ok though I cannot work out tonight if we need more v8-a
or v8m-baseline specific tests for scan-assembler patterns.

Clearly our testsuite doesn't catch it , so perhaps the OP could help
validate this patch with their formal models to see if this fixes
these set of issues and creates no new regressions ? Is that feasible
to do ?

> ---
>
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index 
> 05a4ebbdd67601d7b92aa44a619d17634cc69f17..d7c4a1b0cd785f276862048005e6cfa57cdcb20d
>  100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -36,7 +36,7 @@
>  ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
>  ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, 
> Ra,
>  ;;  Rg, Ri
> -;; in all states: Pf, Pg
> +;; in all states: Pg
>
>  ;; The following memory constraints have been used:
>  ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
> @@ -239,13 +239,6 @@ (define_constraint "Pe"
>(and (match_code "const_int")
> (match_test "TARGET_THUMB1 && ival >= 256 && ival <= 510")))
>
> -(define_constraint "Pf"
> -  "Memory models except relaxed, consume or release ones."
> -  (and (match_code "const_int")
> -   (match_test "!is_mm_relaxed (memmodel_from_int (ival))
> -   && !is_mm_consume (memmodel_from_int (ival))
> -   && !is_mm_release (memmodel_from_int (ival))")))
> -
>  (define_constraint "Pg"
>"@internal In Thumb-2 state a constant in range 1 to 32"
>(and (match_code "const_int")
> diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
> index 
> 7626bf3c443285dc63b4c4367b11a879a99c93c6..2210810f67f37ce043b8fdc73b4f21b54c5b1912
>  100644
> --- a/gcc/config/arm/sync.md
> +++ b/gcc/config/arm/sync.md
> @@ -62,68 +62,110 @@ (define_insn "*memory_barrier"
> (set_attr "conds" "unconditional")
> (set_attr "predicable" "no")])
>
> -(define_insn "atomic_load"
> -  [(set (match_operand:QHSI 0 "register_operand" "

Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-09-26 Thread Ramana Radhakrishnan
Hi Wilco,

Thanks for your email.

On Tue, Sep 26, 2023 at 12:07 AM Wilco Dijkstra  wrote:
>
> Hi Ramana,
>
> >> __sync_val_compare_and_swap may be used on 128-bit types and either
> >> calls the outline atomic code or uses an inline loop.  On AArch64 LDXP
> >> is only atomic if the value is stored successfully using STXP, but the
> >> current implementations do not perform the store if the comparison
> >> fails.  In this case the value returned is not read atomically.
> >
> > IIRC, the previous discussions in this space revolved around the
> > difficulty with the store writing to readonly memory which is why I
> > think we went with LDXP in this form.
>
> That's not related to this patch - this fixes a serious atomicity bug
> that may affect the Linux kernel since it uses the older sync primitives.
> Given that LDXP is not atomic on its own, you have to execute the STXP
> even in the failure case.  Note that you can't rely on compare not to
> write memory: load-exclusive loops may either always write or avoid
> writes in the failure case if the load is atomic.  CAS instructions
> always write.
>

I am aware of the capabilities of the architecture.

> > Has something changed from then ?
>
> Yes, we now know that using locking atomics was a bad decision.
> Developers actually require efficient and lock-free atomics.  Since we
> didn't support them, many applications were forced to add their own
> atomic implementations using hacky inline assembler.  It also resulted
> in a nasty ABI incompatibility between GCC and LLVM.  Yes - atomics are
> part of the ABI!

I agree that atomics are part of the ABI.

>
> All that is much worse than worrying about a theoretical corner case that
> can't happen in real applications - atomics only work on writeable memory
> since their purpose is to synchronize reads with writes.


I remember this to be the previous discussions and common understanding.

https://gcc.gnu.org/legacy-ml/gcc/2016-06/msg00017.html

and here

https://gcc.gnu.org/legacy-ml/gcc-patches/2017-02/msg00168.html

Can you point me at any recent discussion that shows this has changed?
I can't find it in my searches. Perhaps you've had the discussion
somewhere else.


regards
Ramana



>
> Cheers,
> Wilco


Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-09-25 Thread Ramana Radhakrishnan
On Wed, Sep 13, 2023 at 3:55 PM Wilco Dijkstra via Gcc-patches
 wrote:
>
>
> __sync_val_compare_and_swap may be used on 128-bit types and either calls the
> outline atomic code or uses an inline loop.  On AArch64 LDXP is only atomic if
> the value is stored successfully using STXP, but the current implementations
> do not perform the store if the comparison fails.  In this case the value
> returned is not read atomically.

IIRC, the previous discussions in this space revolved around the
difficulty with the store writing to readonly memory which is why I
think we went with LDXP in this form.
Has something changed from then ?

Reviewed-by: Ramana Radhakrishnan

regards
Ramana




>
> Passes regress/bootstrap, OK for commit?
>
> gcc/ChangeLog/
> PR target/111404
> * config/aarch64/aarch64.cc (aarch64_split_compare_and_swap):
> For 128-bit store the loaded value and loop if needed.
>
> libgcc/ChangeLog/
> PR target/111404
> * config/aarch64/lse.S (__aarch64_cas16_acq_rel): Execute STLXP using
> either new value or loaded value.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 5e8d0a0c91bc7719de2a8c5627b354cf905a4db0..c44c0b979d0cc3755c61dcf566cfddedccebf1ea
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23413,11 +23413,11 @@ aarch64_split_compare_and_swap (rtx operands[])
>mem = operands[1];
>oldval = operands[2];
>newval = operands[3];
> -  is_weak = (operands[4] != const0_rtx);
>model_rtx = operands[5];
>scratch = operands[7];
>mode = GET_MODE (mem);
>model = memmodel_from_int (INTVAL (model_rtx));
> +  is_weak = operands[4] != const0_rtx && mode != TImode;
>
>/* When OLDVAL is zero and we want the strong version we can emit a tighter
>  loop:
> @@ -23478,6 +23478,33 @@ aarch64_split_compare_and_swap (rtx operands[])
>else
>  aarch64_gen_compare_reg (NE, scratch, const0_rtx);
>
> +  /* 128-bit LDAXP is not atomic unless STLXP succeeds.  So for a mismatch,
> + store the returned value and loop if the STLXP fails.  */
> +  if (mode == TImode)
> +{
> +  rtx_code_label *label3 = gen_label_rtx ();
> +  emit_jump_insn (gen_rtx_SET (pc_rtx, gen_rtx_LABEL_REF (Pmode, 
> label3)));
> +  emit_barrier ();
> +
> +  emit_label (label2);
> +  aarch64_emit_store_exclusive (mode, scratch, mem, rval, model_rtx);
> +
> +  if (aarch64_track_speculation)
> +   {
> + /* Emit an explicit compare instruction, so that we can correctly
> +track the condition codes.  */
> + rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
> + x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
> +   }
> +  else
> +   x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
> +  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> +   gen_rtx_LABEL_REF (Pmode, label1), pc_rtx);
> +  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
> +
> +  label2 = label3;
> +}
> +
>emit_label (label2);
>
>/* If we used a CBNZ in the exchange loop emit an explicit compare with 
> RVAL
> diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
> index 
> dde3a28e07b13669533dfc5e8fac0a9a6ac33dbd..ba05047ff02b6fc5752235bffa924fc4a2f48c04
>  100644
> --- a/libgcc/config/aarch64/lse.S
> +++ b/libgcc/config/aarch64/lse.S
> @@ -160,6 +160,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  #define tmp0   16
>  #define tmp1   17
>  #define tmp2   15
> +#define tmp3   14
> +#define tmp4   13
>
>  #define BTI_C  hint34
>
> @@ -233,10 +235,11 @@ STARTFN   NAME(cas)
>  0: LDXPx0, x1, [x4]
> cmp x0, x(tmp0)
> ccmpx1, x(tmp1), #0, eq
> -   bne 1f
> -   STXPw(tmp2), x2, x3, [x4]
> -   cbnzw(tmp2), 0b
> -1: BARRIER
> +   cselx(tmp2), x2, x0, eq
> +   cselx(tmp3), x3, x1, eq
> +   STXPw(tmp4), x(tmp2), x(tmp3), [x4]
> +   cbnzw(tmp4), 0b
> +   BARRIER
> ret
>
>  #endif
>


Re: [GCC 13 PATCH] aarch64: Remove architecture dependencies from intrinsics

2023-07-19 Thread Ramana Radhakrishnan
On Wed, Jul 19, 2023 at 5:44 PM Andrew Carlotti via Gcc-patches
 wrote:
>
> Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
> OK to backport to GCC 13?
>
>
> Many intrinsics currently depend on both an architecture version and a
> feature, despite the corresponding instructions being available within
> GCC at lower architecture versions.
>
> LLVM has already removed these explicit architecture version
> dependences; this patch does the same for GCC. Note that +fp16 does not
> imply +simd, so we need to add an explicit +simd for the Neon fp16
> intrinsics.
>
> Binutils did not previously support all of these architecture+feature
> combinations, but this problem is already reachable from GCC.  For
> example, compiling the test gcc.target/aarch64/usadv16qi-dotprod.c
> with -O3 -march=armv8-a+dotprod has resulted in an assembler error since
> GCC 10.  This is fixed in Binutils 2.41.

Are there any implementations that actually implement v8-a + dotprod?
As far as I'm aware, v8.2-A was the base architecture where this
combination was allowed. Has this changed recently?


regards
Ramana


Re: [PATCH] aarch64: Add the scheduling model for Neoverse N1

2023-04-22 Thread Ramana Radhakrishnan
On Tue, Apr 18, 2023 at 10:42 PM Evandro Menezes via Gcc-patches
 wrote:
>
> This patch adds the scheduling model for Neoverse N1, based on the 
> information from the "Arm Neoverse N1 Software Optimization Guide”.

There haven't been many schedulers for large cores recently, so it
will be interesting to see how this one fares in practice.

We are probably missing any mention of the performance impact of these
patches - relative changes for a range of benchmarks are usually
specified in such submissions, and I would strongly suggest providing
such numbers. Prior experience over many generations has been that
plugging in numbers from a SWOG doesn't mean that the scheduler will
perform well.

Equally, the patch has a number of very long blocking reservations of
functional units in the scheduling descriptions, especially for FP div
and FP sqrt instructions - these usually just bloat the automaton
without enough gain in actual performance in real-world measurements.
I've noted a couple in a quick review below in the patches.

A minor nit, wearing my AArch32 hat: -mcpu=neoverse-n1 is also an
option for AArch32, so technically the scheduling descriptions need
to be in gcc/config/arm, but it's possibly OK given the usage
statistics.

Reviewed-by: Ramana Radhakrishnan


regards
Ramana

>
> --
> Evandro Menezes
>
> 
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-cores.def: Use the Neoverse N1 scheduling 
> model.
> * config/aarch64/aarch64.md: Include `neoverse-n1.md`.
> * config/aarch64/neoverse-n1.md: New file.
>
> Signed-off-by: Evandro Menezes 
> ---
>  gcc/config/aarch64/aarch64-cores.def |   2 +-
>  gcc/config/aarch64/aarch64.md|   1 +
>  gcc/config/aarch64/neoverse-n1.md| 711 +++
>  3 files changed, 713 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/aarch64/neoverse-n1.md
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index e352e4077b1..cc842c4e22c 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -116,7 +116,7 @@ AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, 
> V8_2A,  (F16, RCPC, DOTPRO
>  AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
> SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
>  AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, 
> DOTPROD, SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
>  AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
> PROFILE), cortexa76, 0x41, 0xd0c, -1)
> -AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
> DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
> +AARCH64_CORE("neoverse-n1",  neoversen1, neoversen1, V8_2A,  (F16, RCPC, 
> DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
>  AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
> DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
>
>  /* Cavium ('C') cores. */
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 022eef80bc1..6cb9e31259b 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -471,6 +471,7 @@
>  (include "../arm/cortex-a57.md")
>  (include "../arm/exynos-m1.md")
>  (include "falkor.md")
> +(include "neoverse-n1.md")
>  (include "saphira.md")
>  (include "thunderx.md")
>  (include "../arm/xgene1.md")
> diff --git a/gcc/config/aarch64/neoverse-n1.md 
> b/gcc/config/aarch64/neoverse-n1.md
> new file mode 100644
> index 000..d66fa10c330
> --- /dev/null
> +++ b/gcc/config/aarch64/neoverse-n1.md
> @@ -0,0 +1,711 @@
> +;; Arm Neoverse N1 pipeline description
> +;; (Based on the "Arm Neoverse N1 Software Optimization Guide")
> +;;
> +;; Copyright (C) 2014-2023 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but
> +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;; General Public License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://

Re: [PATCH] arm: Implement arm Function target attribute 'branch-protection'

2023-04-22 Thread Ramana Radhakrishnan
On Fri, Jan 27, 2023 at 2:44 PM Andrea Corallo via Gcc-patches
 wrote:
>
> gcc/
>
> * config/arm/arm.cc (arm_valid_target_attribute_rec): Add ARM function
> attribute 'branch-protection' and parse its options.
> * doc/extend.texi: Document ARM Function attribute 
> 'branch-protection'.


Nit: s/ARM/arm or s/ARM/AArch32.

>
> gcc/testsuite/
>
> * gcc.target/arm/acle/pacbti-m-predef-13.c: New test.
>
> Co-Authored-By: Tejas Belagod  
> ---
>  gcc/config/arm/arm.cc | 16 
>  gcc/doc/extend.texi   |  7 
>  .../gcc.target/arm/acle/pacbti-m-predef-13.c  | 41 +++
>  3 files changed, 64 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-13.c
>
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index efc48349dd3..add33090f18 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -33568,6 +33568,22 @@ arm_valid_target_attribute_rec (tree args, struct 
> gcc_options *opts)
>
>   opts->x_arm_arch_string = xstrndup (arch, strlen (arch));
> }
> +  else if (startswith (q, "branch-protection="))
> +   {
> + char *bp_str = q + strlen ("branch-protection=");
> +
> + opts->x_arm_branch_protection_string
> +   = xstrndup (bp_str, strlen (bp_str));
> +
> + /* Capture values from target attribute.  */
> + aarch_validate_mbranch_protection
> +   (opts->x_arm_branch_protection_string);
> +
> + /* Init function target attr values.  */
> + opts->x_aarch_ra_sign_scope = aarch_ra_sign_scope;
> + opts->x_aarch_enable_bti = aarch_enable_bti;
> +
> +   }
>else if (q[0] == '+')
> {
>   opts->x_arm_arch_string
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 4a89a3eae7c..23ee43919dd 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4492,6 +4492,13 @@ Enable or disable calls to out-of-line helpers to 
> implement atomic operations.
>  This corresponds to the behavior of the command line options
>  @option{-moutline-atomics} and @option{-mno-outline-atomics}.
>
> +@item branch-protection=
> +@cindex @code{branch-protection=} function attribute, arm
> +Select the function scope on which branch protection will be applied.
> +The behavior and permissible arguments are the same as for the
> +command-line option @option{-mbranch-protection=}.  The default value
> +is @code{none}.
> +
>  @end table
>
>  The above target attributes can be specified as follows:
> diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-13.c 
> b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-13.c
> new file mode 100644
> index 000..b6d2df53072
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-13.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target mbranch_protection_ok } */
> +/* { dg-options "-march=armv8.1-m.main+fp -mbranch-protection=pac-ret+leaf 
> -mfloat-abi=hard --save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#if defined (__ARM_FEATURE_BTI_DEFAULT)
> +#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be undefined."
> +#endif
> +
> +#if !defined (__ARM_FEATURE_PAC_DEFAULT)
> +#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be defined."
> +#endif
> +
> +/*
> +**foo:
> +** bti
> +** ...
> +*/
> +__attribute__((target("branch-protection=pac-ret+bti"), noinline))
> +int foo ()
> +{
> +  return 3;
> +}
> +
> +/*
> +**main:
> +** pac ip, lr, sp
> +** ...
> +** aut ip, lr, sp
> +** bx  lr
> +*/
> +int
> +main()
> +{
> +  return 1 + foo ();
> +}
> +
> +/* { dg-final { scan-assembler "\.eabi_attribute 50, 1" } } */
> +/* { dg-final { scan-assembler "\.eabi_attribute 52, 1" } } */
> +/* { dg-final { scan-assembler-not "\.eabi_attribute 74" } } */
> +/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
> --
> 2.25.1
>


If the attribute covers all the options that are in the command-line
documentation, surely there also need to be tests for
__attribute__((target("branch-protection=standard")))?
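[Editorial note: a hedged sketch of what such a test could look like, mirroring pacbti-m-predef-13.c above. The file name, options, and scan patterns are illustrative and untested, not part of the submitted patch.]

```c
/* Hypothetical gcc.target/arm/acle test - illustrative only.  */
/* { dg-do compile } */
/* { dg-require-effective-target mbranch_protection_ok } */
/* { dg-options "-march=armv8.1-m.main+pacbti -mfloat-abi=hard --save-temps" } */

__attribute__((target("branch-protection=standard"), noinline))
int foo (void)
{
  return 3;
}

/* "standard" should enable both BTI landing pads and pac-ret,
   so expect both in the generated prologue.  */
/* { dg-final { scan-assembler "bti" } } */
/* { dg-final { scan-assembler "pac\tip, lr, sp" } } */
```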


regards
Ramana


Re: [Patch Arm] Fix PR 92999

2022-11-24 Thread Ramana Radhakrishnan
Ping x 2

Ramana

On Thu, 17 Nov 2022, 20:15 Ramana Radhakrishnan, 
wrote:

> On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan
>  wrote:
> >
> > On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
> >  wrote:
> > >
> > > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> > >  wrote:
> > > >
> > > >
> > > >
> > > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > > > >
> > > > >
> > > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > > > >> PR92999 is a case where the VFP calling convention does not
> allocate
> > > > >> enough FP registers for a homogenous aggregate containing FP16
> values.
> > > > >> I believe this is the complete fix but would appreciate another
> set of
> > > > >> eyes on this.
> > > > >>
> > > > >> Could I get a hand with a regression test run on an armhf
> environment
> > > > >> while I fix my environment ?
> > > > >>
> > > > >> gcc/ChangeLog:
> > > > >>
> > > > >> PR target/92999
> > > > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to
> handle
> > > > >> aggregates with elements smaller than SFmode.
> > > > >>
> > > > >> gcc/testsuite/ChangeLog:
> > > > >>
> > > > >> * gcc.target/arm/pr92999.c: New test.
> > > > >>
> > > > >>
> > > > >> Thanks,
> > > > >> Ramana
> > > > >>
> > > > >> Signed-off-by: Ramana Radhakrishnan 
> > > > >
> > > > > I'm not sure about this.  The AAPCS does not mention a base type
> of a
> > > > > half-precision FP type as an appropriate homogeneous aggregate for
> using
> > > > > VFP registers for either calling or returning.
> > >
> > > Ooh interesting, thanks for taking a look and poking at the AAPCS and
> > > that's a good catch. BF16 should also have the same behaviour as FP16
> > > , I suspect ?
> >
> > I suspect I got caught out by the definition of the Homogenous
> > aggregate from Section 5.3.5
> > ((
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates
> )
> > which simply suggests it's an aggregate of fundamental types which
> > lists half precision floating point .
> >
> > FTR, ideally I should have read 7.1.2.1
> >
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling
> )
> > :)
> >
> >
> >
> > >
> > > > >
> > > > > So perhaps the bug is that we try to treat this as a homogeneous
> > > > > aggregate at all.
> > >
> > > Yep I agree - I'll take a look again tomorrow and see if I can get a
> fix.
> > >
> > > (And thanks Alex for the test run, I might trouble you again while I
> > > still (slowly) get some of my boards back up)
> >
> >
> > and as promised take 2. I'd really prefer another review on this one
> > to see if I've not missed anything in the cases below.
>
> Ping  ?
>
> Ramana
>
> >
> > regards
> > Ramana
> >
> >
> > >
> > > regards,
> > > Ramana
> > >
> > >
> > > >
> > > > R.
>


[Patch Arm] Add neon_fcmla and neon_fcadd as neon_type instructions.

2022-11-20 Thread Ramana Radhakrishnan via Gcc-patches
[AArch64 folks CC'd fyi as this is common between both backends.]

Hi,

The design in the backend has been that Advanced SIMD types are
generally added to is_neon_type. It appears that neon_fcmla and
neon_fcadd were never added as neon_type instructions.

I'll apply this to the tree later this week, after building armhf and
doing a bootstrap and test run on aarch64-linux-gnu.

Thanks,
Ramana
commit 7dd15fae0ac1455f5818a1fc0078e35d85e1e250
Author: Ramana Radhakrishnan 
Date:   Wed Nov 16 10:32:04 2022 +

[Patch Arm] Add neon_fcadd and neon_fcmla to is_neon_type.

Appears to have been an oversight.

gcc/
* config/arm/types.md: Update comment.
(is_neon_type): Add neon_fcmla, neon_fcadd.

Signed-off-by: Ramana Radhakrishnan 

diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 7d0504bdd94..d0d9997efd2 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -248,7 +248,8 @@ (define_attr "autodetect_type"
 ; wmmx_wunpckil
 ; wmmx_wxor
 ;
-; The classification below is for NEON instructions.
+; The classification below is for NEON instructions. If a new neon type is
+; added, please ensure this is added to the is_neon_type attribute below too.
 ;
 ; neon_add
 ; neon_add_q
@@ -1281,6 +1282,7 @@ (define_attr "is_neon_type" "yes,no"
   neon_fp_mla_d_q, neon_fp_mla_d_scalar_q, neon_fp_sqrt_s,\
   neon_fp_sqrt_s_q, neon_fp_sqrt_d, neon_fp_sqrt_d_q,\
   neon_fp_div_s, neon_fp_div_s_q, neon_fp_div_d, neon_fp_div_d_q, 
crypto_aese,\
+  neon_fcadd, neon_fcmla, \
   crypto_aesmc, crypto_sha1_xor, crypto_sha1_fast, crypto_sha1_slow,\
   crypto_sha256_fast, crypto_sha256_slow")
 (const_string "yes")


Re: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-20 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Nov 18, 2022 at 4:59 PM Kyrylo Tkachov via Gcc-patches
 wrote:
>
>
>
> > -Original Message-
> > From: Andrea Corallo 
> > Sent: Thursday, November 17, 2022 4:38 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; Richard Earnshaw
> > ; Stam Markianos-Wright  > wri...@arm.com>
> > Subject: [PATCH 15/35] arm: Explicitly specify other float types for 
> > _Generic
> > overloading [PR107515]
> >
> > From: Stam Markianos-Wright 
> >
> > This patch adds explicit references to other float types
> > to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
> >
> > gcc/ChangeLog:
> > PR 107515
> > * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.
>
> Argh, I'm looking forward to when we move away from this _Generic business, 
> but for now ok.
> The ChangeLog should say "PR target/107515" for the git hook to recognize it 
> IIRC.

and the PR is against 11.x - is there a plan to backport this and the
dependent patches to the relevant branches?

Ramana

> Thanks,
> Kyrill
>
> > ---
> >  gcc/config/arm/arm_mve.h | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> > index fd1876b57a0..f6b42dc3fab 100644
> > --- a/gcc/config/arm/arm_mve.h
> > +++ b/gcc/config/arm/arm_mve.h
> > @@ -35582,6 +35582,9 @@ enum {
> >   short: __ARM_mve_type_int_n, \
> >   int: __ARM_mve_type_int_n, \
> >   long: __ARM_mve_type_int_n, \
> > + _Float16: __ARM_mve_type_fp_n, \
> > + __fp16: __ARM_mve_type_fp_n, \
> > + float: __ARM_mve_type_fp_n, \
> >   double: __ARM_mve_type_fp_n, \
> >   long long: __ARM_mve_type_int_n, \
> >   unsigned char: __ARM_mve_type_int_n, \
> > --
> > 2.25.1
>


Re: [PATCH][GCC] arm: Add support for new frame unwinding instruction "0xb5".

2022-11-20 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Nov 18, 2022 at 9:33 AM Srinath Parvathaneni
 wrote:
>
> Hi,
>
> > -Original Message-
> > From: Ramana Radhakrishnan 
> > Sent: Thursday, November 17, 2022 8:27 PM
> > To: Srinath Parvathaneni 
> > Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> > ; Kyrylo Tkachov 
> > Subject: Re: [PATCH][GCC] arm: Add support for new frame unwinding
> > instruction "0xb5".
> >
> > On Thu, Nov 10, 2022 at 10:38 AM Srinath Parvathaneni via Gcc-patches  > patc...@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > >
> > > This patch adds support for Arm frame unwinding instruction "0xb5"
> > > [1]. When an exception is taken and "0xb5" instruction is encounter
> > > during runtime stack-unwinding, we use effective vsp as modifier in 
> > > pointer
> > authentication.
> > > On completion of stack unwinding if "0xb5" instruction is not
> > > encountered then CFA will be used as modifier in pointer authentication.
> > >
> > > [1]
> > > https://github.com/ARM-software/abi-
> > aa/releases/download/2022Q3/ehabi3
> > > 2.pdf
> > >
> > > Regression tested on arm-none-eabi target and found no regressions.
> > >
> > > Ok for master?
> > >
> >
> > No, not yet.
> >
> > Presumably the logic to produce 0xb5 is in the source base and this was
> > tested with suitable options that produce said opcode ? I see no logic in 
> > place
> > to produce the said opcode in the backend in a quick read as the pacbti
> > patches still seem to be in review. ?
> >
> > So what was the test suite run actually testing ?
>
> Sorry for the late response, the patch supporting the said opcode (directive 
> ".pacspval)" is here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605524.html (still 
> under upstream review)
>
> and the patch to encode ".pacspval" with the mentioned opcode "0xb5" in 
> binutils is here:
> https://sourceware.org/pipermail/binutils/2022-November/124328.html (approved 
> and committed to binutils).

Thanks for the answer, but perhaps I should make my question more
explicit: are you saying that this patch was tested in combination
with those and other dependent patches, on a suitable simulator with
suitable multilibs and C++, to exercise frame unwinding?

For the future, it would certainly be worth being explicit about this
in your patch submission :)

regards
Ramana

>
> Regards,
> Srinath.
>
> > regards
> > Ramana
> >
> >
> > > Regards,
> > > Srinath.
> > >
> > > gcc/ChangeLog:
> > >
> > > 2022-11-09  Srinath Parvathaneni  
> > >
> > > * libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
> > opcode
> > > "0xb5".
> > >
> > >
> > > ### Attachment also inlined for ease of reply
> > ###
> > >
> > >
> > > diff --git a/libgcc/config/arm/pr-support.c
> > > b/libgcc/config/arm/pr-support.c index
> > >
> > e48854587c667a959aa66ccc4982231f6ecc..73e4942a39b34a83c2da85de
> > f6b1
> > > 3e82ec501552 100644
> > > --- a/libgcc/config/arm/pr-support.c
> > > +++ b/libgcc/config/arm/pr-support.c
> > > @@ -107,7 +107,9 @@ __gnu_unwind_execute (_Unwind_Context *
> > context, __gnu_unwind_state * uws)
> > >_uw op;
> > >int set_pc;
> > >int set_pac = 0;
> > > +  int set_pac_sp = 0;
> > >_uw reg;
> > > +  _uw sp;
> > >
> > >set_pc = 0;
> > >for (;;)
> > > @@ -124,10 +126,11 @@ __gnu_unwind_execute (_Unwind_Context *
> > context,
> > > __gnu_unwind_state * uws)  #if defined(TARGET_HAVE_PACBTI)
> > >   if (set_pac)
> > > {
> > > - _uw sp;
> > >   _uw lr;
> > >   _uw pac;
> > > - _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP,
> > _UVRSD_UINT32, );
> > > + if (!set_pac_sp)
> > > +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP,
> > _UVRSD_UINT32,
> > > +);
> > >   _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32,
> > );
> > >   _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
> > >_UVRSD_UINT32, ); @@ -259,7 +262,19
> > > @@ __gnu_unwind_execute (_Unwind_Context * context,
> > __gnu_unwind_state * uws)
> > >   continue;
> > > }
> > >
> > > - if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
> > > + /* Use current VSP as modifier in PAC validation.  */
> > > + if (op == 0xb5)
> > > +   {
> > > + if (set_pac)
> > > +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP,
> > _UVRSD_UINT32,
> > > +);
> > > + else
> > > +   return _URC_FAILURE;
> > > + set_pac_sp = 1;
> > > + continue;
> > > +   }
> > > +
> > > + if ((op & 0xfd) == 0xb6)  /* Obsolete FPA.  */
> > > return _URC_FAILURE;
> > >
> > >   /* op & 0xf8 == 0xb8.  */
> > >
> > >
> > >


Re: [PATCH] ARM: Make ARMv8-M attribute cmse_nonsecure_call work in Ada

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Oct 24, 2022 at 9:55 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> Unlike most other machine attributes, this one does not work in Ada because,
> while it applies to pointer-to-function types, it is explicitly marked as
> requiring declarations in the implementation.
>
> Now, in Ada, machine attributes are specified like this:
>
>   type Non_Secure is access procedure;
>   pragma Machine_Attribute (Non_Secure, "cmse_nonsecure_call");
>
> i.e. not attached to the declaration of Non_Secure (testcase attached).
>
> So the attached patch extends the support to Ada by also accepting
> pointer-to-function types in the handler.
>
> Tested on arm-eabi, OK for the mainline?
>


OK if there are no regressions; perhaps the test needs to be in the Ada test suite?

regards

Ramana


>
> 2022-10-24  Eric Botcazou  
>
> * config/arm/arm.cc (arm_attribute_table) : 
> Change
> decl_required field to false.
> (arm_handle_cmse_nonsecure_call): Deal with a TYPE node.
>
>
> --
> Eric Botcazou


Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 17, 2022 at 5:30 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Wilco Dijkstra  writes:
> > Hi Richard,
> >
> >> Can you go into more detail about:
> >>
> >>Use :option:`-mdirect-extern-access` either in shared libraries or in
> >>executables, but not in both.  Protected symbols used both in a shared
> >>library and executable may cause linker errors or fail to work correctly
> >>
> >> If this is LLVM's default for PIC (and by assumption shared libraries),
> >> is it then invalid to use -mdirect-extern-access for any PIEs that
> >> are linked against those shared libraries and use protected symbols
> >> from those libraries?  How would a user know that one of the shared
> >> libraries they're linking against was built in this way?
> >
> > Yes, the usage model is that you'd either use it for static PIE or only on
> > data that is not shared. If you get it wrong them you'll get the copy
> > relocation error.
>
> Thanks.  I think I'm still missing something though.  If, for the
> non-executable case, people should only use the feature on data that
> is not shared, why do we need to relax the binds-local condition for
> protected symbols on -fPIC?  Oughtn't the symbol to be hidden rather
> than protected if the data isn't shared?
>
> I can understand the reasoning for the PIE changes but I'm still
> struggling with the PIC-but-not-PIE bits.

I think I'm with Richard S on hidden vs protected on first reading. I
can see why this works out of the box and can even be the default for
static PIE.

Any reason why this is not on by default? It's early enough in the
stage 3 cycle, and we can always flip the default if more problems
are found.

You probably need a rebase for the documentation bits.

regards
Ramana


Re: [PATCH][GCC] arm: Add support for new frame unwinding instruction "0xb5".

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 10:38 AM Srinath Parvathaneni via Gcc-patches
 wrote:
>
> Hi,
>
> This patch adds support for Arm frame unwinding instruction "0xb5" [1]. When
> an exception is taken and "0xb5" instruction is encounter during runtime
> stack-unwinding, we use effective vsp as modifier in pointer authentication.
> On completion of stack unwinding if "0xb5" instruction is not encountered
> then CFA will be used as modifier in pointer authentication.
>
> [1] 
> https://github.com/ARM-software/abi-aa/releases/download/2022Q3/ehabi32.pdf
>
> Regression tested on arm-none-eabi target and found no regressions.
>
> Ok for master?
>

No, not yet.

Presumably the logic to produce 0xb5 is in the source base, and this
was tested with suitable options that produce said opcode? In a quick
read I see no logic in place to produce the opcode in the backend, as
the pacbti patches still seem to be in review.

So what was the test suite run actually testing?

regards
Ramana


> Regards,
> Srinath.
>
> gcc/ChangeLog:
>
> 2022-11-09  Srinath Parvathaneni  
>
> * libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode opcode
> "0xb5".
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
> index 
> e48854587c667a959aa66ccc4982231f6ecc..73e4942a39b34a83c2da85def6b13e82ec501552
>  100644
> --- a/libgcc/config/arm/pr-support.c
> +++ b/libgcc/config/arm/pr-support.c
> @@ -107,7 +107,9 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>_uw op;
>int set_pc;
>int set_pac = 0;
> +  int set_pac_sp = 0;
>_uw reg;
> +  _uw sp;
>
>set_pc = 0;
>for (;;)
> @@ -124,10 +126,11 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>  #if defined(TARGET_HAVE_PACBTI)
>   if (set_pac)
> {
> - _uw sp;
>   _uw lr;
>   _uw pac;
> - _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, 
> );
> + if (!set_pac_sp)
> +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32,
> +);
>   _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32, 
> );
>   _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
>_UVRSD_UINT32, );
> @@ -259,7 +262,19 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>   continue;
> }
>
> - if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
> + /* Use current VSP as modifier in PAC validation.  */
> + if (op == 0xb5)
> +   {
> + if (set_pac)
> +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32,
> +);
> + else
> +   return _URC_FAILURE;
> + set_pac_sp = 1;
> + continue;
> +   }
> +
> + if ((op & 0xfd) == 0xb6)  /* Obsolete FPA.  */
> return _URC_FAILURE;
>
>   /* op & 0xf8 == 0xb8.  */
>
>
>


Re: [Patch Arm] Fix PR 92999

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan
 wrote:
>
> On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
>  wrote:
> >
> > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> >  wrote:
> > >
> > >
> > >
> > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > > >
> > > >
> > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > > >> PR92999 is a case where the VFP calling convention does not allocate
> > > >> enough FP registers for a homogenous aggregate containing FP16 values.
> > > >> I believe this is the complete fix but would appreciate another set of
> > > >> eyes on this.
> > > >>
> > > >> Could I get a hand with a regression test run on an armhf environment
> > > >> while I fix my environment ?
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >> PR target/92999
> > > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> > > >> aggregates with elements smaller than SFmode.
> > > >>
> > > >> gcc/testsuite/ChangeLog:
> > > >>
> > > >> * gcc.target/arm/pr92999.c: New test.
> > > >>
> > > >>
> > > >> Thanks,
> > > >> Ramana
> > > >>
> > > >> Signed-off-by: Ramana Radhakrishnan 
> > > >
> > > > I'm not sure about this.  The AAPCS does not mention a base type of a
> > > > half-precision FP type as an appropriate homogeneous aggregate for using
> > > > VFP registers for either calling or returning.
> >
> > Ooh interesting, thanks for taking a look and poking at the AAPCS and
> > that's a good catch. BF16 should also have the same behaviour as FP16
> > , I suspect ?
>
> I suspect I got caught out by the definition of the Homogenous
> aggregate from Section 5.3.5
> ((https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates)
> which simply suggests it's an aggregate of fundamental types which
> lists half precision floating point .
>
> FTR, ideally I should have read 7.1.2.1
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
> :)
>
>
>
> >
> > > >
> > > > So perhaps the bug is that we try to treat this as a homogeneous
> > > > aggregate at all.
> >
> > Yep I agree - I'll take a look again tomorrow and see if I can get a fix.
> >
> > (And thanks Alex for the test run, I might trouble you again while I
> > still (slowly) get some of my boards back up)
>
>
> and as promised take 2. I'd really prefer another review on this one
> to see if I've not missed anything in the cases below.

Ping  ?

Ramana

>
> regards
> Ramana
>
>
> >
> > regards,
> > Ramana
> >
> >
> > >
> > > R.


Re: [PATCH 3/9][GCC][AArch64] Add autovectorization support for Complex instructions

2022-11-16 Thread Ramana Radhakrishnan via Gcc-patches


>
> OK with the comment typos fixed.
>
> > > > gcc/ChangeLog:
> > > >
> > > > 2018-11-11  Tamar Christina  
> > > >
> > > > * config/aarch64/aarch64-simd.md (aarch64_fcadd,
> > > > fcadd3, aarch64_fcmla,
> > > > fcmla4): New.
> > > > * config/aarch64/aarch64.h (TARGET_COMPLEX): New.
> > > > * config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
> > > > UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, 
> > > > UNSPEC_FCMLA270): New.
> > > > (FCADD, FCMLA): New.
> > > > (rot, rotsplit1, rotsplit2): New.
> > > > * config/arm/types.md (neon_fcadd, neon_fcmla): New.
>
> Should we push these to an existing class for now, and split them later when
> someone provides a scheduling model which makes use of them?

Sorry about a blast from the past :P. Perhaps these should just have
been added to is_neon_type, which allows for the belts-and-braces
handling in cortex-a57.md - the basis for many a scheduler
description in the backend :)


Re: [Patch Arm] Fix PR 92999

2022-11-11 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
 wrote:
>
> On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
>  wrote:
> >
> >
> >
> > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > >
> > >
> > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > >> PR92999 is a case where the VFP calling convention does not allocate
> > >> enough FP registers for a homogenous aggregate containing FP16 values.
> > >> I believe this is the complete fix but would appreciate another set of
> > >> eyes on this.
> > >>
> > >> Could I get a hand with a regression test run on an armhf environment
> > >> while I fix my environment ?
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >> PR target/92999
> > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> > >> aggregates with elements smaller than SFmode.
> > >>
> > >> gcc/testsuite/ChangeLog:
> > >>
> > >> * gcc.target/arm/pr92999.c: New test.
> > >>
> > >>
> > >> Thanks,
> > >> Ramana
> > >>
> > >> Signed-off-by: Ramana Radhakrishnan 
> > >
> > > I'm not sure about this.  The AAPCS does not mention a base type of a
> > > half-precision FP type as an appropriate homogeneous aggregate for using
> > > VFP registers for either calling or returning.
>
> Ooh interesting, thanks for taking a look and poking at the AAPCS and
> that's a good catch. BF16 should also have the same behaviour as FP16
> , I suspect ?

I suspect I got caught out by the definition of a homogeneous
aggregate in Section 5.3.5
(https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates),
which simply describes it as an aggregate of fundamental types, a list
that includes half-precision floating point.

FTR, ideally I should have read 7.1.2.1
(https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
:)



>
> > >
> > > So perhaps the bug is that we try to treat this as a homogeneous
> > > aggregate at all.
>
> Yep I agree - I'll take a look again tomorrow and see if I can get a fix.
>
> (And thanks Alex for the test run, I might trouble you again while I
> still (slowly) get some of my boards back up)


And, as promised, take 2. I'd really prefer another review on this one
to check that I've not missed anything in the cases below.

regards
Ramana


>
> regards,
> Ramana
>
>
> >
> > R.
commit c2ed018d10328c5cf93aa56b00ba4caf5dace539
Author: Ramana Radhakrishnan 
Date:   Fri Nov 11 21:39:22 2022 +

[Patch Arm] Fix PR92999

PR target/92999 is a case where the VFP PCS implementation
incorrectly considers homogeneous floating-point aggregates with FP16
and BF16 values.

Can someone help me with a bootstrap and regression test on an armhf
environment ?

Signed-off-by: Ramana Radhakrishnan 
Tested-by: Alex Coplan  
Reviewed-by: Richard Earnshaw  

PR target/92999

gcc/ChangeLog:

* config/arm/arm.cc (aapcs_vfp_is_invalid_scalar_in_ha): New.
(aapcs_vfp_sub_candidate): Adjust.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr92999.c: New test.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 2eb4d51e4a3..cd3e1ffe777 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -6281,6 +6281,31 @@ const unsigned int WARN_PSABI_EMPTY_CXX17_BASE = 1U << 0;
 const unsigned int WARN_PSABI_NO_UNIQUE_ADDRESS = 1U << 1;
 const unsigned int WARN_PSABI_ZERO_WIDTH_BITFIELD = 1U << 2;
 
+
+/* The AAPCS VFP ABI allows homogeneous aggregates with scalar
+   FP32 and FP64 members.
+   Return
+ true if this is a scalar that is not a proper candidate,
+ false if this is a scalar that is an acceptable scalar data
+ type in a homogeneous aggregate, or if this is not a scalar, allowing
+ the tree walk in aapcs_vfp_sub_candidate to continue.
+ */
+static bool
+aapcs_vfp_is_invalid_scalar_in_ha (const_tree inner_type)
+{
+
+  machine_mode mode = TYPE_MODE (inner_type);
+  if (TREE_CODE (inner_type) == REAL_TYPE)
+{
+  if (mode == DFmode || mode == SFmode)
+   return false;
+  else
+   return true;
+}
+  else
+return false;
+}
+
 /* Walk down the type tree of TYPE counting consecutive base elements.
If *MODEP is VOIDmode, then set it to the first valid floating point
type.  If a non-floating point type is found, or if a floating point
@@ -6372,6 +6397,10 @@ aapcs_vfp_sub_candidate (const_tree ty

Re: [PATCH][GCC] arm: Add support for Cortex-X1C CPU.

2022-11-10 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 10:24 AM Srinath Parvathaneni via Gcc-patches
 wrote:
>
> Hi,
>
> This patch adds the -mcpu support for the Arm Cortex-X1C CPU.
>
> Regression tested on arm-none-eabi and bootstrapped on 
> arm-none-linux-gnueabihf.
>
> Ok for GCC master?


Ok
Ramana
>
> Regards,
> Srinath.
>
> gcc/ChangeLog:
>
> 2022-11-09  Srinath Parvathaneni  
>
>* config/arm/arm-cpus.in (cortex-x1c): Define new CPU.
>* config/arm/arm-tables.opt: Regenerate.
>* config/arm/arm-tune.md: Likewise.
>* 
> doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst:
>Document Cortex-X1C CPU.
>
>gcc/testsuite/ChangeLog:
>
> 2022-11-09  Srinath Parvathaneni  
>
>* gcc.target/arm/multilib.exp: Add tests for Cortex-X1C.
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 
> 5a63bc548e54dbfdce5d1df425bd615d81895d80..5ed4db340bc5d7c9a41e6d1a3f660bf2a97b058b
>  100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -1542,6 +1542,17 @@ begin cpu cortex-x1
>   part d44
>  end cpu cortex-x1
>
> +begin cpu cortex-x1c
> + cname cortexx1c
> + tune for cortex-a57
> + tune flags LDSCHED
> + architecture armv8.2-a+fp16+dotprod
> + option crypto add FP_ARMv8 CRYPTO
> + costs cortex_a57
> + vendor 41
> + part d4c
> +end cpu cortex-x1c
> +
>  begin cpu neoverse-n1
>   cname neoversen1
>   alias !ares
> diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
> index 
> e6461abcc57cd485025f3e18535267c454662cbe..a10a09e36cd004165b6f1efddeb3bfc29d8337ac
>  100644
> --- a/gcc/config/arm/arm-tables.opt
> +++ b/gcc/config/arm/arm-tables.opt
> @@ -255,6 +255,9 @@ Enum(processor_type) String(cortex-a710) Value( 
> TARGET_CPU_cortexa710)
>  EnumValue
>  Enum(processor_type) String(cortex-x1) Value( TARGET_CPU_cortexx1)
>
> +EnumValue
> +Enum(processor_type) String(cortex-x1c) Value( TARGET_CPU_cortexx1c)
> +
>  EnumValue
>  Enum(processor_type) String(neoverse-n1) Value( TARGET_CPU_neoversen1)
>
> diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
> index 
> abc290edd094179379f3856a3f8f64781e0c33f2..8af8c936abe31fb60e3de2fd713f4c6946c2a752
>  100644
> --- a/gcc/config/arm/arm-tune.md
> +++ b/gcc/config/arm/arm-tune.md
> @@ -46,7 +46,7 @@
> cortexa73cortexa53,cortexa55,cortexa75,
> cortexa76,cortexa76ae,cortexa77,
> cortexa78,cortexa78ae,cortexa78c,
> -   cortexa710,cortexx1,neoversen1,
> +   cortexa710,cortexx1,cortexx1c,neoversen1,
> cortexa75cortexa55,cortexa76cortexa55,neoversev1,
> neoversen2,cortexm23,cortexm33,
> cortexm35p,cortexm55,starmc1,
> diff --git 
> a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst 
> b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst
> index 
> 3315114969381995d47162b53abeb9bfc442fd28..d531eced20cbb583ecaba2ab3927937faf69b9de
>  100644
> --- 
> a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst
> +++ 
> b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/arm-options.rst
> @@ -594,7 +594,7 @@ These :samp:`-m` options are defined for the ARM port:
>:samp:`cortex-r7`, :samp:`cortex-r8`, :samp:`cortex-r52`, 
> :samp:`cortex-r52plus`,
>:samp:`cortex-m0`, :samp:`cortex-m0plus`, :samp:`cortex-m1`, 
> :samp:`cortex-m3`,
>:samp:`cortex-m4`, :samp:`cortex-m7`, :samp:`cortex-m23`, 
> :samp:`cortex-m33`,
> -  :samp:`cortex-m35p`, :samp:`cortex-m55`, :samp:`cortex-x1`,
> +  :samp:`cortex-m35p`, :samp:`cortex-m55`, :samp:`cortex-x1`, 
> :samp:`cortex-x1c`,
>:samp:`cortex-m1.small-multiply`, :samp:`cortex-m0.small-multiply`,
>:samp:`cortex-m0plus.small-multiply`, :samp:`exynos-m1`, 
> :samp:`marvell-pj4`,
>:samp:`neoverse-n1`, :samp:`neoverse-n2`, :samp:`neoverse-v1`, 
> :samp:`xscale`,
> diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
> b/gcc/testsuite/gcc.target/arm/multilib.exp
> index 
> 2fa648c61dafebb663969198bf7849400a7547f6..f903f028a83f884bdc1521f810f7e70e4130a715
>  100644
> --- a/gcc/testsuite/gcc.target/arm/multilib.exp
> +++ b/gcc/testsuite/gcc.target/arm/multilib.exp
> @@ -450,6 +450,9 @@ if {[multilib_config "aprofile"] } {
> {-march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -mthumb} 
> "thumb/v8-a+simd/hard"
> {-march=armv7-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp 
> -mthumb} "thumb/v7-a+simd/softfp"
> {-march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=softfp 
> -mthumb} "thumb/v8-a+simd/softfp"
> +   {-mcpu=cortex-x1c -mfpu=auto -mfloat-abi=softfp -mthumb} 
> "thumb/v8-a+simd/softfp"
> +   {-mcpu=cortex-x1c -mfpu=auto -mfloat-abi=hard -mthumb} 
> "thumb/v8-a+simd/hard"
> +   {-mcpu=cortex-x1c -mfpu=auto -mfloat-abi=soft -mthumb} 
> "thumb/v8-a/nofp"
>  } {
> check_multi_dir $opts $dir
>  }
>
>
>


Re: [Patch Arm] Fix PR 92999

2022-11-10 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
 wrote:
>
>
>
> On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> >
> >
> > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> >> PR92999 is a case where the VFP calling convention does not allocate
> >> enough FP registers for a homogenous aggregate containing FP16 values.
> >> I believe this is the complete fix but would appreciate another set of
> >> eyes on this.
> >>
> >> Could I get a hand with a regression test run on an armhf environment
> >> while I fix my environment ?
> >>
> >> gcc/ChangeLog:
> >>
> >> PR target/92999
> >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> >> aggregates with elements smaller than SFmode.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/arm/pr92999.c: New test.
> >>
> >>
> >> Thanks,
> >> Ramana
> >>
> >> Signed-off-by: Ramana Radhakrishnan 
> >
> > I'm not sure about this.  The AAPCS does not mention a base type of a
> > half-precision FP type as an appropriate homogeneous aggregate for using
> > VFP registers for either calling or returning.

Ooh interesting, thanks for taking a look and poking at the AAPCS;
that's a good catch. BF16 should also have the same behaviour as
FP16, I suspect?

> >
> > So perhaps the bug is that we try to treat this as a homogeneous
> > aggregate at all.

Yep I agree - I'll take a look again tomorrow and see if I can get a fix.

(And thanks Alex for the test run, I might trouble you again while I
still (slowly) get some of my boards back up)

regards,
Ramana


>
> R.


[Patch Arm] Fix PR 92999

2022-11-08 Thread Ramana Radhakrishnan via Gcc-patches
PR92999 is a case where the VFP calling convention does not allocate
enough FP registers for a homogenous aggregate containing FP16 values.
I believe this is the complete fix but would appreciate another set of
eyes on this.

Could I get a hand with a regression test run on an armhf environment
while I fix my environment ?

gcc/ChangeLog:

PR target/92999
*  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
aggregates with elements smaller than SFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr92999.c: New test.


Thanks,
Ramana

Signed-off-by: Ramana Radhakrishnan 
---
 gcc/config/arm/arm.cc  |  6 -
 gcc/testsuite/gcc.target/arm/pr92999.c | 31 ++
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr92999.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 2eb4d51e4a3..03f4057f717 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -6740,7 +6740,11 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant 
ATTRIBUTE_UNUSED,
  count *= 2;
}
}
-  shift = GET_MODE_SIZE(ag_mode) / GET_MODE_SIZE(SFmode);
+
+  /* Aggregates can contain FP16 or BF16 values which would need to
+     be passed in via FP registers.  */
+  shift = (MAX(GET_MODE_SIZE(ag_mode), GET_MODE_SIZE(SFmode))
+  / GET_MODE_SIZE(SFmode));
   par = gen_rtx_PARALLEL (mode, rtvec_alloc (count));
   for (i = 0; i < count; i++)
{
diff --git a/gcc/testsuite/gcc.target/arm/pr92999.c 
b/gcc/testsuite/gcc.target/arm/pr92999.c
new file mode 100644
index 000..faa21fdb7d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr92999.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-mfp16-format=ieee" } */
+
+//
+// Compile with gcc -mfp16-format=ieee
+// Any optimization level is fine.
+//
+// Correct output should be
+// "y.first = 1, y.second = -99"
+//
+// Buggy output is
+// "y.first = -99, y.second = -99"
+//
+#include <stdlib.h>
+struct phalf {
+__fp16 first;
+__fp16 second;
+};
+
+struct phalf phalf_copy(struct phalf* src) __attribute__((noinline));
+struct phalf phalf_copy(struct phalf* src) {
+return *src;
+}
+
+int main() {
+struct phalf x = { 1.0, -99.0};
+struct phalf y = phalf_copy(&x);
+if (y.first != 1.0 || y.second != -99.0)
+   abort();
+return 0;
+}
-- 
2.34.1


Update Affiliation.

2022-11-04 Thread Ramana Radhakrishnan via Gcc-patches
Update affiliation as I'm moving on from Arm. More from me in a month or so.

Pushed to trunk

regards
Ramana
commit 4e5f373406ed5da42c4a7c4b7f650d92195f2984
Author: Ramana Radhakrishnan 
Date:   Fri Nov 4 09:30:00 2022 +

Update affiliation

diff --git a/htdocs/steering.html b/htdocs/steering.html
index 3a38346d..95d6a4a8 100644
--- a/htdocs/steering.html
+++ b/htdocs/steering.html
@@ -38,7 +38,7 @@ place to reach them is the gcc mailing 
list.
 Toon Moene (Koninklijk Nederlands Meteorologisch Instituut)
 Joseph Myers (CodeSourcery / Mentor Graphics) [co-Release Manager]
 Gerald Pfeifer (SUSE)
-Ramana Radhakrishnan (ARM)
+Ramana Radhakrishnan 
 Joel Sherrill (OAR Corporation)
 Ian Lance Taylor (Google)
 Jim Wilson (SiFive)


Update email address

2022-10-31 Thread Ramana Radhakrishnan via Gcc-patches
As $subject. Pushed to trunk.

Regards,
Ramana


diff --git a/MAINTAINERS b/MAINTAINERS
index e4e7349a6d9..55c5ef95806 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -60,7 +60,7 @@ arc port  Joern Rennecke  

 arc port   Claudiu Zissulescu  
 arm port   Nick Clifton
 arm port   Richard Earnshaw
-arm port   Ramana Radhakrishnan
+arm port   Ramana Radhakrishnan
 arm port   Kyrylo Tkachov  
 avr port   Denis Chertykov 
 bfin port  Jie Zhang   


Re: [PATCH V2] arm: add -static-pie support

2022-08-10 Thread Ramana Radhakrishnan via Gcc-patches
Hi Lance,

Thanks for your contribution - looks like your first one to GCC?

The patch looks good to me, though it should probably go through a
full test suite run on arm-linux-gnueabihf and get a ChangeLog - see
here for more https://gcc.gnu.org/contribute.html#patches.

This is probably small enough to go under the 10 line rule but since
you've used Signed-off-by in your patch, is that indicating you are
contributing under DCO rules -
https://gcc.gnu.org/contribute.html#legal ?

regards
Ramana


On Thu, Aug 4, 2022 at 5:48 PM Lance Fredrickson via Gcc-patches
 wrote:
>
> Just a follow up trying to get some eyes on my static-pie patch
> submission for arm.
> Feedback welcome.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598610.html
>
> thanks,
> Lance Fredrickson


Re: [PATCH v4 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-11-17 Thread Ramana Radhakrishnan via Gcc-patches
Thanks Ard and Qing.

I have been busy with other things in the last few weeks and I don’t work on 
GCC as part of my day job : however I’ll try to find some time to review this 
patch set in the coming days.

Sorry about the delay.

Regards,
Ramana

From: Ard Biesheuvel 
Date: Tuesday, 9 November 2021 at 22:03
To: Qing Zhao 
Cc: Ramana Radhakrishnan , 
linux-harden...@vger.kernel.org , kees Cook 
, Keith Packard , 
thomas.preudho...@celest.fr , 
adhemerval.zane...@linaro.org , Richard 
Sandiford , gcc-patches@gcc.gnu.org 

Subject: Re: [PATCH v4 1/1] [ARM] Add support for TLS register based stack 
protector canary access
On Tue, 9 Nov 2021 at 21:45, Qing Zhao  wrote:
>
> Hi, Ard,
>
> Sorry for the late reply (since I don’t have the right to approve a patch, I 
> have been waiting for an arm port maintainer to review this patch).
> The following is the arm port maintainer information I got from MAINTAINERS 
> file (you might want to explicitly cc’ing one of them for a review)
>
> arm portNick Clifton
> arm portRichard Earnshaw    
> arm port    Ramana Radhakrishnan
> arm portKyrylo Tkachov  
>
> I see that Ramana implemented the similar patch for aarch64 (commit 
> cd0b2d361df82c848dc7e1c3078651bb0624c3c6), So, I am CCing him with this 
> email. Hopefully he will review this patch.
>

Thank you Qing. But I know Ramana well, and I know he no longer works
on GCC. I collaborated with him on the AArch64 implementation at the
time (but he wrote all the code).

> Anyway, I briefly read your patch (version 4), and have the following 
> questions and comments:
>
> 1.  When the option -mstack-protector-guard=tls presents,  should the option 
> mstack-protector-guard-offset=.. be required to present?
>  If it’s required to present, you might want to add such requirement to 
> the documentation, and also issue errors when it’s not present.
>  It’s not clear right now from the current implementation, so, you might 
> need to update both "arm_option_override_internal “ in arm.c
>  and doc/invoke.texi to make this clear.
>

An  offset of 0x0 is a reasonable default, so I don't think it is
necessary to require the offset param to be passed in that case.

> 2. For arm, is there only one system register can be used for this purpose?
>

There are other registers that might be used in the same way, but the
TLS register is the obvious choice. On AArch64, we decided to use
'sysreg' and permit the user to specify the register because the Linux
kernel uses the user space stack pointer (SP_EL0), which is kind of
odd so we did not want to hard code that.

> 3. For the functionality you added, I didn’t see any testing cases added, I 
> think testing cases are needed.
>

Yes, I am aware of that. I'm just not sure I know how to proceed here:
any pointers?

> More comments are embedded below:
>
> > On Oct 28, 2021, at 6:27 AM, Ard Biesheuvel  wrote:
> >
> > Add support for accessing the stack canary value via the TLS register,
> > so that multiple threads running in the same address space can use
> > distinct canary values. This is intended for the Linux kernel running in
> > SMP mode, where processes entering the kernel are essentially threads
> > running the same program concurrently: using a global variable for the
> > canary in that context is problematic because it can never be rotated,
> > and so the OS is forced to use the same value as long as it remains up.
> >
> > Using the TLS register to index the stack canary helps with this, as it
> > allows each CPU to context switch the TLS register along with the rest
> > of the process, permitting each process to use its own value for the
> > stack canary.
> >
> > 2021-10-28 Ard Biesheuvel 
> >
> >   * config/arm/arm-opts.h (enum stack_protector_guard): New
> >   * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
> >   New
> >   * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
> >   (arm_option_override_internal): Handle and put in error checks
> >   for stack protector guard options.
> >   (arm_option_reconfigure_globals): Likewise
> >   (arm_stack_protect_tls_canary_mem): New
> >   (arm_stack_protect_guard): New
> >   * config/arm/arm.md (stack_protect_set): New
> >   (stack_protect_set_tls): Likewise
> >   (stack_protect_test): Likewise
> >   (stack_protect_test_tls): Likewise
> >   (reload_tp_hard): Likewise
> >   * config/arm/arm.opt (-mstack-protector-guard): New
> >   (-mstack-protector-guard-offset): New.
> >   * doc/invoke.texi: Document new options
> >
> > Signed-off-b

Re: [PATCH][GCC] arm: add armv9-a architecture to -march

2021-11-16 Thread Ramana Radhakrishnan via Gcc-patches
Hi There,

I think for AArch32, mapping it back to armv8-a sounds sufficient,
unless we have string or math routines in newlib that make use of any
ACLE guards beyond armv8-a.

Ramana


From: Richard Earnshaw 
Date: Tuesday, 16 November 2021 at 11:48
To: Christophe Lyon , Przemyslaw Wirkus 

Cc: Ramana Radhakrishnan , 
gcc-patches@gcc.gnu.org , Richard Earnshaw 

Subject: Re: [PATCH][GCC] arm: add armv9-a architecture to -march
You can't make an omelette without breaking eggs, as they say.  New
architectures need new assemblers.

However, I wonder if there's anything in v9-a that significantly affects
the quality of the base multilib code needed for building the libraries.
  It might be that we can deal with v9-a by just mapping it to the v8-a
equivalents.  That would then avoid the need for an updated assembler,
and reduce the build time and install footprint.

R.


On 16/11/2021 08:03, Christophe Lyon via Gcc-patches wrote:
> Hi,
>
>
> On Tue, Nov 9, 2021 at 12:36 PM Przemyslaw Wirkus via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
>>>>> -Original Message-
>>>>> From: Przemyslaw Wirkus
>>>>> Sent: 18 October 2021 10:37
>>>>> To: gcc-patches@gcc.gnu.org
>>>>> Cc: Richard Earnshaw ; Ramana
>>>>> Radhakrishnan ; Kyrylo Tkachov
>>>>> ; ni...@redhat.com
>>>>> Subject: [PATCH][GCC] arm: add armv9-a architecture to -march
>>>>>
>>>>> Hi,
>>>>>
>>>>> This patch is adding `armv9-a` to -march in Arm GCC.
>>>>>
>>>>> In this patch:
>>>>>+ Add `armv9-a` to -march.
>>>>>+ Update multilib with armv9-a and armv9-a+simd.
>>>>>
>>>>> After this patch three additional multilib directories are available:
>>>>>
>>>>> $ arm-none-eabi-gcc --print-multi-lib .; [...vanilla multi-lib
>>>>> dirs...] thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft
>>>>> thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat-
>>>>> abi=softfp
>>>>> thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat-
>>>>> abi=hard
>>>>>
>>
>
> This is causing a GCC build failure when using "old" binutils (I'm using
> 2.36.1),
> because the new -march=armv9-a option is not supported. This breaks the
> multilib support.
>
> I don't remember how we handled similar cases in the past? Is that just
> "expected", and
> "current" GCC needs "current" binutils, or should we have a multilib list
> dependent on
> the actual binutils support? (I think this is not the case, and it sounds
> like an undesirable
> extra complication in an already overcrowded mutilib-Makefile)
>
> Christophe
>
>>>> New multi-lib directories under
>>>>> $GCC_INSTALL_DIE/lib/gcc/arm-none-eabi/12.0.0/thumb are created:
>>>>>
>>>>> thumb/
>>>>> +--- v9-a
>>>>> ||--- nofp
>>>>> |
>>>>> +--- v9-a+simd
>>>>>   |--- hard
>>>>>   |--- softfp
>>>>>
>>>>> Regtested on arm-none-eabi cross and no issues.
>>>>>
>>>>> OK for master?
>>
>> Thanks.
>>
>> commit 32ba7860ccaddd5219e6dae94a3d0653e124c9dd
>>
>>> Ok.
>>> Thanks,
>>> Kyrill
>>>
>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>* config/arm/arm-cpus.in (armv9): New define.
>>>>>(ARMv9a): New group.
>>>>>(armv9-a): New arch definition.
>>>>>* config/arm/arm-tables.opt: Regenerate.
>>>>>* config/arm/arm.h (BASE_ARCH_9A): New arch enum value.
>>>>>* config/arm/t-aprofile: Added armv9-a and armv9+simd.
>>>>>* config/arm/t-arm-elf: Added arm9-a, v9_fps and all_v9_archs
>>>>>to MULTILIB_MATCHES.
>>>>>* config/arm/t-multilib: Added v9_a_nosimd_variants and
>>>>>v9_a_simd_variants to MULTILIB_MATCHES.
>>>>>* doc/invoke.texi: Update docs.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>* gcc.target/arm/multilib.exp: Update test with armv9-a entries.
>>>>>* lib/target-supports.exp (v9a): Add new armflag.
>>>>>(__ARM_ARCH_9A__): Add new armdef.
>>>>>
>>>>> --
>>>>> kind regards,
>>>>> Przemyslaw Wirkus
>>
>>


Re: [PATCH][GCC] arm: Enable Cortex-R52+ CPU

2021-09-22 Thread Ramana Radhakrishnan via Gcc-patches
This is OK

Ramana

On 22/09/2021, 09:45, "Przemyslaw Wirkus"  wrote:

Patch is adding Cortex-R52+ as 'cortex-r52plus' command line
flag for -mcpu option.

See: https://www.arm.com/products/silicon-ip-cpu/cortex-r/cortex-r52-plus

OK for master?

gcc/ChangeLog:

2021-09-22  Przemyslaw Wirkus  

* config/arm/arm-cpus.in: Add Cortex-R52+ CPU.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.



Re: [PATCH] arm: Fix ICE on glibc compilation after my DIVMOD optimization [PR97322]

2020-10-08 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Oct 8, 2020 at 10:22 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The arm target hook for divmod wasn't prepared to handle constants passed to
> the function.
>
> Fixed thusly, bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk?
>
> 2020-10-08  Jakub Jelinek  
>
> PR target/97322
> * config/arm/arm.c (arm_expand_divmod_libfunc): Pass mode instead of
> GET_MODE (op0) or GET_MODE (op1) to emit_library_call_value.
>
> * gcc.dg/pr97322.c: New test.

Ok.

Ramana

>
> --- gcc/config/arm/arm.c.jj 2020-10-07 10:47:46.892985596 +0200
> +++ gcc/config/arm/arm.c2020-10-07 20:19:25.524367665 +0200
> @@ -33275,9 +33275,7 @@ arm_expand_divmod_libfunc (rtx libfunc,
>  = smallest_int_mode_for_size (2 * GET_MODE_BITSIZE (mode));
>
>rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
> -   libval_mode,
> -   op0, GET_MODE (op0),
> -   op1, GET_MODE (op1));
> +   libval_mode, op0, mode, op1, mode);
>
>rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
>rtx remainder = simplify_gen_subreg (mode, libval, libval_mode,
> --- gcc/testsuite/gcc.dg/pr97322.c.jj   2020-10-07 20:19:54.071961807 +0200
> +++ gcc/testsuite/gcc.dg/pr97322.c  2020-10-07 20:19:16.897490309 +0200
> @@ -0,0 +1,17 @@
> +/* PR target/97322 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void
> +foo (unsigned long long x, unsigned long long *y)
> +{
> +  y[0] = x / 10;
> +  y[1] = x % 10;
> +}
> +
> +void
> +bar (unsigned int x, unsigned int *y)
> +{
> +  y[0] = x / 10;
> +  y[1] = x % 10;
> +}
>
> Jakub
>


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-09-03 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Sep 3, 2020 at 6:13 PM Kees Cook via Gcc-patches
 wrote:
>
> On Thu, Sep 03, 2020 at 09:29:54AM -0500, Qing Zhao wrote:
> > On average, all the options starting with “used_…”  (i.e, only the 
> > registers that are used in the routine will be zeroed) have very low 
> > runtime overheads, at most 1.72% for integer benchmarks, and 1.17% for FP 
> > benchmarks.
> > If all the registers will be zeroed, the runtime overhead is bigger, 
> > all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56% for integer benchmarks 
> > on average.
> > Looks like the overhead of zeroing vector registers is much bigger.
> >
> > For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, 
> > the runtime overhead with this is very small.
>
> That looks great; thanks for doing those tests!
>
> (And it seems like these benchmarks are kind of a "worst case" scenario
> with regard to performance, yes? As in it's mostly tight call loops?)


That's true of some of them but definitely not all - the GCC benchmark
springs to mind in SPEC as having quite a flat profile, so I'd take a
look there and probe a bit more in that one to see what happens. Don't
ask me what else, that's all I have in my cache this evening :)

I'd also query the "average" slowdown metric in those numbers, as
that's measured differently from the way SPEC reports results. IIRC the
SPEC scores for int and FP are computed with a geometric mean of the
individual ratios of each of the benchmarks. Thus I don't think the
average of the slowdowns is enough to talk about slowdowns for the
benchmark suite. A quick calculation of the arithmetic mean of column
B in my head suggests that it's the arithmetic mean of all the
slowdowns?

i.e. Slowdown(GeometricMean(x, y, z, ...)) != ArithmeticMean(Slowdown(x), Slowdown(y), ...)

So another metric to look at would be the slowdown of your estimated
(probably non-reportable) SPEC scores as well, to get a more
"SPEC-like" number.

regards
Ramana
>
> --
> Kees Cook


Re: [PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-28 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Aug 26, 2020 at 12:08 PM Richard Biener via Gcc-patches
 wrote:
>
> On Tue, Aug 25, 2020 at 6:32 PM Maciej W. Rozycki  wrote:
> >
> > Hi Kito,
> >
> > > I just found the mail thread about div mod with -fnon-call-exceptions,
> > > I think keeping the default LIB2_DIVMOD_EXCEPTION_FLAGS unchanged
> > > should be the best way to go.
> > >
> > > Non-call exceptions and libcalls
> > > https://gcc.gnu.org/legacy-ml/gcc/2001-06/msg01108.html
> > >
> > > Non-call exceptions and libcalls Part 2
> > > https://gcc.gnu.org/legacy-ml/gcc/2001-07/msg00402.html
> >
> >  Thank you for your input.  I believe I had a look at these commits before
> > I posted my original proposal.  Please note however that they both predate
> > the addition of `-fasynchronous-unwind-tables', so clearly the option
> > could not have been considered at the time the changes were accepted into
> > GCC.
> >
> >  Please note that, as observed by Andreas and Richard here:
> >  in no case we
> > want to have full exception handling here, so we clearly need no
> > `-fexceptions'; this libcall code won't itself ever call `throw'.
> >
> >  Now it might be a bit unclear from documentation as to whether we want
> > `-fnon-call-exceptions' or `-fasynchronous-unwind-tables', as it says that
> > the former option makes GCC:
> >
> > "Generate code that allows trapping instructions to throw
> >  exceptions.  Note that this requires platform-specific runtime
> >  support that does not exist everywhere.  Moreover, it only allows
> >  _trapping_ instructions to throw exceptions, i.e. memory references
> >  or floating-point instructions.  It does not allow exceptions to be
> >  thrown from arbitrary signal handlers such as 'SIGALRM'."
> >
> > Note the observation that arbitrary signal handlers (invoked at more inner
> > a frame level, and necessarily built with `-fexceptions') are still not
> > allowed to throw exceptions.  For that, as far as I understand it, you
> > actually need `-fasynchronous-unwind-tables', which makes GCC:
> >
> > "Generate unwind table in DWARF format, if supported by target
> >  machine.  The table is exact at each instruction boundary, so it
> >  can be used for stack unwinding from asynchronous events (such as
> >  debugger or garbage collector)."
> >
> > and therefore allows arbitrary signal handlers to throw exceptions,
> > effectively making the option a superset of `-fexceptions'.  As libcall
> > code can generally be implicitly invoked everywhere, we want people not to
> > be restrained by it and let a exception thrown by e.g. a user-supplied
> > SIGALRM handler propagate through the relevant libcall's stack frame,
> > rather than just those exceptions the libcall itself might indirectly
> > cause.
> >
> >  Maybe I am missing something here, especially as `-fexceptions' mentions
> > code generation, while `-fasynchronous-unwind-tables' only refers to
> > unwind table generation, but then what would be the option to allow
> > exceptions to be thrown from arbitrary signal handlers rather than those
> > for memory references or floating-point instructions (where by a special
> > provision integer division falls as well)?
> >
> >  My understanding has been it is `-fasynchronous-unwind-tables', but I'll
> > be gladly straightened out otherwise.  If I am indeed right, then perhaps
> > the documentation could be clarified and expanded a bit.
> >
> >  Barring evidence to the contrary I maintain the change I have proposed is
> > correct, and not only removes the RISC-V `ld.so' build issue, but it fixes
> > the handling of asynchronous events arriving in the middle of the relevant
> > libcalls for all platforms as well.
> >
> >  Please let me know if you have any further questions, comments or
> > concerns.
>
> You only need -fexceptions for that, then you can throw; from a signal handler
> for example.  If you want to be able to catch the exception somewhere up
> the call chain all intermediate code needs to be compiled so that unwinding
> from asynchronous events is possible - -fasynchronous-unwind-tables.
>
> So -fasynchronous-unwind-tables is about unwinding.  -f[non-call]-exceptions
> is about throw/catch.  Clearly libgcc does neither throw nor catch but with
> async events we might need to unwind from inside it.
>
> Now I don't know about the arm situation but if arm cannot do async unwinding
> then even -fexceptions won't help it here - libgcc still does not throw.

On Arm, as in the AArch32 port, async unwinding will not work, since
such unwind info can't be expressed in the EH format tables.

regards
Ramana

>
> Richard.
>
> >
> >   Maciej


Re: [PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-27 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Aug 24, 2020 at 4:35 PM Christophe Lyon
 wrote:
>
> On Mon, 24 Aug 2020 at 11:09, Christophe Lyon
>  wrote:
> >
> > On Sat, 22 Aug 2020 at 00:44, Ramana Radhakrishnan
> >  wrote:
> > >
> > > On Wed, Aug 19, 2020 at 10:32 AM Christophe Lyon via Gcc-patches
> > >  wrote:
> > > >
> > > > armv8-m.base (cortex-m23) has the movt instruction, so we need to
> > > > disable the define_split to generate a constant in this case,
> > > > otherwise we get incorrect insn constraints as described in PR94538.
> > > >
> > > > We also need to fix the pure-code alternative for thumb1_movsi_insn
> > > > because the assembler complains with instructions like
> > > > movs r0, #:upper8_15:1234
> > > > (Internal error in md_apply_fix)
> > > > We now generate movs r0, 4 instead.
> > > >
> > > > 2020-08-19  Christophe Lyon  
> > > > gcc/ChangeLog:
> > > >
> > > > * config/arm/thumb1.md: Disable set-constant splitter when
> > > > TARGET_HAVE_MOVT.
> > > > (thumb1_movsi_insn): Fix -mpure-code
> > > > alternative.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/arm/pure-code/pr94538-1.c: New test.
> > > > * gcc.target/arm/pure-code/pr94538-2.c: New test.
> > >
> >
> > Hi Ramana,
> >
> > >  I take it that this fixes the ICE rather than addressing the code
> > > generation / performance bits that Wilco was referring to?  The other
> > > code quality / performance issues listed in the PR should be broken
> > > down into separate PRs, while we take this as a fix for the ICE.
> > >
> > > Under that assumption OK.
> > >
> >
> > Yes, that's correct, this patch just fixes the ICE, as it is cleaner
> > to handle perf issues in a different patch.
> >
>
> The patch applies cleanly to gcc-9 and gcc-10: OK to backport there?

Yes, all good.

Thanks,
Ramana

>
> (I tested that the offending testcase passes on the branches with the
> backport, and fails without).
>
> Thanks,
>
> Christophe
>
> > I'll open a new PR to track the perf issue.
> >
> > Thanks
> >
> >
> > > Ramana
> > >
> > > > ---
> > > >  gcc/config/arm/thumb1.md   | 66 
> > > > ++
> > > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
> > > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
> > > >  3 files changed, 79 insertions(+), 12 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
> > > >
> > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > > > index 0ff8190..f0129db 100644
> > > > --- a/gcc/config/arm/thumb1.md
> > > > +++ b/gcc/config/arm/thumb1.md
> > > > @@ -70,6 +70,7 @@ (define_split
> > > >"TARGET_THUMB1
> > > > && arm_disable_literal_pool
> > > > && GET_CODE (operands[1]) == CONST_INT
> > > > +   && !TARGET_HAVE_MOVT
> > > > && !satisfies_constraint_I (operands[1])"
> > > >[(clobber (const_int 0))]
> > > >"
> > > > @@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
> > > >"TARGET_THUMB1
> > > > && (   register_operand (operands[0], SImode)
> > > > || register_operand (operands[1], SImode))"
> > > > -  "@
> > > > -   movs\\t%0, %1
> > > > -   movs\\t%0, %1
> > > > -   movw\\t%0, %1
> > > > -   #
> > > > -   #
> > > > -   ldmia\\t%1, {%0}
> > > > -   stmia\\t%0, {%1}
> > > > -   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; 
> > > > lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, 
> > > > #:lower0_7:%1
> > > > -   ldr\\t%0, %1
> > > > -   str\\t%1, %0
> > > > -   mov\\t%0, %1"
> > > > +{
> > > > +  switch (which_alternative)
> > > > +{
> > > > +  default:
> > > > +  case 0: return "movs\t%0, %1";

Re: [PATCH][GCC][GCC-10 backport] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Aug 21, 2020 at 2:28 PM Joe Ramsay  wrote:
>
> From: Joe Ramsay 
>
> Hi,
>
> Previously, the machine description patterns for vst1q accepted a generic 
> memory
> operand for the destination, which could lead to an unrecognised builtin when
> expanding vst1q* intrinsics. This change fixes the pattern to only accept MVE
> memory operands.
>
> Tested on arm-none-eabi, clean w.r.t. gcc and CMSIS-DSP testsuites. Backports
> cleanly onto gcc-10 branch. OK for backport?
>

OK.

Ramana



> Thanks,
> Joe
>
> gcc/ChangeLog:
>
> PR target/96683
> * config/arm/mve.md (mve_vst1q_f): Require MVE memory operand 
> for
> destination.
> (mve_vst1q_): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/96683
> * gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_s16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_s8.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_u16.c: New test.
> * gcc.target/arm/mve/intrinsics/vst1q_u8.c: New test.
>
> (cherry picked from commit 91d206adfe39ce063f6a5731b92a03c05e82e94a)
> ---
>  gcc/config/arm/mve.md   |  4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
>  6 files changed, 37 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9758862..465b39a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9330,7 +9330,7 @@
>[(set_attr "length" "4")])
>
>  (define_expand "mve_vst1q_f"
> -  [(match_operand: 0 "memory_operand")
> +  [(match_operand: 0 "mve_memory_operand")
> (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
>]
>"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> @@ -9340,7 +9340,7 @@
>  })
>
>  (define_expand "mve_vst1q_"
> -  [(match_operand:MVE_2 0 "memory_operand")
> +  [(match_operand:MVE_2 0 "mve_memory_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
>]
>"TARGET_HAVE_MVE"
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 363b4ca..312b746 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
>vst1q_f16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (float16_t * addr, float16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (float16_t a, float16x8_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 37c4713..cd14e2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
>vst1q_s16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (int16_t * addr, int16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (int16_t a, int16x8_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index fe5edea..0004c80 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
>vst1q_s8 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> -
>  void
>  foo1 (int8_t * addr, int8x16_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> +/* { dg-final { scan-assembler-times "vstrb.8" 2 }  } */
> +
> +void
> +foo2 (int8_t a, int8x16_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> index a4c8c1a..248e7ce 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> @@ -10,12 +10,16 @@ foo (uint16_t * addr, uint16x8_t value)
>vst1q_u16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  

Re: [PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Aug 19, 2020 at 10:32 AM Christophe Lyon via Gcc-patches
 wrote:
>
> armv8-m.base (cortex-m23) has the movt instruction, so we need to
> disable the define_split to generate a constant in this case,
> otherwise we get incorrect insn constraints as described in PR94538.
>
> We also need to fix the pure-code alternative for thumb1_movsi_insn
> because the assembler complains with instructions like
> movs r0, #:upper8_15:1234
> (Internal error in md_apply_fix)
> We now generate movs r0, 4 instead.
>
> 2020-08-19  Christophe Lyon  
> gcc/ChangeLog:
>
> * config/arm/thumb1.md: Disable set-constant splitter when
> TARGET_HAVE_MOVT.
> (thumb1_movsi_insn): Fix -mpure-code
> alternative.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/pure-code/pr94538-1.c: New test.
> * gcc.target/arm/pure-code/pr94538-2.c: New test.

 I take it that this fixes the ICE rather than addressing the code
generation / performance bits that Wilco was referring to?  The other
code quality / performance issues listed in the PR should be broken
down into separate PRs, while we take this as a fix for the ICE.

Under that assumption OK.

Ramana

> ---
>  gcc/config/arm/thumb1.md   | 66 
> ++
>  gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
>  gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
>  3 files changed, 79 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
>
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index 0ff8190..f0129db 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -70,6 +70,7 @@ (define_split
>"TARGET_THUMB1
> && arm_disable_literal_pool
> && GET_CODE (operands[1]) == CONST_INT
> +   && !TARGET_HAVE_MOVT
> && !satisfies_constraint_I (operands[1])"
>[(clobber (const_int 0))]
>"
> @@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
>"TARGET_THUMB1
> && (   register_operand (operands[0], SImode)
> || register_operand (operands[1], SImode))"
> -  "@
> -   movs\\t%0, %1
> -   movs\\t%0, %1
> -   movw\\t%0, %1
> -   #
> -   #
> -   ldmia\\t%1, {%0}
> -   stmia\\t%0, {%1}
> -   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; 
> lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, 
> #:lower0_7:%1
> -   ldr\\t%0, %1
> -   str\\t%1, %0
> -   mov\\t%0, %1"
> +{
> +  switch (which_alternative)
> +{
> +  default:
> +  case 0: return "movs\t%0, %1";
> +  case 1: return "movs\t%0, %1";
> +  case 2: return "movw\t%0, %1";
> +  case 3: return "#";
> +  case 4: return "#";
> +  case 5: return "ldmia\t%1, {%0}";
> +  case 6: return "stmia\t%0, {%1}";
> +  case 7:
> +  /* pure-code alternative: build the constant byte by byte,
> +instead of loading it from a constant pool.  */
> +   {
> + int i;
> + HOST_WIDE_INT op1 = INTVAL (operands[1]);
> + bool mov_done_p = false;
> + rtx ops[2];
> + ops[0] = operands[0];
> +
> + /* Emit upper 3 bytes if needed.  */
> + for (i = 0; i < 3; i++)
> +   {
> +  int byte = (op1 >> (8 * (3 - i))) & 0xff;
> +
> + if (byte)
> +   {
> + ops[1] = GEN_INT (byte);
> + if (mov_done_p)
> +   output_asm_insn ("adds\t%0, %1", ops);
> + else
> +   output_asm_insn ("movs\t%0, %1", ops);
> + mov_done_p = true;
> +   }
> +
> + if (mov_done_p)
> +   output_asm_insn ("lsls\t%0, #8", ops);
> +   }
> +
> + /* Emit lower byte if needed.  */
> + ops[1] = GEN_INT (op1 & 0xff);
> + if (!mov_done_p)
> +   output_asm_insn ("movs\t%0, %1", ops);
> + else if (op1 & 0xff)
> +   output_asm_insn ("adds\t%0, %1", ops);
> + return "";
> +   }
> +  case 8: return "ldr\t%0, %1";
> +  case 9: return "str\t%1, %0";
> +  case 10: return "mov\t%0, %1";
> +}
> +}
>[(set_attr "length" "2,2,4,4,4,2,2,14,2,2,2")
> (set_attr "type" 
> "mov_reg,mov_imm,mov_imm,multiple,multiple,load_4,store_4,alu_sreg,load_4,store_4,mov_reg")
> (set_attr "pool_range" "*,*,*,*,*,*,*, *,1018,*,*")
> diff --git a/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c 
> b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> new file mode 100644
> index 000..31061d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "skip override" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
> +/* { dg-options "-mpure-code -mcpu=cortex-m23 -march=armv8-m.base -mthumb 
> -mfloat-abi=soft" } */
> +
> +typedef int 
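
As a side note, the pure-code alternative above builds a 32-bit constant
byte by byte with movs/lsls/adds instead of loading it from a constant
pool.  A quick way to convince oneself that the sequence reconstructs any
constant is to simulate it; this helper is a sketch mirroring the emitted
instruction semantics, not part of the patch:

```cpp
#include <cstdint>

// Simulate the register value produced by the movs/lsls/adds sequence the
// pure-code alternative emits for constant OP1.  Mirrors the loop in the
// patch: emit each non-zero upper byte (movs first, adds thereafter),
// shift left by 8 once emission has started, then handle the low byte.
uint32_t build_constant(uint32_t op1) {
  uint32_t reg = 0;
  bool mov_done_p = false;
  for (int i = 0; i < 3; i++) {                  // upper 3 bytes
    uint32_t byte = (op1 >> (8 * (3 - i))) & 0xff;
    if (byte) {
      reg = mov_done_p ? reg + byte : byte;      // adds / movs
      mov_done_p = true;
    }
    if (mov_done_p)
      reg <<= 8;                                 // lsls #8
  }
  if (!mov_done_p)
    reg = op1 & 0xff;                            // movs low byte
  else
    reg += op1 & 0xff;                           // adds low byte (no-op if 0)
  return reg;
}
```

For example, 1234 (0x4d2) is built as movs #4; lsls #8; adds #0xd2, which
the simulation reproduces exactly.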

Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

2020-08-21 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang  wrote:
>
>
> Hi all,
>
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub3 optab. The sub3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
>
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>

Hi Dennis,

Thanks for this patch.  However, a quick read suggests that it could do
with some refactoring, or indeed further breaking down.

1. The refactoring for TARGET_NEON_IWMMXT and friends, whose motivation
I don't get on a quick read. I'll try and read that again. Please
document why these complex TARGET_ macros exist, how they are expected
to be used in the machine description, and what they are intended to do.
2. It seems odd that we would have
 "&& ((mode != V2SFmode && mode != V4SFmode)
+|| flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
apply this to TARGET_MVE_FLOAT in the sub3 expander. The point
is that if it isn't safe to vectorize a subtract for Neon, why is it
safe to do the same for MVE ? This was done in 2010 by Julian to fix
PR target/43703 - isn't this applicable on MVE as well ?
3. I'm also going to quibble a bit about the use of VSEL as the name
of an iterator, as that conflates it with the vsel instruction, and it's
not obvious what's going on here.


> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>

I'm a bit confused as to why this got exposed by the new MVE
vector modes enabled by this patch.

> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
>
Bootstrapped and regression tested for arm-none-linux-gnueabihf with
--with-fpu=neon in the configuration?


> Is it OK for trunk please?



Ramana

>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
> * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
> * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
> (TARGET_NEON_MVE_HFP): Likewise.
> * config/arm/iterators.md (VSEL): New mode iterator to select modes
> for corresponding targets.
> * config/arm/mve.md (mve_vsubq): New entry for vsub instruction
> using expression 'minus'.
> (mve_vsubq_f): Use minus instead of VSUBQ_F unspec.
> * config/arm/neon.md (sub3): Removed here. Integrated in the
> sub3 in vec-common.md
> * config/arm/vec-common.md (sub3): Enable MVE target. Use VSEL
> to select available modes. Exclude TARGET_NEON_FP16INST from
> TARGET_NEON statement. Intergrate TARGET_NEON_FP16INST which is
> originally in neon.md.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-10  Dennis Zhang  
>
> * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> option -fno-ipa-icf and change the instruction count from 8 to 16.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> * gcc.target/arm/mve/vect/vect_sub_1.c: New test.


Re: [PATCH] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-20 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Aug 20, 2020 at 3:31 PM Joe Ramsay  wrote:
>
> Hi Ramana,
>
> Thanks for the review.
>
> On 18/08/2020, 18:37, "Ramana Radhakrishnan"  
> wrote:
>
> On Thu, Aug 13, 2020 at 2:18 PM Joe Ramsay  wrote:
> >
> > From: Joe Ramsay 
> >
> > Hi,
> >
> > Previously, the machine description patterns for vst1q accepted a 
> generic memory
> > operand for the destination, which could lead to an unrecognised 
> builtin when
> > expanding vst1q* intrinsics. This change fixes the patterns to only 
> accept MVE
> > memory operands.
>
> This is OK though I suspect this needs a PR and a backport request for 
> GCC 10.
>
> There's now a PR for this, 96683. I've attached an updated patch file;
> the only change is that I've included the PR number in the changelog.
> Please let me know if this is OK for trunk.

Yep, absolutely fine - FTR, such fixes to changelogs with PR numbers and
administrivia count as obvious and can just be applied.

Ramana

>
> Thanks,
> Joe
>
> regards
> Ramana
>
> >
> > Thanks,
> > Joe
> >
> > gcc/ChangeLog:
> >
> > 2020-08-13  Joe Ramsay 
> >
> > * config/arm/mve.md (mve_vst1q_f): Require MVE memory 
> operand for
> > destination.
> > (mve_vst1q_): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-08-13  Joe Ramsay 
> >
> > * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Add test that only 
> MVE
> > memory operand is accepted.
> > * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> > ---
> >  gcc/config/arm/mve.md   |  4 ++--
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
> >  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
> >  6 files changed, 37 insertions(+), 17 deletions(-)
> >
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index 9758862..465b39a 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -9330,7 +9330,7 @@
> >[(set_attr "length" "4")])
> >
> >  (define_expand "mve_vst1q_f"
> > -  [(match_operand: 0 "memory_operand")
> > +  [(match_operand: 0 "mve_memory_operand")
> > (unspec: [(match_operand:MVE_0 1 "s_register_operand")] 
> VST1Q_F)
> >]
> >"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> > @@ -9340,7 +9340,7 @@
> >  })
> >
> >  (define_expand "mve_vst1q_"
> > -  [(match_operand:MVE_2 0 "memory_operand")
> > +  [(match_operand:MVE_2 0 "mve_memory_operand")
> > (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
> >]
> >"TARGET_HAVE_MVE"
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > index 363b4ca..312b746 100644
> > --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> > @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
> >vst1q_f16 (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > -
> >  void
> >  foo1 (float16_t * addr, float16x8_t value)
> >  {
> >vst1q (addr, value);
> >  }
> >
> > -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> > +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> > +
> > +void
> > +foo2 (float16_t a, float16x8_t x)
> > +{
> > +  vst1q (&a, x);
> > +}
> > diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c 
> b/gcc/testsuite/gc

Re: [PATCH] arm: Require MVE memory operand for destination of vst1q intrinsic

2020-08-18 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Aug 13, 2020 at 2:18 PM Joe Ramsay  wrote:
>
> From: Joe Ramsay 
>
> Hi,
>
> Previously, the machine description patterns for vst1q accepted a generic 
> memory
> operand for the destination, which could lead to an unrecognised builtin when
> expanding vst1q* intrinsics. This change fixes the patterns to only accept MVE
> memory operands.

This is OK though I suspect this needs a PR and a backport request for GCC 10.


regards
Ramana

>
> Thanks,
> Joe
>
> gcc/ChangeLog:
>
> 2020-08-13  Joe Ramsay 
>
> * config/arm/mve.md (mve_vst1q_f): Require MVE memory operand 
> for
> destination.
> (mve_vst1q_): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-13  Joe Ramsay 
>
> * gcc.target/arm/mve/intrinsics/vst1q_f16.c: Add test that only MVE
> memory operand is accepted.
> * gcc.target/arm/mve/intrinsics/vst1q_s16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_s8.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u16.c: Likewise.
> * gcc.target/arm/mve/intrinsics/vst1q_u8.c: Likewise.
> ---
>  gcc/config/arm/mve.md   |  4 ++--
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c  | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c | 10 +++---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u8.c  | 10 +++---
>  6 files changed, 37 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 9758862..465b39a 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9330,7 +9330,7 @@
>[(set_attr "length" "4")])
>
>  (define_expand "mve_vst1q_f"
> -  [(match_operand: 0 "memory_operand")
> +  [(match_operand: 0 "mve_memory_operand")
> (unspec: [(match_operand:MVE_0 1 "s_register_operand")] VST1Q_F)
>]
>"TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
> @@ -9340,7 +9340,7 @@
>  })
>
>  (define_expand "mve_vst1q_"
> -  [(match_operand:MVE_2 0 "memory_operand")
> +  [(match_operand:MVE_2 0 "mve_memory_operand")
> (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand")] VST1Q)
>]
>"TARGET_HAVE_MVE"
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> index 363b4ca..312b746 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_f16.c
> @@ -10,12 +10,16 @@ foo (float16_t * addr, float16x8_t value)
>vst1q_f16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (float16_t * addr, float16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (float16_t a, float16x8_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> index 37c4713..cd14e2c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s16.c
> @@ -10,12 +10,16 @@ foo (int16_t * addr, int16x8_t value)
>vst1q_s16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (int16_t * addr, int16x8_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> +/* { dg-final { scan-assembler-times "vstrh.16" 2 }  } */
> +
> +void
> +foo2 (int16_t a, int16x8_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> index fe5edea..0004c80 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_s8.c
> @@ -10,12 +10,16 @@ foo (int8_t * addr, int8x16_t value)
>vst1q_s8 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> -
>  void
>  foo1 (int8_t * addr, int8x16_t value)
>  {
>vst1q (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrb.8"  }  } */
> +/* { dg-final { scan-assembler-times "vstrb.8" 2 }  } */
> +
> +void
> +foo2 (int8_t a, int8x16_t x)
> +{
> +  vst1q (&a, x);
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c 
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> index a4c8c1a..248e7ce 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vst1q_u16.c
> @@ -10,12 +10,16 @@ foo (uint16_t * addr, uint16x8_t value)
>vst1q_u16 (addr, value);
>  }
>
> -/* { dg-final { scan-assembler "vstrh.16"  }  } */
> -
>  void
>  foo1 (uint16_t * addr, uint16x8_t value)
>  {
>vst1q 

Re: [PATCH][GCC][Arm]: Fix bootstrap failure with rtl-checking

2020-05-19 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Apr 27, 2020 at 2:22 PM Andre Vieira (lists)
 wrote:
>
> Hi,
>
> The code change that caused this regression was not meant to affect neon
> code-gen, however I missed the REG fall through.  This patch makes sure
> we only get the left-hand of the PLUS if it is indeed a PLUS expr.
>
> I suggest that in gcc-11 this code is cleaned up, as I do not think we
> even need the overlap checks, NEON only loads from or stores to FP
> registers and these can't be used in its addressing modes.
>
> Bootstrapped arm-linux-gnueabihf with '--enable-checking=yes,rtl' for
> armv7-a and amrv8-a.
>
> Is this OK for trunk?
>

Also for GCC-10  ?

Ramana

> gcc/ChangeLog:
> 2020-04-27  Andre Vieira  
>
>  * config/arm/arm.c (output_move_neon): Only get the first operand,
>  if addr is PLUS.
>


Re: [ARM][wwwdocs]: Document Armv8.1-M Mainline Security Extensions changes.

2020-05-15 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, May 15, 2020 at 12:31 PM Srinath Parvathaneni
 wrote:
>
> Armv8.1-M Mainline Security Extensions related changes in GCC-10.
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
> index 
> 57ca749da72ed64da37b3eb5404cf5cde8be44dd..10bf3b78c7769b73c808bd2c2fe60ebbfc9c887e
>  100644
> --- a/htdocs/gcc-10/changes.html
> +++ b/htdocs/gcc-10/changes.html
> @@ -747,6 +747,9 @@ typedef svbool_t pred512 
> __attribute__((arm_sve_vector_bits(512)));
>Support for the Custom Datapath Extension beta ACLE
> href="https://developer.arm.com/docs/101028/0010/custom-datapath-extension;>
>intrinsics has been added.
> +  Support for Armv8.1-M Mainline Security Extensions architecture has 
> been added. The -mcmse option,
> +  when used in combination with an Armv8.1-M Mainline architecture (for 
> example, -march=armv8.1.m.main -mcmse),
> +  now leads to the generation of improved code sequences when changing 
> security states.
>  
>

Ok.

Ramana

>
>


Re: [PATCH 2/2] arm: Add support for interrupt routines to reg_needs_saving_p

2020-05-15 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, May 15, 2020 at 7:36 AM Christophe Lyon
 wrote:
>
> On Thu, 14 May 2020 at 17:58, Ramana Radhakrishnan
>  wrote:
> >
> > >  static bool reg_needs_saving_p (unsigned reg)
> > >  {
> > >unsigned long func_type = arm_current_func_type ();
> >
> > Ah ok , you needed it here.
>
> Yes sorry.
> Is this patch (2/2) OK?
>

This looks OK to me as long as there are no regressions and you rejig
the hunks between 1/2 and 2/2.

regards
Ramana

> Thanks,
>
> Christophe
>
> >
> > Ramana


Re: [PATCH 2/2] arm: Add support for interrupt routines to reg_needs_saving_p

2020-05-14 Thread Ramana Radhakrishnan via Gcc-patches
>  static bool reg_needs_saving_p (unsigned reg)
>  {
>unsigned long func_type = arm_current_func_type ();

Ah ok , you needed it here.

Ramana


Re: [PATCH 1/2] arm: Factorize several occurrences of the same code into reg_needs_saving_p

2020-05-14 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, May 14, 2020 at 3:58 PM Christophe Lyon via Gcc-patches
 wrote:
>
> The same code pattern occurs in several functions, so it seems cleaner
> to move it into a dedicated function.
>
> 2020-05-14  Christophe Lyon  
>
> gcc/
> * config/arm/arm.c (reg_needs_saving_p): New function.
> (use_return_insn): Use reg_needs_saving_p.
> (arm_get_vfp_saved_size): Likewise.
> (arm_compute_frame_layout): Likewise.
> (arm_save_coproc_regs): Likewise.
> (thumb1_expand_epilogue): Likewise.
> (arm_expand_epilogue_apcs_frame): Likewise.
> (arm_expand_epilogue): Likewise.
> ---
>  gcc/config/arm/arm.c | 46 --
>  1 file changed, 24 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c88de3e..694c1bb 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -4188,6 +4188,18 @@ arm_trampoline_adjust_address (rtx addr)
>return addr;
>  }
>
> +/* Return 1 if REG needs to be saved.   */
> +static bool reg_needs_saving_p (unsigned reg)

static inline ?

> +{
> +  unsigned long func_type = arm_current_func_type ();

Why is this needed here when it's not used further ?

> +
> +  if (!df_regs_ever_live_p (reg)
> +  || call_used_or_fixed_reg_p (reg))
> +return false;
> +  else
> +return true;
> +}
> +
>  /* Return 1 if it is possible to return using a single instruction.
> If SIBLING is non-null, this is a test for a return before a sibling
> call.  SIBLING is the call insn, so we can examine its register usage.  */
> @@ -4317,12 +4329,12 @@ use_return_insn (int iscond, rtx sibling)
>   since this also requires an insn.  */
>if (TARGET_VFP_BASE)
>  for (regno = FIRST_VFP_REGNUM; regno <= LAST_VFP_REGNUM; regno++)
> -  if (df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
> +  if (reg_needs_saving_p (regno))
> return 0;
>
>if (TARGET_REALLY_IWMMXT)
>  for (regno = FIRST_IWMMXT_REGNUM; regno <= LAST_IWMMXT_REGNUM; regno++)
> -  if (df_regs_ever_live_p (regno) && ! call_used_or_fixed_reg_p (regno))
> +  if (reg_needs_saving_p (regno))
> return 0;
>
>return 1;
> @@ -20943,7 +20955,6 @@ thumb1_compute_save_core_reg_mask (void)
>return mask;
>  }
>
> -
>  /* Return the number of bytes required to save VFP registers.  */
>  static int
>  arm_get_vfp_saved_size (void)
> @@ -20961,10 +20972,7 @@ arm_get_vfp_saved_size (void)
>regno < LAST_VFP_REGNUM;
>regno += 2)
> {
> - if ((!df_regs_ever_live_p (regno)
> -  || call_used_or_fixed_reg_p (regno))
> - && (!df_regs_ever_live_p (regno + 1)
> - || call_used_or_fixed_reg_p (regno + 1)))
> + if (!reg_needs_saving_p (regno) && !reg_needs_saving_p (regno + 1))
> {
>   if (count > 0)
> {
> @@ -22489,8 +22497,7 @@ arm_compute_frame_layout (void)
>   for (regno = FIRST_IWMMXT_REGNUM;
>regno <= LAST_IWMMXT_REGNUM;
>regno++)
> -   if (df_regs_ever_live_p (regno)
> -   && !call_used_or_fixed_reg_p (regno))
> +   if (reg_needs_saving_p (regno))
>   saved += 8;
> }
>
> @@ -22711,8 +22718,9 @@ arm_save_coproc_regs(void)
>unsigned start_reg;
>rtx insn;
>
> +  if (TARGET_REALLY_IWMMXT)
>for (reg = LAST_IWMMXT_REGNUM; reg >= FIRST_IWMMXT_REGNUM; reg--)
> -if (df_regs_ever_live_p (reg) && !call_used_or_fixed_reg_p (reg))
> +if (reg_needs_saving_p (reg))
>{
> insn = gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx);
> insn = gen_rtx_MEM (V2SImode, insn);
> @@ -22727,9 +22735,7 @@ arm_save_coproc_regs(void)
>
>for (reg = FIRST_VFP_REGNUM; reg < LAST_VFP_REGNUM; reg += 2)
> {
> - if ((!df_regs_ever_live_p (reg) || call_used_or_fixed_reg_p (reg))
> - && (!df_regs_ever_live_p (reg + 1)
> - || call_used_or_fixed_reg_p (reg + 1)))
> + if (!reg_needs_saving_p (reg) && !reg_needs_saving_p (reg + 1))
> {
>   if (start_reg != reg)
> saved_size += vfp_emit_fstmd (start_reg,
> @@ -27024,7 +27030,7 @@ thumb1_expand_epilogue (void)
>/* Emit a clobber for each insn that will be restored in the epilogue,
>   so that flow2 will get register lifetimes correct.  */
>for (regno = 0; regno < 13; regno++)
> -if (df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
> +if (reg_needs_saving_p (regno))
>emit_clobber (gen_rtx_REG (SImode, regno));
>
>if (! df_regs_ever_live_p (LR_REGNUM))
> @@ -27090,9 +27096,7 @@ arm_expand_epilogue_apcs_frame (bool really_return)
>
>for (i = FIRST_VFP_REGNUM; i < LAST_VFP_REGNUM; i += 2)
>  /* Look for a case where a reg does not need restoring.  */
> -if ((!df_regs_ever_live_p (i) || call_used_or_fixed_reg_p (i))
> - 

Re: [PATCH] arm.c: Clarify error message in thumb1_expand_prologue

2020-05-14 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, May 14, 2020 at 3:57 PM Christophe Lyon via Gcc-patches
 wrote:
>
> While running the tests with -march=armv5t -mthumb, I came across this
> error message which I think could be clearer.
>
> 2020-05-14  Christophe Lyon  
>
> gcc/
> * config/arm/arm.c (thumb1_expand_prologue): Update error message.
> ---
>  gcc/config/arm/arm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index d507819..dda8771 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -26502,7 +26502,7 @@ thumb1_expand_prologue (void)
>
>if (IS_INTERRUPT (func_type))
>  {
> -  error ("interrupt Service Routines cannot be coded in Thumb mode");
> +  error ("Interrupt Service Routines cannot be coded in Thumb-1 mode");
>return;
>  }
>
> --
> 2.7.4
>

OK.

Ramana


Re: [PATCH] arm: Remove duplicate entries in isr_attribute_args [PR target/57002]

2020-04-29 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Apr 29, 2020 at 4:19 PM Christophe Lyon via Gcc-patches
 wrote:
>
> Remove two duplicate entries in isr_attribute_args ("abort" and
> "ABORT").
>
> 2020-04-29  Christophe Lyon  
>
> PR target/57002
> gcc/
> * config/arm/arm.c (isr_attribute_args): Remove duplicate entries.


OK, this would count as obvious.

Ramana
> ---
>  gcc/config/arm/arm.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 30a2a3a..6a6e804 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3925,8 +3925,6 @@ static const isr_attribute_arg isr_attribute_args [] =
>{ "fiq",   ARM_FT_FIQ },
>{ "ABORT", ARM_FT_ISR },
>{ "abort", ARM_FT_ISR },
> -  { "ABORT", ARM_FT_ISR },
> -  { "abort", ARM_FT_ISR },
>{ "UNDEF", ARM_FT_EXCEPTION },
>{ "undef", ARM_FT_EXCEPTION },
>{ "SWI",   ARM_FT_EXCEPTION },
> --
> 2.7.4
>


Re: [PATCH] arm: Extend the PR94780 fix to arm

2020-04-29 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Apr 29, 2020 at 11:30 AM Richard Sandiford
 wrote:
>
> Essentially the same fix as for x86.
>
> Tested on arm-linux-gnueabihf and armeb-eabi.  Bordering on the obvious
> I guess, but OK to install?
>
> Richard
>

Ok.

Ramana

>
> 2020-04-29  Richard Sandiford  
>
> gcc/
> * config/arm/arm-builtins.c (arm_atomic_assign_expand_fenv): Use
> TARGET_EXPR instead of MODIFY_EXPR for the first assignments to
> fenv_var and new_fenv_var.
> ---
>  gcc/config/arm/arm-builtins.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index aee3fd6e2ff..f64742e6447 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -4167,8 +4167,9 @@ arm_atomic_assign_expand_fenv (tree *hold, tree *clear, 
> tree *update)
>mask = build_int_cst (unsigned_type_node,
> ~((ARM_FE_ALL_EXCEPT << ARM_FE_EXCEPT_SHIFT)
>   | ARM_FE_ALL_EXCEPT));
> -  ld_fenv = build2 (MODIFY_EXPR, unsigned_type_node,
> -   fenv_var, build_call_expr (get_fpscr, 0));
> +  ld_fenv = build4 (TARGET_EXPR, unsigned_type_node,
> +   fenv_var, build_call_expr (get_fpscr, 0),
> +   NULL_TREE, NULL_TREE);
>masked_fenv = build2 (BIT_AND_EXPR, unsigned_type_node, fenv_var, mask);
>hold_fnclex = build_call_expr (set_fpscr, 1, masked_fenv);
>*hold = build2 (COMPOUND_EXPR, void_type_node,
> @@ -4189,8 +4190,8 @@ arm_atomic_assign_expand_fenv (tree *hold, tree *clear, 
> tree *update)
> __atomic_feraiseexcept (new_fenv_var);  */
>
>new_fenv_var = create_tmp_var_raw (unsigned_type_node);
> -  reload_fenv = build2 (MODIFY_EXPR, unsigned_type_node, new_fenv_var,
> -   build_call_expr (get_fpscr, 0));
> +  reload_fenv = build4 (TARGET_EXPR, unsigned_type_node, new_fenv_var,
> +   build_call_expr (get_fpscr, 0), NULL_TREE, NULL_TREE);
>restore_fnenv = build_call_expr (set_fpscr, 1, fenv_var);
>atomic_feraiseexcept = builtin_decl_implicit 
> (BUILT_IN_ATOMIC_FERAISEEXCEPT);
>update_call = build_call_expr (atomic_feraiseexcept, 1,


Re: [PATCH v2] aarch64: Add TX3 machine model

2020-04-24 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Apr 22, 2020 at 8:25 PM Joel Jones  wrote:
>
> Yes, Bellsoft's contribution is to be covered under the Marvell copyright
>
> assignment, as this is a work for hire.


Thanks!

Ramana
>
>
>
> Joel
>
>
>
> >Yes, Bellsoft's contribution is to be covered under the Marvell copyright
>
> >assignment, as this is a work for hire.
>
> >
>
> >Joel
>
> >
>
> >>>Hi Anton,
>
> >>>
>
>  -Original Message-
>
>  From: Gcc-patches  On Behalf Of Anton
>
>  Youdkevitch
>
>  Sent: 20 April 2020 19:29
>
>  To: gcc-patches@gcc.gnu.org
>
>  Cc: jo...@marvell.com
>
>  Subject: [PATCH v2] aarch64: Add TX3 machine model
>
> 
>
>  Here is the patch introducing the thunderxt311 machine model
>
>  for the scheduler. A name for the new chip was added to the
>
>  list of the names to be recognized as a valid parameter for mcpu
>
>  and mtune flags. The TX2 cost model was reused for TX3.
>
> 
>
>  The previously used "cryptic" name for the command line
>
>  parameter is replaced with the same "thunderxt311" name.
>
> 
>
>  Bootstrapped on AArch64.
>
> >>>
>
> >>>Thanks for the patch. I had meant to ask, do you have a copyright 
> >>>assignment in place?
>
> >>>We'd need one to accept a contribution of this size.
>
> >>>Thanks,
>
> >>>Kyrill
>
> >>>
>
> 
>
>  2020-04-20 Anton Youdkevitch 
>
> 
>
>  * config/aarch64/aarch64-cores.def: Add the chip name.
>
>  * config/aarch64/aarch64-tune.md: Regenerated.
>
>  * gcc/config/aarch64/aarch64.c: Add the cost tables for the chip.
>
>  * gcc/config/aarch64/thunderx3t11.md: New file: add the new
>
>  machine model for the scheduler
>
>  * gcc/config/aarch64/aarch64.md: Include the new model.
>
> 
>
>  ---
>
>   gcc/config/aarch64/aarch64-cores.def |   3 +
>
>   gcc/config/aarch64/aarch64-tune.md   |   2 +-
>
>   gcc/config/aarch64/aarch64.c |  27 +
>
>   gcc/config/aarch64/aarch64.md|   1 +
>
>   gcc/config/aarch64/thunderx3t11.md   | 686 +++
>
>   5 files changed, 718 insertions(+), 1 deletion(-)
>
> >>>
>
> >>>
>
> >>>
>
> >>>
>
> >


Re: [PATCH v2] aarch64: Add TX3 machine model

2020-04-22 Thread Ramana Radhakrishnan via Gcc-patches
On Wed, Apr 22, 2020 at 5:38 AM Joel Jones via Gcc-patches
 wrote:
>
> I just joined the gcc-patches list, so I hope the mail software can parse 
> this out with an "In-Reply-To" header.
>
> I work for Marvell, and Anton's work is approved for submittal. I wrote the 
> first version of the .md file. I'm certain we have a copyright assignment in 
> place, as we've had employees in the past six months submit changes, for 
> example Steve Ellcey.
>
I can certainly see that Marvell holds a copyright assignment from 2010
in the copyright.list file.

To be clear, does that mean Anton's work is also covered by that
copyright assignment?

Ramana

> Joel Jones
>
> >Hi Anton,
> >
> >> -Original Message-
> >> From: Gcc-patches  On Behalf Of Anton
> >> Youdkevitch
> >> Sent: 20 April 2020 19:29
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: jo...@marvell.com
> >> Subject: [PATCH v2] aarch64: Add TX3 machine model
> >>
> >> Here is the patch introducing the thunderxt311 machine model
> >> for the scheduler. A name for the new chip was added to the
> >> list of the names to be recognized as a valid parameter for mcpu
> >> and mtune flags. The TX2 cost model was reused for TX3.
> >>
> >> The previously used "cryptic" name for the command line
> >> parameter is replaced with the same "thunderxt311" name.
> >>
> >> Bootstrapped on AArch64.
> >
> >Thanks for the patch. I had meant to ask, do you have a copyright assignment 
> >in place?
> >We'd need one to accept a contribution of this size.
> >Thanks,
> >Kyrill
> >
> >>
> >> 2020-04-20 Anton Youdkevitch 
> >>
> >> * config/aarch64/aarch64-cores.def: Add the chip name.
> >> * config/aarch64/aarch64-tune.md: Regenerated.
> >> * gcc/config/aarch64/aarch64.c: Add the cost tables for the chip.
> >> * gcc/config/aarch64/thunderx3t11.md: New file: add the new
> >> machine model for the scheduler
> >> * gcc/config/aarch64/aarch64.md: Include the new model.
> >>
> >> ---
> >>  gcc/config/aarch64/aarch64-cores.def |   3 +
> >>  gcc/config/aarch64/aarch64-tune.md   |   2 +-
> >>  gcc/config/aarch64/aarch64.c |  27 +
> >>  gcc/config/aarch64/aarch64.md|   1 +
> >>  gcc/config/aarch64/thunderx3t11.md   | 686 +++
> >>  5 files changed, 718 insertions(+), 1 deletion(-)
>


Re: [PATCH] arm: Fix up arm installed unwind.h for use in pedantic modes [PR93615]

2020-02-07 Thread Ramana Radhakrishnan
On Fri, Feb 7, 2020 at 8:19 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As the following testcase shows, unwind.h on ARM can't be (starting with GCC
> 10) compiled with -std=c* modes, only -std=gnu* modes.
> The problem is it uses asm keyword, which isn't a keyword in those modes
> (system headers vs. non-system ones don't make a difference here).
> glibc and other installed headers use __asm or __asm__ keywords instead that
> work fine in both standard and gnu modes.
>
> While there, as it is an installed header, I think it is also wrong to
> completely ignore any identifier namespace rules.
> The generic unwind.h defines just _Unwind* namespace identifiers plus
> _sleb128_t/_uleb128_t (but e.g. unlike libstdc++/glibc headers doesn't
> uglify operand names), the ARM unwind.h is much worse here.  I've just
> changed the gnu_Unwind_Find_got function at least not be in user identifier
> namespace, but perhaps it would be good to go further and rename e.g.
> #define UNWIND_STACK_REG 13
> #define UNWIND_POINTER_REG 12
> #define FDPIC_REGNUM 9
> #define STR(x) #x
> #define XSTR(x) STR(x)
> or e.g.
>   typedef _Unwind_Reason_Code (*personality_routine) (_Unwind_State,
>   _Unwind_Control_Block *, _Unwind_Context *);
> in unwind-arm-common.h.
>
> Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk?
>
> 2020-02-07  Jakub Jelinek  
>
> PR target/93615
> * config/arm/unwind-arm.h (gnu_Unwind_Find_got): Rename to ...
> (_Unwind_gnu_Find_got): ... this.  Use __asm instead of asm.  Remove
> trailing :s in asm.  Formatting fixes.
> (_Unwind_decode_typeinfo_ptr): Adjust caller.
>
> * gcc.dg/pr93615.c: New test.
>

OK, thanks Jakub.

Ramana

> --- libgcc/config/arm/unwind-arm.h.jj   2020-01-12 11:54:38.616380172 +0100
> +++ libgcc/config/arm/unwind-arm.h  2020-02-06 16:16:54.244624408 +0100
> @@ -43,19 +43,15 @@ extern "C" {
>  #endif
>  _Unwind_Ptr __attribute__((weak)) __gnu_Unwind_Find_got (_Unwind_Ptr);
>
> -static inline _Unwind_Ptr gnu_Unwind_Find_got (_Unwind_Ptr ptr)
> +static inline _Unwind_Ptr _Unwind_gnu_Find_got (_Unwind_Ptr ptr)
>  {
>  _Unwind_Ptr res;
>
>  if (__gnu_Unwind_Find_got)
> -   res =  __gnu_Unwind_Find_got (ptr);
> +   res = __gnu_Unwind_Find_got (ptr);
>  else
> -  {
> -   asm volatile ("mov %[result], r" XSTR(FDPIC_REGNUM)
> - : [result]"=r" (res)
> - :
> - :);
> -  }
> +   __asm volatile ("mov %[result], r" XSTR(FDPIC_REGNUM)
> +   : [result] "=r" (res));
>
>  return res;
>  }
> @@ -75,7 +71,7 @@ static inline _Unwind_Ptr gnu_Unwind_Fin
>  #if __FDPIC__
>/* For FDPIC, we store the offset of the GOT entry.  */
>/* So, first get GOT from dynamic linker and then use indirect access. 
>  */
> -  tmp += gnu_Unwind_Find_got (ptr);
> +  tmp += _Unwind_gnu_Find_got (ptr);
>tmp = *(_Unwind_Word *) tmp;
>  #elif (defined(linux) && !defined(__uClinux__)) || defined(__NetBSD__) \
>  || defined(__FreeBSD__) || defined(__fuchsia__)
> --- gcc/testsuite/gcc.dg/pr93615.c.jj   2020-02-06 22:40:00.921472574 +0100
> +++ gcc/testsuite/gcc.dg/pr93615.c  2020-02-06 22:39:52.937591443 +0100
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c11" } */
> +/* { dg-require-effective-target exceptions } */
> +
> +#include 
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
>
> Jakub
>


Re: [PATCH v2][ARM] Disable code hoisting with -O3 (PR80155)

2020-01-28 Thread Ramana Radhakrishnan
On Tue, Nov 26, 2019 at 3:18 PM Wilco Dijkstra  wrote:
>
> Hi,
>
> While code hoisting generally improves codesize, it can affect performance
> negatively. Benchmarking shows it doesn't help SPEC and negatively affects
> embedded benchmarks. Since the impact is relatively small with -O2 and mainly
> affects -O3, the simplest option is to disable code hoisting for -O3 and 
> higher.
>
> OK for commit?
>
> ChangeLog:
> 2019-11-26  Wilco Dijkstra  
>
> PR tree-optimization/80155
> * common/config/arm/arm-common.c (arm_option_optimization_table):
> Disable -fcode-hoisting with -O3.
> --
>
> diff --git a/gcc/common/config/arm/arm-common.c 
> b/gcc/common/config/arm/arm-common.c
> index 
> b761d3abd670a144a593c4b410b1e7fbdcb52f56..3e11f21b7dd76cc071b645c32a6fdb4a92511279
>  100644
> --- a/gcc/common/config/arm/arm-common.c
> +++ b/gcc/common/config/arm/arm-common.c
> @@ -39,6 +39,8 @@ static const struct default_options 
> arm_option_optimization_table[] =
>  /* Enable section anchors by default at -O1 or higher.  */
>  { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
>  { OPT_LEVELS_FAST, OPT_fsched_pressure, NULL, 1 },
> +/* Disable code hoisting with -O3 or higher.  */
> +{ OPT_LEVELS_3_PLUS, OPT_fcode_hoisting, NULL, 0 },
>  { OPT_LEVELS_NONE, 0, NULL, 0 }
>};
>

What are the cases at -O3 where code hoisting is slower on the embedded
benchmarks, and how can we fix them? Keeping the target "different" in
this manner doesn't augur well for the long term.

Ramana


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-20 Thread Ramana Radhakrishnan
On Fri, Oct 18, 2019 at 8:49 PM Richard Earnshaw
 wrote:
>
>
> This series of patches rewrites all the DImode arithmetic patterns for
> the Arm backend when compiling for Arm or Thumb2 to split the
> operations during expand (the thumb1 code is unchanged and cannot
> benefit from early splitting as we are unable to expose the carry
> flag).
>
> This has a number of benefits:
>  - register allocation has more freedom to use independent
>registers for the upper and lower halves of the register
>  - we can make better use of combine for spotting insn merge
>opportunities without needing many additional patterns that are
>only used for DImode
>  - we eliminate a number of bugs in the machine description where
>the carry calculations were not correctly propagated by the
>split patterns (we mostly got away with this because the
>splitting previously happened only after most of the important
>optimization passes had been run).
>
> The patch series starts by paring back all the DImode arithmetic
> support to a very simple form without any splitting at all and then
> progressively re-implementing the patterns with early split
> operations.  This proved to be the only sane way of untangling the
> existing code due to a number of latent bugs which would have been
> exposed if a different approach had been taken.
>
> Each patch should produce a working compiler (it did when it was
> originally written), though since the patch set has been re-ordered
> slightly there is a possibility that some of the intermediate steps
> may have missing test updates that are only cleaned up later.
> However, only the end of the series should be considered complete.
> I've kept the patch as a series to permit easier regression hunting
> should that prove necessary.

Yay! It's quite nice to see this go in.

Ramana


>
> R.
>
> Richard Earnshaw (29):
>   [arm] Rip out DImode addition and subtraction splits.
>   [arm] Perform early splitting of adddi3.
>   [arm] Early split zero- and sign-extension
>   [arm] Rewrite addsi3_carryin_shift_ in canonical form
>   [arm] fix constraints on addsi3_carryin_alt2
>   [arm] Early split subdi3
>   [arm] Remove redundant DImode subtract patterns
>   [arm] Introduce arm_carry_operation
>   [arm] Correctly cost addition with a carry-in
>   [arm] Correct cost calculations involving borrow for subtracts.
>   [arm] Reduce cost of insns that are simple reg-reg moves.
>   [arm] Implement negscc using SBC when appropriate.
>   [arm] Add alternative canonicalizations for subtract-with-carry +
> shift
>   [arm] Early split simple DImode equality comparisons
>   [arm] Improve handling of DImode comparisions against constants.
>   [arm] early split most DImode comparison operations.
>   [arm] Handle some constant comparisons using rsbs+rscs
>   [arm] Cleanup dead code - old support for DImode comparisons
>   [arm] Handle immediate values in uaddvsi4
>   [arm] Early expansion of uaddvdi4.
>   [arm] Improve code generation for addvsi4.
>   [arm] Allow the summation result of signed add-with-overflow to be
> discarded.
>   [arm] Early split addvdi4
>   [arm] Improve constant handling for usubvsi4.
>   [arm] Early expansion of usubvdi4.
>   [arm] Improve constant handling for subvsi4.
>   [arm] Early expansion of subvdi4
>   [arm] Improvements to negvsi4 and negvdi4.
>   [arm] Fix testsuite nit when compiling for thumb2
>
>  gcc/config/arm/arm-modes.def  |   19 +-
>  gcc/config/arm/arm-protos.h   |1 +
>  gcc/config/arm/arm.c  |  598 -
>  gcc/config/arm/arm.md | 2020 ++---
>  gcc/config/arm/iterators.md   |   15 +-
>  gcc/config/arm/predicates.md  |   29 +-
>  gcc/config/arm/thumb2.md  |8 +-
>  .../gcc.dg/builtin-arith-overflow-3.c |   41 +
>  gcc/testsuite/gcc.target/arm/negdi-3.c|4 +-
>  9 files changed, 1757 insertions(+), 978 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c
>


Re: [PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-10-13 Thread Ramana Radhakrishnan
> 
> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf, 
> however, on my native Aarch32 setup the test times out when run as part 
> of a big "make check-gcc" regression, but not when run individually.
> 
> 2019-10-11  Stamatis Markianos-Wright 
> 
>   * config/arm/arm.md: Update b for Thumb2 range checks.
>   * config/arm/arm.c: New function arm_gen_far_branch.
>   * config/arm/arm-protos.h: New function arm_gen_far_branch
>   prototype.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-11  Stamatis Markianos-Wright 
> 
>   * testsuite/gcc.target/arm/pr91816.c: New test.

> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index f995974f9bb..1dce333d1c3 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
> cpu_arch_option *,
>  
>  void arm_initialize_isa (sbitmap, const enum isa_feature *);
>  
> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *);
> +
> +

Let's get the nits out of the way.

Unnecessary extra newline, and a space is needed between int and const above.


>  #endif /* ! GCC_ARM_PROTOS_H */
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 39e1a1ef9a2..1a693d2ddca 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>  }
>  } /* Namespace selftest.  */
>  
> +
> +/* Generate code to enable conditional branches in functions over 1 MiB.  */
> +const char *
> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
> + const char * branch_format)

Not sure if this is some munging from the attachment but check
vertical alignment of parameters.

> +{
> +  rtx_code_label * tmp_label = gen_label_rtx ();
> +  char label_buf[256];
> +  char buffer[128];
> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
> + CODE_LABEL_NUMBER (tmp_label));
> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
> +  rtx dest_label = operands[pos_label];
> +  operands[pos_label] = tmp_label;
> +
> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
> +  output_asm_insn (buffer, operands);
> +
> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr);
> +  operands[pos_label] = dest_label;
> +  output_asm_insn (buffer, operands);
> +  return "";
> +}
> +
> +

Unnecessary extra newline.

>  #undef TARGET_RUN_TARGET_SELFTESTS
>  #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>  #endif /* CHECKING_P */
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index f861c72ccfc..634fd0a59da 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -6686,9 +6686,16 @@
>  ;; And for backward branches we have 
>  ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4).
>  ;;
> +;; In 16-bit Thumb these ranges are:
>  ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving (-2040->2048).
>  ;; For a 'b' pos_range = 254,  neg_range = -256  giving (-250 ->256).
>  
> +;; In 32-bit Thumb these ranges are:
> +;; For a 'b'   +/- 16MB is not checked for.
> +;; For a 'b' pos_range = 1048574,  neg_range = -1048576  giving
> +;; (-1048568 -> 1048576).
> +
> +

Unnecessary extra newline.

>  (define_expand "cbranchsi4"
>[(set (pc) (if_then_else
> (match_operator 0 "expandable_comparison_operator"
> @@ -6947,22 +6954,42 @@
> (pc)))]
>"TARGET_32BIT"
>"*
> -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
> -{
> -  arm_ccfsm_state += 2;
> -  return \"\";
> -}
> -  return \"b%d1\\t%l0\";
> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
> +  {
> + arm_ccfsm_state += 2;
> + return \"\";
> +  }
> + switch (get_attr_length (insn))
> +  {
> + // Thumb2 16-bit b{cond}
> + case 2:
> +
> + // Thumb2 32-bit b{cond}
> + case 4: return \"b%d1\\t%l0\";break;
> +
> + // Thumb2 b{cond} out of range.  Use unconditional branch.
> + case 8: return arm_gen_far_branch \
> + (operands, 0, \"Lbcond\", \"b%D1\t\");
> + break;
> +
> + // A32 b{cond}
> + default: return \"b%d1\\t%l0\";
> +  }

Please fix indentation here. 

>"
>[(set_attr "conds" "use")
> (set_attr "type" "branch")
> (set (attr "length")
> - (if_then_else
> -(and (match_test "TARGET_THUMB2")
> - (and (ge (minus (match_dup 0) (pc)) (const_int -250))
> -  (le (minus (match_dup 0) (pc)) (const_int 256
> -(const_int 2)
> -(const_int 4)))]
> + (if_then_else (match_test "TARGET_THUMB2")
> + (if_then_else (and (ge (minus (match_dup 0) (pc)) (const_int -250))
> + (le (minus (match_dup 0) (pc)) (const_int 256)))
> + (const_int 2)
> + (if_then_else (and (ge (minus (match_dup 0) (pc))
> + (const_int -1048568))

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Ramana Radhakrishnan
On Fri, Oct 11, 2019 at 10:42 PM Wilco Dijkstra  wrote:
>
> Hi,
>
>  > the defaults for v7-a are still to use the
>  > Cortex-A8 scheduler
>
> I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old 
> now so
> way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate 
> scheduler
> that performs surprisingly well on both in-order and out-of-order 
> implementations.

On Armv8-A we do use the Cortex-A53 scheduler as the default in the AArch32 backend.

regards
Ramana

>
> Cheers,
> Wilco


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Ramana Radhakrishnan
On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra  wrote:
>
> Hi Ramana,
>
> > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> > spread the range across some v7-a CPUs as well ? While they aren't that 
> > popular today I
> > would suggest you look at them because the defaults for v7-a are still to 
> > use the
> > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used in 
> > places given
> > the availability of hardware.
>
> The results are practically identical to Cortex-A53 and A57 - there is a huge 
> codesize win
> across the board on SPEC2006, there isn't a single benchmark that is larger 
> (ie. more
> spilling).
>
> > I'd be happy to move this forward if you could show if there is no 
> > *increase* in spills
> > for the same range of benchmarks that you are doing for the Cortex-A8 and 
> > Cortex-A9
> > schedulers.
>
> There certainly isn't. I don't think results like these could be any more 
> one-sided, it's a
> significant win for every single benchmark, both for codesize and performance!
>

OK, go ahead - please keep an eye out for testsuite regressions.

Ramana


> What isn't clear is whether something has gone horribly wrong in the 
> scheduler which
> could be fixed/reverted, but as it is right now I can't see it being useful 
> at all. This means
> we should also reevaluate whether pressure scheduling now hurts AArch64 too.
>
> Cheers,
> Wilco


Re: [PATCH][ARM] Tweak HONOR_REG_ALLOC_ORDER

2019-10-11 Thread Ramana Radhakrishnan
On Fri, Oct 11, 2019 at 3:52 PM Wilco Dijkstra  wrote:
>
> Hi Ramana,
>
> > My only question would be whether it's more suitable to use
> > optimize_function_for_size_p(cfun) instead as IIRC that gives us a
> > chance with lto rather than the global optimize_size.
>
> Yes that is even better and that defaults to optimize_size if cfun isn't
> set. I've committed this:
>
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 
> 8b67c9c3657b312be223ab60c01969958baa9ed3..5fad1e5bcc2bc448489fdc8239c676246bbc8879
>  100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1068,9 +1068,8 @@ extern int arm_regs_in_sequence[];
>  /* Use different register alloc ordering for Thumb.  */
>  #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
>
> -/* Tell IRA to use the order we define rather than messing it up with its
> -   own cost calculations.  */
> -#define HONOR_REG_ALLOC_ORDER 1
> +/* Tell IRA to use the order we define when optimizing for size.  */
> +#define HONOR_REG_ALLOC_ORDER optimize_function_for_size_p (cfun)

I'd be happy to see a patch that looks at more such uses in the
backend, in your copious free time. Hint, hint.

R

>
>  /* Interrupt functions can only use registers that have already been
> saved by the prologue, even if they would normally be


Re: [PATCH][ARM] Enable arm_legitimize_address for Thumb-2

2019-10-11 Thread Ramana Radhakrishnan
On Fri, Oct 11, 2019 at 5:17 PM Wilco Dijkstra  wrote:
>
> Hi Ramana,
>
> >On Mon, Sep 9, 2019 at 6:03 PM Wilco Dijkstra  wrote:
> >>
> >> Currently arm_legitimize_address doesn't handle Thumb-2 at all, resulting 
> >> in
> >> inefficient code.  Since Thumb-2 supports similar address offsets use the 
> >> Arm
> >> legitimization code for Thumb-2 to get significant codesize and performance
> >> gains.  SPECINT2006 shows 0.4% gain on Cortex-A57, while SPECFP improves 
> >> 0.2%.
> >
> > What were the sort of code size gains  ? It did end up piquing my
> > curiosity as to how we missed something so basic.  For instance ldr
>
> h264ref is 1% smaller, various other benchmarks a few hundred bytes to 1KB
> smaller (~0.1%).
>
> > r0, [r0, #-4080] is valid in Arm state but not in Thumb2. Thus if
> >  there was an illegitimate address given here, would we end up
> > producing plus (r0, -4080) ? Yeah a simple testcase doesn't work out.
> > Scratching my head a bit , it's late at night.
>
> If the proposed offsets are not legal, GCC tries something different that is
> legal, hence there is no requirement to only propose legal splits. In any
> case we don't optimize the negative range even on Arm - the offsets are
> always split into 4KB ranges (not 8KB as would be possible).
>
> The negative offset splitting doesn't appear to have changed on Thumb-2
> so there is apparently a different mechanism that deals with negative offsets.
> So only positive offsets are improved with my patch:
>
> int f(int *p) { return p[1025] + p[1026]; }
>
> before:
> movwr3, #4104
> movwr2, #4100
> ldr r2, [r0, r2]
> ldr r0, [r0, r3]
> add r0, r0, r2
> bx  lr
>
> after:
> add r3, r0, #4096
> ldrdr0, r3, [r3, #4]
> add r0, r0, r3
> bx  lr
>
> > Orthogonally it looks like you can clean up the MINUS handling here
> > and in legitimate_address_p , I'm not sure what the status of LRA with
> > MINUS is either and thus we should now look to clean this up or look
> > to turn this on and see what happens. However that's a subject of a
> > future patch.
>
> Well there are lots of cases that aren't handled correctly or optimally
> yet - I'd copy the way I wrote aarch64_legitimize_address_displacement,
> but that's for a future patch indeed.
>
> > For the record, bootstrap with Thumb2  presumably and the testruns were 
> > clean ?
>
> Yes, at the time I ran them.

Yeah, it dropped about 2660 bytes from a Thumb-2 bootstrap, so it
seems worthwhile.

OK, please apply, but as usual watch out for any fallout from this.

regards
Ramana
>
> Cheers,
> Wilco


Re: [Patch 2/2][Arm] Implement TARGET_HOOK_SANITIZE_CLONE_ATTRIBUTES to remove cmse_nonsecure_entry

2019-10-10 Thread Ramana Radhakrishnan
On Tue, Oct 8, 2019 at 4:21 PM Andre Vieira (lists)
 wrote:
>
> Hi,
>
> This patch implements the TARGET_HOOK_SANITIZE_CLONE_ATTRIBUTES for the
> arm backend to remove the 'cmse_nonsecure_entry' attribute from cmse.
>
> Bootstrapped the series on x86_64 and built arm-none-eabi, running the
> cmse testsuite for armv8-m.main and armv8-m.base.
>
> Is this OK for trunk?

OK if the common bit is approved.  And do watch out for any testsuite
multilib fallout.

Ramana
>
> Cheers,
> Andre
>
> gcc/ChangeLog:
>
> 2019-10-08  Andre Vieira  
>
>  * config/arm/arm.c (TARGET_SANITIZE_CLONE_ATTRIBUTES): Define.
>  (arm_sanitize_clone_attributes): New.
>
> gcc/testsuite/ChangeLog:
> 2019-10-08  Andre Vieira  
>
>  * gcc.target/arm/cmse/ipa-clone.c: New test.


Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2019-10-10 Thread Ramana Radhakrishnan
On Thu, Oct 10, 2019 at 7:06 PM Richard Sandiford
 wrote:
>
> Wilco Dijkstra  writes:
> > ping
> >
> > Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
> > bitfields by their declared type, which results in better codegeneration on 
> > practically
> > any target.
>
> The name is confusing, but the documentation looks accurate to me:
>
> Define this macro as a C expression which is nonzero if accessing less
> than a word of memory (i.e.@: a @code{char} or a @code{short}) is no
> faster than accessing a word of memory, i.e., if such access
> require more than one instruction or if there is no difference in cost
> between byte and (aligned) word loads.
>
> When this macro is not defined, the compiler will access a field by
> finding the smallest containing object; when it is defined, a fullword
> load will be used if alignment permits.  Unless bytes accesses are
> faster than word accesses, using word accesses is preferable since it
> may eliminate subsequent memory access if subsequent accesses occur to
> other fields in the same word of the structure, but to different bytes.
>
> > I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
> > from GCC as it's confusing and useless.
>
> I disagree.  Some targets can optimise single-bit operations when the
> container is a byte, for example.
>
> > OK for commit until we get rid of it?
> >
> > ChangeLog:
> > 2017-11-17  Wilco Dijkstra  
> >
> > gcc/
> > * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
> > --
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index 
> > 056110afb228fb919e837c04aa5e5552a4868ec3..d8f4d129a02fb89eb00d256aba8c4764d6026078
> >  100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -769,14 +769,9 @@ typedef struct
> > if given data not on the nominal alignment.  */
> >  #define STRICT_ALIGNMENT TARGET_STRICT_ALIGN
> >
> > -/* Define this macro to be non-zero if accessing less than a word of
> > -   memory is no faster than accessing a word of memory, i.e., if such
> > -   accesses require more than one instruction or if there is no
> > -   difference in cost.
> > -   Although there's no difference in instruction count or cycles,
> > -   in AArch64 we don't want to expand to a sub-word to a 64-bit access
> > -   if we don't have to, for power-saving reasons.  */
> > -#define SLOW_BYTE_ACCESS   0
> > +/* Contrary to all documentation, this enables wide bitfield accesses,
> > +   which results in better code when accessing multiple bitfields.  */
> > +#define SLOW_BYTE_ACCESS   1
> >
> >  #define NO_FUNCTION_CSE 1
>
> I agree this makes sense from a performance point of view, and I think
> the existing comment is admitting that AArch64 has the properties that
> would normally cause us to set SLOW_BYTE_ACCESS to 1.  But the comment
> is claiming that there's a power-saving benefit to leaving it off.
>
> It seems like a weak argument though.  Bitfields are used when several
> values are packed into the same integer, so there's a high likelihood
> we'll need the whole integer anyway.  Avoiding the redundancies described
> in the documentation should, if anything, help with power usage.
>
> Maybe the main concern was using a 64-bit access when a 32-bit one
> would do, since 32-bit bitfield containers are the most common.  But the:
>
>  && GET_MODE_ALIGNMENT (mode) <= align
>
> condition in get_best_mode should avoid that unless the 64-bit
> access is naturally aligned.  (See the big comment above for the
> pros and cons of this.)
>
> So I think we should change the macro value unless anyone can back up the
> power-saving claim.  Let's wait a week (more) to see if anyone objects.

IIRC, that power-saving comment comes from the original port, probably
from when the port was first written, which is more than 10 years ago now.

regards
Ramana

>
> The comment change isn't OK though.  Please keep the first paragraph
> and just reword the second to say that's why we set the value to 1.
>
> Thanks,
> Richard
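The bitfield pattern this macro affects is easy to picture in source form.  A hypothetical example (not taken from the patch): several values packed into one 32-bit container, where accessing one field usually means a neighbouring field is wanted soon after, so a single aligned word load can serve both.

```c
#include <stdint.h>

/* Three values packed into a single 32-bit container.  With
   SLOW_BYTE_ACCESS set to 1, GCC may load the whole aligned word once
   and extract both fields, rather than issuing two narrow loads.  */
struct flags
{
  uint32_t kind  : 4;
  uint32_t state : 4;
  uint32_t count : 24;
};

/* Reads two fields that live in the same container word.  */
unsigned
sum_fields (const struct flags *f)
{
  return f->kind + f->state;
}
```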


Re: [PATCH][ARM] Tweak HONOR_REG_ALLOC_ORDER

2019-10-10 Thread Ramana Radhakrishnan
On Mon, Sep 9, 2019 at 6:05 PM Wilco Dijkstra  wrote:
>
> Setting HONOR_REG_ALLOC_ORDER improves codesize with -Os, however it generates
> slower and larger code with -O2 and higher.  So only set it when optimizing
> for size.  On Cortex-A57 this improves SPECINT2006 by 0.15% and SPECFP2006
> by 0.25% while reducing codesize.
>
> Bootstrap OK, OK for commit?
>
> ChangeLog:
> 2019-09-09  Wilco Dijkstra  
>
> * config/arm/arm.h (HONOR_REG_ALLOC_ORDER): Set when optimizing for 
> size.
>
> --
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 
> 8d023389eec469ad9c8a4e88edebdad5f3c23769..e3473e29fbbb964ff1136c226fbe30d35dbf7b39
>  100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -1065,9 +1065,8 @@ extern int arm_regs_in_sequence[];
>  /* Use different register alloc ordering for Thumb.  */
>  #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
>
> -/* Tell IRA to use the order we define rather than messing it up with its
> -   own cost calculations.  */
> -#define HONOR_REG_ALLOC_ORDER 1
> +/* Tell IRA to use the order we define when optimizing for size.  */
> +#define HONOR_REG_ALLOC_ORDER optimize_size

My only question would be whether it's more suitable to use
optimize_function_for_size_p (cfun) instead, as IIRC that gives us a
chance with LTO rather than the global optimize_size.

Otherwise OK.

regards
Ramana


>
>  /* Interrupt functions can only use registers that have already been
> saved by the prologue, even if they would normally be


Re: [PATCH][ARM] Enable arm_legitimize_address for Thumb-2

2019-10-10 Thread Ramana Radhakrishnan
On Mon, Sep 9, 2019 at 6:03 PM Wilco Dijkstra  wrote:
>
> Currently arm_legitimize_address doesn't handle Thumb-2 at all, resulting in
> inefficient code.  Since Thumb-2 supports similar address offsets use the Arm
> legitimization code for Thumb-2 to get significant codesize and performance
> gains.  SPECINT2006 shows 0.4% gain on Cortex-A57, while SPECFP improves 0.2%.
>
What were the sort of code size gains? It did end up piquing my
curiosity as to how we missed something so basic.  For instance, ldr
r0, [r0, #-4080] is valid in Arm state but not in Thumb-2.  Thus if
there was an illegitimate address given here, would we end up
producing plus (r0, -4080)? Yeah, a simple testcase doesn't work out.
Scratching my head a bit, it's late at night.

Orthogonally, it looks like you can clean up the MINUS handling here
and in legitimate_address_p.  I'm not sure what the status of LRA with
MINUS is either, and thus we should now look to clean this up, or look
to turn this on and see what happens.  However, that's a subject for a
future patch.

> Bootstrap OK, OK for commit?
>

For the record, the bootstrap was presumably with Thumb-2, and the test runs were clean?

regards
Ramana





> ChangeLog:
> 2019-09-09  Wilco Dijkstra  
>
> * config/arm/arm.c (arm_legitimize_address): Remove Thumb-2 bailout.
>
> --
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> a5a6a0fab1b4b7ef07931522e7d47e59842d7f27..2601708e7e0716e4668b79e015e366d2164562fd
>  100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -8652,13 +8652,8 @@ arm_legitimize_address (rtx x, rtx orig_x, 
> machine_mode mode)
> return x;
>  }
>
> -  if (!TARGET_ARM)
> -{
> -  /* TODO: legitimize_address for Thumb2.  */
> -  if (TARGET_THUMB2)
> -return x;
> -  return thumb_legitimize_address (x, orig_x, mode);
> -}
> +  if (TARGET_THUMB1)
> +return thumb_legitimize_address (x, orig_x, mode);
>
>if (GET_CODE (x) == PLUS)
>  {
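The offset-range asymmetry mentioned in the review can be seen with a small, hypothetical example (illustrative only, not from the patch): a load at byte offset -4080 from the incoming pointer, which is encodable directly in Arm state (ldr r0, [r0, #-4080]) but not in Thumb-2, and so needs different legitimization.

```c
/* A load 4080 bytes below the incoming pointer: -1020 ints at
   4 bytes each.  In Arm state this fits a single negative-offset
   ldr; in Thumb-2 the address must be legitimized differently.  */
int
load_m4080 (int *p)
{
  return p[-1020];
}
```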


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-10 Thread Ramana Radhakrishnan
On Tue, Jul 30, 2019 at 4:16 PM Wilco Dijkstra  wrote:
>
> Hi all,
>
>  >On 30/07/2019 10:31, Ramana Radhakrishnan wrote:
>  >> On 30/07/2019 10:08, Christophe Lyon wrote:
>
>  >>> Hi Wilco,
>  >>>
>  >>> Do you know which benchmarks were used when this was checked-in?
>  >>> It isn't clear from
>  >>> https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html
>  >>
>  >> It was from my time in Linaro and thus would have been a famous embedded
>  >> benchmark, coremark , spec2000 - all tested probably on cortex-a9 and
>  >> Cortex-A15. In addition to this I would like to see what the impact of
>  >> this is on something like Cortex-A53 as the issue rates are likely to be
>  >> different on the schedulers causing different behaviour.
>
> Obviously there are differences between various schedulers, but the general
> issue is that register pressure is increased many times beyond the spilling 
> limit
> (a few cases I looked at had a pressure well over 120 when there are only 14
> integer registers - this causes panic spilling in the register allocator).
>
> In fact the spilling overhead between the 2 algorithms is almost identical on
> Cortex-A53 and Cortex-A57, so the issue isn't directly related to the pipeline
> model used. It seems more related to the scheduler being too aggressive
> and not caring about register pressure at all (for example lifting a load 100
> instructions before its use so it must be spilled).

In those days it would have been the Cortex-A8, Cortex-A9 and Cortex-A15
schedulers, and IIRC the benchmarking would have been mostly on a
Cortex-A9 board or on some Cortex-A15 boards we had (long gone now)
inside Arm.

Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers, to
spread the range across some v7-a CPUs as well? While they aren't that
popular today, I would suggest you look at them because the defaults for
v7-a are still to use the Cortex-A8 scheduler, and the Cortex-A9
scheduler might well also get used in places given the availability of
hardware.


>
>  >> I don't have all the notes today for that - maybe you can look into the
>  >> linaro wiki.
>  >>
>  >> I am concerned about taking this patch in without some more data across
>  >> a variety of cores.
>  >>
>  >
>  > My concern is the original patch
>  > (https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html) is lacking in
>  > any real detail as to the reasons for the choice of the second algorithm
>  > over the first.
>  >
>  > - It's not clear what the win was
>  > - It's not clear what outliers there were and whether they were 
> significant.
>  >
>  > And finally, it's not clear if, 7 years later, this is still the best
>  > choice.
>  >
>  > If the second algorithm really is better, why is no other target using
>  > it by default?
>  >
>  > I think we need a bit more information (both ways).  In particular I'm
>  > concerned not just by the overall benchmark average, but also the amount
>  > of variance across the benchmarks.  I think the default needs to avoid
>  > significant outliers if at all possible, even if it is marginally less
>  > good on the average.
>
> The results clearly show that algorithm 1 works best on Arm today - I haven't
> seen a single benchmark where algorithm 2 results in less spilling. We could
> tune algorithm 2 so it switches back to algorithm 1 when register pressure is
> high or a basic block is large. However until it is fixed, the evidence is 
> that
> algorithm 1 is the best choice for current cores.

I'd be happy to move this forward if you could show that there is no
*increase* in spills for the same range of benchmarks that you are
running, for the Cortex-A8 and Cortex-A9 schedulers.

Sorry about the time it has taken. I've been a bit otherwise occupied recently.

regards
Ramana

>
> Wilco


Re: [PATCH][PR91749][arm] FDPIC: Handle -mflip-thumb

2019-09-16 Thread Ramana Radhakrishnan
On Mon, Sep 16, 2019 at 2:40 PM Christophe Lyon
 wrote:
>
> [Re-sending in plain text-mode, sorry for the duplicates]
>
> Hi,
>
> In PR91749, we have ICEs because -mflip-thumb switches to Thumb-1 (the
> default target cpu does not support Thumb-2).
>
> Although we already filter this in arm_configure_build_target, we
> forgot to handle cases when the mode is changed via attributes (either
> in the source code, or via -mflip-thumb).
>
> This patch adds the same error message when trying to apply the
> "thumb" attribute and the target does not support Thumb-2 (only if we
> are in FDPIC mode, of course).
>
> OK?

OK.

Ramana
>
> Thanks,
>
> Christophe


Re: [PATCH testsuite, arm] cache fp16 hw effective-target tests

2019-09-15 Thread Ramana Radhakrishnan
On Fri, Sep 13, 2019 at 5:31 PM Sandra Loosemore
 wrote:
>
> In some bare-metal environments, the tests for fp16 runtime support fail
> in a way that causes a timeout rather than immediate failure.  (E.g.,
> the runtime might provide a do-nothing exception handler that just sits
> in a tight loop and never returns.)  This patch changes the
> effective-target tests for fp16 hardware support to cache the result of
> the test so that we don't have to do this more than once.  I think it
> was probably just an oversight that it wasn't done this way originally,
> since the target is hardly likely to sprout fp16 instruction support
> midway through the test run anyway.  ;-)  Anyway, test results are the
> same with this patch, they just run faster.  OK to commit?

Ok, thanks.

Ramana

>
> -Sandra


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-07-30 Thread Ramana Radhakrishnan

On 30/07/2019 10:08, Christophe Lyon wrote:

On Mon, 29 Jul 2019 at 18:49, Wilco Dijkstra  wrote:


Currently the Arm backend selects the alternative sched pressure algorithm.
The issue is that this doesn't take register pressure into account, and so
it causes significant additional spilling on Arm where there are only 14
allocatable registers.  SPEC2006 shows significant codesize reduction
with the default pressure algorithm, so switch back to that.  PR77308 shows
~800 fewer instructions.

SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
patches. Overall SPEC codesize is 1.1% smaller.



Hi Wilco,

Do you know which benchmarks were used when this was checked-in?
It isn't clear from https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00706.html


It was from my time in Linaro and thus would have been a famous embedded
benchmark: coremark, spec2000 - all tested probably on Cortex-A9 and
Cortex-A15. In addition to this I would like to see what the impact of
this is on something like Cortex-A53, as the issue rates are likely to be
different on the schedulers, causing different behaviour.



I don't have all the notes today for that - maybe you can look into the 
linaro wiki.


I am concerned about taking this patch in without some more data across 
a variety of cores.


Thanks,
Ramana




Thanks,

Christophe


Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

ChangeLog:
2019-07-29  Wilco Dijkstra  

 * config/arm/arm.c (arm_option_override): Don't override sched
 pressure algorithm.

--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3575,11 +3575,6 @@ arm_option_override (void)
if (use_neon_for_64bits == 1)
   prefer_neon_for_64bits = true;

-  /* Use the alternative scheduling-pressure algorithm by default.  */
-  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
-global_options.x_param_values,
-global_options_set.x_param_values);
-
/* Look through ready list and all of queue for instructions
   relevant for L2 auto-prefetcher.  */
int param_sched_autopref_queue_depth;





Re: [PATCH][ARM] Cleanup DImode shifts

2019-07-22 Thread Ramana Radhakrishnan

On 22/07/2019 17:16, Wilco Dijkstra wrote:

Like the logical operations, expand all shifts early rather than only
sometimes.  The Neon shift expansions are never emitted (not even with
-fneon-for-64bits), so they are not useful.  So all the late expansions
and Neon shift patterns can be removed, and shifts are more optimized
as a result.  Since some extend patterns use Neon DImode shifts, remove
the Neon extend variants and related splits.

A simple example (relying on [1]) generates the same efficient code after
this patch with -mfpu=neon and -mfpu=vfp (previously just the fact of
having Neon enabled resulted in inefficient code for no reason).

unsigned long long f(unsigned long long x, unsigned long long y)
{ return x & (y >> 33); }

Before:
 strd r4, r5, [sp, #-8]!
 lsr r4, r3, #1
 mov r5, #0
 and r1, r1, r5
 and r0, r0, r4
 ldrd r4, r5, [sp]
 add sp, sp, #8
 bx  lr

After:
 and r0, r0, r3, lsr #1
 mov r1, #0
 bx  lr

Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

[1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html


Thanks for this patch set. What I'm missing in this is any analysis as
to what the impact is on code generation for Neon intrinsics that use
uint64_t. Especially things like v_u64?



Ramana




ChangeLog:
2019-07-19  Wilco Dijkstra  

* config/arm/iterators.md (qhs_extenddi_cstr): Update.
(qhs_extenddi_cstr): Likewise.
* config/arm/arm.md (ashldi3): Always expand early.
(ashlsi3): Likewise.
(ashrsi3): Likewise.
(zero_extenddi2): Remove Neon variants.
(extenddi2): Likewise.
* config/arm/neon.md (ashldi3_neon_noclobber): Remove.
(signed_shift_di3_neon): Likewise.
(unsigned_shift_di3_neon): Likewise.
(ashrdi3_neon_imm_noclobber): Likewise.
(lshrdi3_neon_imm_noclobber): Likewise.
(di3_neon): Likewise.
(split extend): Remove DI extend split patterns.

 testsuite/
* gcc.target/arm/neon-extend-1.c: Remove test.
* gcc.target/arm/neon-extend-2.c: Remove test.
---

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift"
  (define_expand "ashldi3"
[(set (match_operand:DI 0 "s_register_operand")
  (ashift:DI (match_operand:DI 1 "s_register_operand")
-   (match_operand:SI 2 "general_operand")))]
+   (match_operand:SI 2 "reg_or_int_operand")))]
"TARGET_32BIT"
"
-  if (TARGET_NEON)
-{
-  /* Delay the decision whether to use NEON or core-regs until
-register allocation.  */
-  emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2]));
-  DONE;
-}
-  else
-{
-  /* Only the NEON case can handle in-memory shift counts.  */
-  if (!reg_or_int_operand (operands[2], SImode))
-operands[2] = force_reg (SImode, operands[2]);
-}
-
-  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
-; /* No special preparation statements; expand pattern as above.  */
-  else
-{
-  rtx scratch1, scratch2;
-
-  /* Ideally we should use iwmmxt here if we could know that operands[1]
- ends up already living in an iwmmxt register. Otherwise it's
- cheaper to have the alternate code being generated than moving
- values to iwmmxt regs and back.  */
-
-  /* Expand operation using core-registers.
-'FAIL' would achieve the same thing, but this is a bit smarter.  */
-  scratch1 = gen_reg_rtx (SImode);
-  scratch2 = gen_reg_rtx (SImode);
-  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-operands[2], scratch1, scratch2);
-  DONE;
-}
-  "
-)
+  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+operands[2], gen_reg_rtx (SImode),
+gen_reg_rtx (SImode));
+  DONE;
+")
  
  (define_expand "ashlsi3"

[(set (match_operand:SI 0 "s_register_operand")
@@ -3661,35 +3631,11 @@ (define_expand "ashrdi3"
   (match_operand:SI 2 "reg_or_int_operand")))]
"TARGET_32BIT"
"
-  if (TARGET_NEON)
-{
-  /* Delay the decision whether to use NEON or core-regs until
-register allocation.  */
-  emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2]));
-  DONE;
-}
-
-  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
-; /* No special preparation statements; expand pattern as above.  */
-  else
-{
-  rtx scratch1, scratch2;
-
-  /* Ideally we should use iwmmxt here if we could know that operands[1]
- ends up already living in 

Re: [PATCH][armeb] PR 91060 gcc.c-torture/execute/scal-to-vec1.c fails since r272843

2019-07-10 Thread Ramana Radhakrishnan
On Wed, Jul 10, 2019 at 11:07 AM Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > On Mon, 8 Jul 2019 at 11:04, Richard Sandiford
> >  wrote:
> >>
> >> Christophe Lyon  writes:
> >> > Hi,
> >> >
> >> > This patch fixes PR 91060 where the lane ordering was no longer the
> >> > right one (GCC's vs architecture's).
> >>
> >> Sorry, we clashed :-)
> >>
> >> I'd prefer to go with the version I attached to bugzilla just now.
> >
> > Yes just saw that, thanks!
>
> The bugzilla version didn't properly adjust vec_setv2di_internal.
> Fixed with the version below, tested on armeb-eabi.
>
> Besides gcc.c-torture/execute/scal-to-vec*.c, the patch also fixes:
>
>   c-c++-common/torture/vector-compare-1.c
>   gcc.target/arm/pr69614.c
>   g++.dg/ext/vector37.C
>
> OK for trunk?

OK.

Ramana
>
> Richard
>
>
> 2019-07-10  Richard Sandiford  
>
> gcc/
> PR target/91060
> * config/arm/iterators.md (V2DI_ONLY): New mode iterator.
> * config/arm/neon.md (vec_set_internal): Add a '@' prefix.
> (vec_setv2di_internal): Reexpress as...
> (@vec_set_internal): ...this.
> * config/arm/arm.c (neon_expand_vector_init): Use gen_vec_set_internal
> rather than gen_neon_vset_lane.
>
> Index: gcc/config/arm/iterators.md
> ===
> --- gcc/config/arm/iterators.md 2019-06-18 09:35:55.377865698 +0100
> +++ gcc/config/arm/iterators.md 2019-07-10 11:01:57.990749932 +0100
> @@ -186,6 +186,9 @@ (define_mode_iterator VX [V8QI V4HI V16Q
>  ;; Modes with 8-bit elements.
>  (define_mode_iterator VE [V8QI V16QI])
>
> +;; V2DI only (for use with @ patterns).
> +(define_mode_iterator V2DI_ONLY [V2DI])
> +
>  ;; Modes with 64-bit elements only.
>  (define_mode_iterator V64 [DI V2DI])
>
> Index: gcc/config/arm/neon.md
> ===
> --- gcc/config/arm/neon.md  2019-07-01 09:37:07.220524486 +0100
> +++ gcc/config/arm/neon.md  2019-07-10 11:01:57.990749932 +0100
> @@ -319,7 +319,7 @@ (define_insn "*movmisalign_neon_lo
>"vld1.\t{%q0}, %A1"
>[(set_attr "type" "neon_load1_1reg")])
>
> -(define_insn "vec_set_internal"
> +(define_insn "@vec_set_internal"
>[(set (match_operand:VD_LANE 0 "s_register_operand" "=w,w")
>  (vec_merge:VD_LANE
>(vec_duplicate:VD_LANE
> @@ -340,7 +340,7 @@ (define_insn "vec_set_internal"
>  }
>[(set_attr "type" "neon_load1_all_lanes,neon_from_gp")])
>
> -(define_insn "vec_set_internal"
> +(define_insn "@vec_set_internal"
>[(set (match_operand:VQ2 0 "s_register_operand" "=w,w")
>  (vec_merge:VQ2
>(vec_duplicate:VQ2
> @@ -369,12 +369,12 @@ (define_insn "vec_set_internal"
>[(set_attr "type" "neon_load1_all_lanes,neon_from_gp")]
>  )
>
> -(define_insn "vec_setv2di_internal"
> -  [(set (match_operand:V2DI 0 "s_register_operand" "=w,w")
> -(vec_merge:V2DI
> -  (vec_duplicate:V2DI
> +(define_insn "@vec_set_internal"
> +  [(set (match_operand:V2DI_ONLY 0 "s_register_operand" "=w,w")
> +(vec_merge:V2DI_ONLY
> +  (vec_duplicate:V2DI_ONLY
>  (match_operand:DI 1 "nonimmediate_operand" "Um,r"))
> -  (match_operand:V2DI 3 "s_register_operand" "0,0")
> +  (match_operand:V2DI_ONLY 3 "s_register_operand" "0,0")
>(match_operand:SI 2 "immediate_operand" "i,i")))]
>"TARGET_NEON"
>  {
> Index: gcc/config/arm/arm.c
> ===
> --- gcc/config/arm/arm.c2019-07-01 09:37:07.220524486 +0100
> +++ gcc/config/arm/arm.c2019-07-10 11:01:57.990749932 +0100
> @@ -12471,7 +12471,7 @@ neon_expand_vector_init (rtx target, rtx
>if (n_var == 1)
>  {
>rtx copy = copy_rtx (vals);
> -  rtx index = GEN_INT (one_var);
> +  rtx merge_mask = GEN_INT (1 << one_var);
>
>/* Load constant part of vector, substitute neighboring value for
>  varying element.  */
> @@ -12480,38 +12480,7 @@ neon_expand_vector_init (rtx target, rtx
>
>/* Insert variable.  */
>x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, one_var));
> -  switch (mode)
> -   {
> -   case E_V8QImode:
> - emit_insn (gen_neon_vset_lanev8qi (target, x, target, index));
> - break;
> -   case E_V16QImode:
> - emit_insn (gen_neon_vset_lanev16qi (target, x, target, index));
> - break;
> -   case E_V4HImode:
> - emit_insn (gen_neon_vset_lanev4hi (target, x, target, index));
> - break;
> -   case E_V8HImode:
> - emit_insn (gen_neon_vset_lanev8hi (target, x, target, index));
> - break;
> -   case E_V2SImode:
> - emit_insn (gen_neon_vset_lanev2si (target, x, target, index));
> - break;
> -   case E_V4SImode:
> - emit_insn (gen_neon_vset_lanev4si (target, x, target, index));
> - break;
> -   case E_V2SFmode:
> - emit_insn 
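The code path being simplified handles vector initializers with exactly one non-constant element (the n_var == 1 case in neon_expand_vector_init above).  Using GCC's generic vector extension, that pattern looks like this (an illustrative example, not from the patch):

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* One variable lane among constants: lowered to a load of the
   constant part plus a single vec_set of the varying element.  */
v4si
make_vec (int x)
{
  return (v4si) { 1, 2, x, 4 };
}
```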

Re: [PATCH] netbsd EABI support

2019-05-23 Thread Ramana Radhakrishnan
On Thu, May 23, 2019 at 3:12 PM Richard Earnshaw (lists)
 wrote:
>
> On 23/05/2019 15:03, Richard Earnshaw (lists) wrote:
> > On 20/05/2019 20:24, Jeff Law wrote:
> >> On 4/9/19 10:36 AM, Richard Earnshaw (lists) wrote:
> >>> On 09/04/2019 16:04, Jeff Law wrote:
>  On 4/8/19 9:17 AM, co...@sdf.org wrote:
> > Pinging again in the hope of getting the patch in, I'd like to have
> > less outstanding patches :) (I have quite a few and new releases
> > can become painful!)
> >
> > gcc/ChangeLog
> >
> > config.gcc (arm*-*-netbsdelf*) Add support for EABI configuration
> > config.host (arm*-*-netbsd*): Build driver-arm.o
> > config/arm/netbsd-eabi.h: New file.
> > config/arm/netbsd-elf.h
> > config/netbsd-elf.h: Define SUBTARGET_EXTRA_SPECS.
> >
> > libgcc/ChangeLog
> >
> > config.host (arm*-*-netbsdelf*): Add support for EABI configuration
> > config/arm/t-netbsd: LIB1ASMFUNCS: Append to existing set.
> >HOST_LIBGCC2_CFLAGS: workaround possible bug
> > config/arm/t-netbsd-eabi: New file.
>  So we're well into stage4 which means technically it's too late for
>  something like this.  However, given it's limited scope I won't object
>  if the ARM port maintainers want to go forward.  Otherwise I'll queue it
>  for gcc-10.
> 
>  jeff
> 
> >>>
> >>> I was about to approve this (modulo removing the now obsolete
> >>> FPU_DEFAULT macro), until I noticed that it also modifies the generic
> >>> NetBSD code as well.  I'm certainly not willing to approve that myself
> >>> at this late stage, but if one of the NetBSD OS maintainers wants to
> >>> step up and do so, I'll happily take the Arm back-end code as that's not
> >>> a primary or secondary target.
> >> So is removal of the FPUTYPE_DEFAULT stuff all that's needed for this to
> >> go forward now that Jason T has chimed in?
> >>
> >> jeff
> >>
> >>
> >
> > Very close.  I was just doing a last pass through the patch to make that
> > small edit when I noticed this in config/arm/netbsd-eabi.h:
> >
> >
> > #define SUBTARGET_EXTRA_ASM_SPEC  \
> >   "-matpcs ..."
> >
> > Why is the assembler unconditionally passed -matpcs for an eabi
> > configuration?  That sounds broken.
> >
> > R.
> >
>
>
> Looking at what GAS does with this flag, it simply causes the assembler
> to create an empty .arm.atpcs debug section.  On that basis, I would
> expect that it's then safe (and correct) to remove this: the EABI is not
> the ATPCS.
>

I'm not sure if this has been asked. Do you and the other authors have
a copyright assignment on file with the FSF? I don't see anyone with
the email co...@sdf.org on the copyright file. I think this has been
asked in terms of the vax netbsd patches.

regards
Ramana


> R.
>


Re: [testsuite] aarch64,arm Add missing quotes to expected error message

2019-05-20 Thread Ramana Radhakrishnan
On Mon, May 20, 2019 at 7:57 AM Christophe Lyon
 wrote:
>
> Hi,
>
> After Martin's commit r271338, we now emit quotes around reserved
> names, and some tests started to fail on aarch64 and arm.
>
> This should fix them, OK?

Looks obvious to me.

R
>
> Christophe


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Ramana Radhakrishnan
On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
 wrote:
>
> > On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
> >
> > On 5/15/19 5:19 AM, Richard Biener wrote:
> >>
> >> For the official converted repo do we really want all (old)
> >> development branches to be in the
> >> main git repo?  I suppose we could create a readonly git from the
> >> state of the whole repository
> >> at the point of conversion (and also keep the SVN in readonly mode),
> >> just to make migration
> >> of content we want easy in the future?
> > I've always assumed we'd keep the old SVN tree read-only for historical
> > purposes.  I strongly suspect that, ignoring release branches, that
> > non-active branches just aren't terribly interesting.
>
> Let's avoid mixing the two discussions: (1) converting svn repo to git (and 
> getting community consensus to switch to git) and (2) deciding on which 
> branches to keep in the new repo.
>

I'm hoping that there is still community consensus to switch to git.

Personally speaking, a +1 to switch to git.

regards
Ramana

> With git, we can always split away unneeded history by removing unnecessary 
> branches and tags and re-packing the repo.  We can equally easily bring that 
> history back if we change our minds.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>


Re: [Patch AArch64] Add __ARM_FEATURE_ATOMICS

2019-04-30 Thread Ramana Radhakrishnan
On Tue, Apr 30, 2019 at 12:38 PM Jakub Jelinek  wrote:
>
> On Tue, Apr 30, 2019 at 10:50:46AM +0100, Richard Earnshaw (lists) wrote:
> > > * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
> > > __ARM_FEATURE_ATOMICS
> > >
> > > atomics.txt
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-c.c 
> > > b/gcc/config/aarch64/aarch64-c.c
> > > index fcb1e80177d..6d5acb02fc6 100644
> > > --- a/gcc/config/aarch64/aarch64-c.c
> > > +++ b/gcc/config/aarch64/aarch64-c.c
> > > @@ -147,6 +147,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
> > >builtin_define_with_int_value ("__ARM_FEATURE_SVE_BITS", bits);
> > >  }
> > >
> > > +  aarch64_def_or_undef (TARGET_LSE, "__ARM_FEATURE_ATOMICS", pfile);
> > >aarch64_def_or_undef (TARGET_AES, "__ARM_FEATURE_AES", pfile);
> > >aarch64_def_or_undef (TARGET_SHA2, "__ARM_FEATURE_SHA2", pfile);
> > >aarch64_def_or_undef (TARGET_SHA3, "__ARM_FEATURE_SHA3", pfile);
> > >
> >
> >
> > This is OK for trunk, 7 and 8.  For 9, I think you'll need to wait for
> > 9.2 now, unless Jakub is feeling generous...
>
> Ok if you can commit it in the next hour at most, want to do a rc2 then and
> afterwards would hope no changes till release.

Done with a very quick smoke test.

Thank you Jakub and Richard.

Ramana
>
> Jakub
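The new macro is intended as a compile-time feature test; typical use would look something like this (a sketch, using the generic __atomic built-ins, which the compiler can expand to LSE instructions such as LDADD when the target has them):

```c
#include <stdint.h>

/* Increment a shared counter.  On AArch64 with +lse the compiler can
   select an LDADD instruction for this built-in; elsewhere it falls
   back to an LL/SC loop or a library call.  */
uint64_t
counter_inc (uint64_t *p)
{
  return __atomic_add_fetch (p, 1, __ATOMIC_SEQ_CST);
}

/* Compile-time feature test: the macro added by this patch.  */
int
have_lse_atomics (void)
{
#ifdef __ARM_FEATURE_ATOMICS
  return 1;
#else
  return 0;
#endif
}
```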


Re: [Patch]Bug 89057 - [8/9/10 Regression] AArch64 ld3 st4 less optimized

2019-04-29 Thread Ramana Radhakrishnan
On Mon, Apr 29, 2019 at 8:44 AM Jaydeep Chauhan
 wrote:
>
> Hi All,
>
> The attached patch (89057.patch) fixes the subjected issue.
> Please let me know your thoughts on the patch.

Thanks for your patch.

Before getting started with reviewing the patch, the first question
is whether you have a copyright assignment on file, or does your
employer have one on record with the FSF?

Further, could you elaborate on what you have done to fix this by
providing some description other than "fix the subjected issue", and
why you have chosen the approach that you have? Finally, one of the
things required for any patch to be considered is a full test run. Can
you tell us how you tested this patch?


Ramana



>
> Thanks,
> Jaydeep.


Re: [PATCH] Fix ARM exception handling (PR target/89093)

2019-04-23 Thread Ramana Radhakrishnan
On Mon, Apr 22, 2019 at 10:15 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As detailed in the PR, unlike most other targets, on ARM EABI the floating
> point registers are saved lazily, when EH personality routine calls
> __gnu_unwind_frame (usually in the CONTINUE_UNWINDING macro).
> That means the unwinder itself and the personality routines (and whatever
> other functions those call in the path to CONTINUE_UNWINDING) must be
> compiled so that it doesn't use floating point registers.  Calling some
> function that saves those on entry and restores on exit is fine, but calling
> some function which saves those on entry and then calls __gnu_unwind_frame
> and then restores on exit is not fine.
> In 8.x and earlier we were just lucky that the RA when compiling those
> didn't decide to use any of those registers, but starting with the combiner
> hard register changes we are no longer so lucky.
>
> The following patch introduces -mgeneral-regs-only option and
> general-regs-only target attribute for ARM (similarly to how other targets
> like AArch64 and x86), changes the ARM unwinder to be compiled with that
> and changes the personality routines of all languages so that either just
> the personality routine, or whatever other routines called by personality
> routines that call directly or indirectly __gnu_unwind_frame to be compiled
> that way.
>
> Bootstrapped/regtested on armv7hl-linux-gnueabi (and x86_64-linux to make
> sure it compiles on other targets too).
>
> Ok for trunk?

Ok. Thanks a lot for working on this and fixing the rest of this up.
I've been busy with a few other things.


> While the libgo changes are included in the patch, I think those need to go
> through go upstream and will likely need some changes if that code can be
> compiled by earlier GCC versions or other compilers.
>




> Not sure about libphobos D stuff, does it need to go through upstream and
> is libdruntime/gcc/deh.d compiled by compilers other than GDC?
>
> The Ada changes need those guards because the file is compiled by both
> the system compiler and by the newly built compilers; when compiled by
> system compiler, as the FE is built with -fno-exceptions I'd hope the EH
> stuff isn't really used there and at least until GCC 9.1 is released we have
> the issue that the system compiler could be some earlier GCC 9.0.1 snapshot
> which doesn't support general-regs-only.

I don't grok enough Ada to approve this and will leave that part to
Eric or others.

Ramana

>
> 2019-04-22  Ramana Radhakrishnan  
> Bernd Edlinger  
> Jakub Jelinek  
>
> PR target/89093
> * config/arm/arm.c (aapcs_vfp_is_call_or_return_candidate): Diagnose
> if used with general-regs-only.
> (arm_conditional_register_usage): Don't add non-general regs if
> general-regs-only.
> (arm_valid_target_attribute_rec): Handle general-regs-only.
> * config/arm/arm.h (TARGET_HARD_FLOAT): Return false if
> general-regs-only.
> (TARGET_HARD_FLOAT_SUB): Define.
> (TARGET_SOFT_FLOAT): Define as negation of TARGET_HARD_FLOAT_SUB.
> (TARGET_REALLY_IWMMXT): Add && !TARGET_GENERAL_REGS_ONLY.
> (TARGET_REALLY_IWMMXT2): Likewise.
> * config/arm/arm.opt: Add -mgeneral-regs-only.
> * doc/extend.texi: Document ARM general-regs-only target.
> * doc/invoke.texi: Document ARM -mgeneral-regs-only.
> gcc/ada/
> * raise-gcc.c (TARGET_ATTRIBUTE): Define.
> (continue_unwind, personality_body, PERSONALITY_FUNCTION): Add
> TARGET_ATTRIBUTE.
> libgcc/
> * config/arm/pr-support.c: Add #pragma GCC target("general-regs-only").
> * config/arm/unwind-arm.c: Likewise.
> * unwind-c.c (PERSONALITY_FUNCTION): Add general-regs-only target
> attribute for ARM.
> libobjc/
> * exception.c (PERSONALITY_FUNCTION): Add general-regs-only target
> attribute for ARM.
> libphobos/
> * libdruntime/gcc/deh.d: Import gcc.attribute.
> (personality_fn_attributes): New enum.
> (scanLSDA, CONTINUE_UNWINDING, gdc_personality, __gdc_personality):
> Add @personality_fn_attributes.
> libstdc++-v3/
> * libsupc++/eh_personality.cc (PERSONALITY_FUNCTION): Add
> general-regs-only target attribute for ARM.
> libgo/
> * runtime/go-unwind.c (PERSONALITY_FUNCTION,
> __gccgo_personality_dummy): Add general-regs-only target
> attribute for ARM.
>
> --- gcc/config/arm/arm.c(revision 270444)
> +++ gcc/config/arm/arm.c(working copy)
> @@ -6112,6 +6112,11 @@ aapcs_vfp_is_call_or_return_candidate (enum arm_pc
>  return false;
>
>*base_m

Re: [PATCH] Fix up ARM target attribute handling (PR target/89093)

2019-04-12 Thread Ramana Radhakrishnan

On 12/04/2019 15:12, Jakub Jelinek wrote:

Hi!

Just something I've noticed while looking at Ramana's patch.

As can be seen on the testcase, on arm we accept arbitrary garbage
after arm or thumb prefixes, is that really desirable?
While for fpu= or arch= we reject garbage after it and so do for
target attribute arg starting with +.

Ok if this passes bootstrap/regtest?



Bah, that's not supposed to be there. Yes, that's ok.


Note, I don't understand the while (ISSPACE (*q)) ++q; there (aarch64
does the same), do we really want to support
__attribute__((target ("   arm"))) ?
Looked at other targets and can't see anything like that being supported
elsewhere.


No, that's not right. We should get rid of this.

Thanks a lot for fixing this up Jakub.

Ramana



2019-04-12  Jakub Jelinek  

PR target/89093
* config/arm/arm.c (arm_valid_target_attribute_rec): Use strcmp
instead of strncmp when checking for thumb and arm.  Formatting fixes.

* gcc.target/arm/pr89093.c: New test.

--- gcc/config/arm/arm.c.jj 2019-04-09 15:18:37.879816537 +0200
+++ gcc/config/arm/arm.c2019-04-12 15:36:36.993102230 +0200
@@ -30874,16 +30874,16 @@ arm_valid_target_attribute_rec (tree arg
while (ISSPACE (*q)) ++q;
  
argstr = NULL;

-  if (!strncmp (q, "thumb", 5))
- opts->x_target_flags |= MASK_THUMB;
+  if (!strcmp (q, "thumb"))
+   opts->x_target_flags |= MASK_THUMB;
  
-  else if (!strncmp (q, "arm", 3))

- opts->x_target_flags &= ~MASK_THUMB;
+  else if (!strcmp (q, "arm"))
+   opts->x_target_flags &= ~MASK_THUMB;
  
else if (!strncmp (q, "fpu=", 4))

{
  int fpu_index;
- if (! opt_enum_arg_to_value (OPT_mfpu_, q+4,
+ if (! opt_enum_arg_to_value (OPT_mfpu_, q + 4,
   &fpu_index, CL_TARGET))
{
  error ("invalid fpu for target attribute or pragma %qs", q);
@@ -30901,7 +30901,7 @@ arm_valid_target_attribute_rec (tree arg
}
else if (!strncmp (q, "arch=", 5))
{
- char* arch = q+5;
+ char *arch = q + 5;
  const arch_option *arm_selected_arch
 = arm_parse_arch_option_name (all_architectures, "arch", arch);
  
--- gcc/testsuite/gcc.target/arm/pr89093.c.jj	2019-04-12 16:05:47.477069147 +0200

+++ gcc/testsuite/gcc.target/arm/pr89093.c  2019-04-12 16:05:15.948591951 +0200
@@ -0,0 +1,7 @@
+/* PR target/89093 */
+/* { dg-do compile } */
+
+__attribute__((target ("arm.foobar"))) void f1 (void) {} /* { dg-error "unknown target attribute or pragma 'arm.foobar'" } */
+__attribute__((target ("thumbozoo1"))) void f2 (void) {} /* { dg-error "unknown target attribute or pragma 'thumbozoo1'" } */
+__attribute__((target ("arm,thumbique"))) void f3 (void) {} /* { dg-error "unknown target attribute or pragma 'thumbique'" } */
+__attribute__((target ("thumb981,arm"))) void f4 (void) {} /* { dg-error "unknown target attribute or pragma 'thumb981'" } */

Jakub





[Patch AArch64] Add __ARM_FEATURE_ATOMICS

2019-04-09 Thread Ramana Radhakrishnan
This keeps coming up repeatedly: the ACLE has finally added
__ARM_FEATURE_ATOMICS for the LSE feature, and GCC should define it.
This is now part of the latest ACLE release
(https://developer.arm.com/docs/101028/latest/5-feature-test-macros).


I know it's late for GCC 9, but this is a simple macro which need not
wait for another year.


Ok for trunk and to backport to all release branches ?

Tested with a simple build and a smoke test.


regards
Ramana

* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_ATOMICS.
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index fcb1e80177d..6d5acb02fc6 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -147,6 +147,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   builtin_define_with_int_value ("__ARM_FEATURE_SVE_BITS", bits);
 }
 
+  aarch64_def_or_undef (TARGET_LSE, "__ARM_FEATURE_ATOMICS", pfile);
   aarch64_def_or_undef (TARGET_AES, "__ARM_FEATURE_AES", pfile);
   aarch64_def_or_undef (TARGET_SHA2, "__ARM_FEATURE_SHA2", pfile);
   aarch64_def_or_undef (TARGET_SHA3, "__ARM_FEATURE_SHA3", pfile);


Re: [PATCH] ARM cmpsi2_addneg fix follow-up (PR target/89506)

2019-03-19 Thread Ramana Radhakrishnan
On 04/03/2019 09:04, Jakub Jelinek wrote:
> On Fri, Mar 01, 2019 at 03:41:33PM +, Wilco Dijkstra wrote:
>> > and regtest revealed two code size
>> > regressions because of that.  Is -1 vs. 1 the only case of immediate
>> > valid for both "I" and "L" constraints where the former is longer than the
>> > latter?
>> 
>> Yes -1 is the only case which can result in larger code on Thumb-2,
>> so -1 should be disallowed by the I constraint (or even better, the
>> underlying query).  That way it will work correctly for all add/sub
>> patterns, not just this one.
> 
> So, over the weekend I've bootstrapped/regtested on armv7hl-linux-gnueabi
> following two possible follow-ups which handle the -1 and 1 cases right
> (prefer the instruction with #1 for thumb2), 0 and INT_MIN (use subs) and
> for others use subs if both constraints match, otherwise adds.
> 
> The first one uses constraints and no C code in the output, I believe it is
> actually more expensive for compile time, because if one just reads what
> constrain_operands needs to do for another constraint, it is quite a lot.
> I've tried to at least not introduce new constraints for this, there is no
> constraint for number 1 (or for number -1).
> The Pu constraint is thumb2 only for numbers 1..8, and the alternative uses
> I constraint for the negation of it, i.e. -8..-1, only -1 from this is
> valid for I.  If that matches, we emit adds with #1, otherwise just prefer
> subs over adds.
> 
> The other swaps the alternatives similarly to the above, but for the special
> case of desirable adds with #1 uses C code instead of another alternative.
> 
> Ok for trunk (which one)?
> 
>      Jakub


Option #2 (the C code variant) is better: for something like this,
adding one more constraint is a bit painful.

Ok and watch out for any regressions as usual.

Ramana


Re: [GCC, Arm, committed] Fix availability of FP16-FP64 conversion instructions

2019-03-11 Thread Ramana Radhakrishnan
Nope, just do it after testing it, and adjust with Christophe's follow-up.

R

On Mon, 11 Mar 2019, 10:36 Andre Vieira (lists), <
andre.simoesdiasvie...@arm.com> wrote:

> Hi,
>
> Any objections to me backporting this to GCC 8 and 7?
>
> Cheers,
> Andre
>
> On 08/03/2019 17:30, Andre Vieira (lists) wrote:
> > Hi,
> >
> > vcvtb.f16.f64 and vcvtb.f64.f16 were being made available even for FPUs
> > that do not support double precision.  This patch fixes that.
> >
> > Regression tested for arm-none-eabi.
> >
> > Committed in r269499.
> >
> > Cheers,
> > Andre
> >
> > gcc/ChangeLog:
> > 2019-03-08  Andre Vieira  
> >
> >  * config/arm/arm.h (TARGET_FP16_TO_DOUBLE): Add
> TARGET_VFP_DOUBLE
> >  requirement.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2019-03-08  Andre Vieira  
> >
> >  * gcc.target/arm/f16_f64_conv_no_dp.c: New test.
>


Re: [GCC, Arm, committed] Fix availability of FP16-FP64 conversion instructions

2019-03-11 Thread Ramana Radhakrishnan
Ok.

Ramana

On Mon, 11 Mar 2019, 20:24 Christophe Lyon, 
wrote:

> On Mon, 11 Mar 2019 at 12:34, Richard Biener  wrote:
> >
> > On Mon, 11 Mar 2019, Andre Vieira (lists) wrote:
> >
> > > Hi,
> > >
> > > Any objections to me backporting this to GCC 8 and 7?
> >
> > No, go ahead (after proper testing).
> >
>
> Hi,
>
> I've noticed that this new test fails on arm-none-linux-gnueabi
> --with-mode thumb
> --with-cpu cortex-a9
> --with-fpu default
>
>  and with Dejagnu flags: -march=armv5t
>
> (because the test forces float-abi=hard on a target that generates
> thumb-1 by default, which isn't supported).
>
> The attached patch fixes this by adding arm_fp16_ok effective target.
> OK?
>
>
> Christophe
>
>
> > Richard.
> >
> > > Cheers,
> > > Andre
> > >
> > > On 08/03/2019 17:30, Andre Vieira (lists) wrote:
> > > > Hi,
> > > >
> > > > vcvtb.f16.f64 and vcvtb.f64.f16 were being made available even for
> FPUs that
> > > > do not support double precision.  This patch fixes that.
> > > >
> > > > Regression tested for arm-none-eabi.
> > > >
> > > > Committed in r269499.
> > > >
> > > > Cheers,
> > > > Andre
> > > >
> > > > gcc/ChangeLog:
> > > > 2019-03-08  Andre Vieira  
> > > >
> > > >  * config/arm/arm.h (TARGET_FP16_TO_DOUBLE): Add
> TARGET_VFP_DOUBLE
> > > >  requirement.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 2019-03-08  Andre Vieira  
> > > >
> > > >  * gcc.target/arm/f16_f64_conv_no_dp.c: New test.
> > >
> >
> > --
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)
>


Re: [PATCH][ARM] Fix PR89222

2019-03-05 Thread Ramana Radhakrishnan

On 05/03/2019 12:33, Wilco Dijkstra wrote:



ping



From: Wilco Dijkstra
Sent: 13 February 2019 12:23
To: Ramana Radhakrishnan
Cc: GCC Patches; nd; Olivier Hainque
Subject: Re: [PATCH][ARM] Fix PR89222
   


Hi Ramana,


ARMv5te bootstrap OK, regression tests pass. OK for commit?


Interesting bug. armv5te-linux bootstrap ? Can you share your --target
and --with-arch flags ?


--target/host/build=arm-linux-gnueabi --with-arch=armv5te  --with-mode=arm


+  if (GET_CODE (base) == SYMBOL_REF)


Isn't there a SYMBOL_REF_P predicate for this ?


Yes, I've changed this one, but this really should be done as a cleanup across
Arm and AArch64 given there are 100 occurrences that need to be fixed.


Can we look to allow anything that is a power of 2 as an offset i.e.
anything with bit 0 set to 0 ? Could you please file an enhancement
request on binutils for both gold and ld to catch the linker warning
case ? I suspect we are looking for addends which have the lower bit
set and function symbols ?


I don't think it is useful optimization to allow an offset on function symbols.
A linker warning would be useful indeed, I'll file an enhancement request.


Firstly, please use targetm.cannot_force_const_mem (...) instead of
arm_cannot_force_const_mem; then that can remain static.  Let's look
to use the targetm interface instead of direct calls here. We weren't


I've changed it to use targetm in the new version.


hitting this path for non-vxworks code , however now we do so if
arm_tls_referenced_p is true at the end of arm_cannot_force_const_mem
which means that we could well have a TLS address getting spat out or
am I mis-reading something ?


Yes there was a path where we could end up in an endless expansion loop
if arm_tls_referenced_p is true. I fixed this by checking the offset is nonzero
before expanding. This also allowed a major simplification of the TLS code
which was trying to do the same thing.


Can Olivier or someone who cares about vxworks also give this a quick
sanity run for the alternate code path once we resolve some of the
review questions here ? Don't we also need to worry about
-mslow-flash-data where we get rid of literal pools and have movw /
movt instructions ?


Splitting the offset early means it works fine for MOVW/MOVT. Eg before
my change -mcpu=cortex-m3 -mslow-flash-data:

f3:
     movw    r3, #:lower16:handler-1
     movt    r3, #:upper16:handler-1

After:
     movw    r3, #:lower16:handler
     movt    r3, #:upper16:handler
     subs    r3, r3, #1


Here is the updated version:

The GCC optimizer can generate symbols with non-zero offset from simple
if-statements. Bit zero is used for the Arm/Thumb state bit, so relocations
with offsets fail if it changes bit zero and the relocation forces bit zero
to true.  The fix is to disable offsets on function pointer symbols.

ARMv5te and armhf bootstrap OK, regression tests pass. OK for commit?

ChangeLog:
2019-02-12  Wilco Dijkstra  

     gcc/
     PR target/89222
     * config/arm/arm.md (movsi): Use targetm.cannot_force_const_mem
     to decide when to split off a non-zero offset from a symbol.
     * config/arm/arm.c (arm_cannot_force_const_mem): Disallow offsets
     in function symbols.

     testsuite/
     PR target/89222
     * gcc.target/arm/pr89222.c: Add new test.
---
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
f07f4cc47b6cfcea8f44960bf4760ea9a46b8f87..69b74a237a5f10b4137aa995ad43b77d6ecd04db
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8940,11 +8940,16 @@ static bool
  arm_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
  {
    rtx base, offset;
+  split_const (x, &base, &offset);
  
-  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P)

+  if (SYMBOL_REF_P (base))
  {
-  split_const (x, &base, &offset);
-  if (GET_CODE (base) == SYMBOL_REF
+  /* Function symbols cannot have an offset due to the Thumb bit.  */
+  if ((SYMBOL_REF_FLAGS (base) & SYMBOL_FLAG_FUNCTION)
+ && INTVAL (offset) != 0)
+   return true;
+
+  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P
    && !offset_within_block_p (base, INTVAL (offset)))
  return true;
  }
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
aa759624f8f617576773aa75fd6239d6e06e8a13..88110cb732b52866d4fdcad70fd5a202aa62bd03
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5981,53 +5981,29 @@ (define_expand "movsi"
  }
  }
  
-  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P)

+  split_const (operands[1], &base, &offset);
+  if (INTVAL (offset) != 0
+  && targetm.cannot_force_const_mem (SImode, operands[1]))
  {
-  split_const (operands[1], &base, &offset);
-  if (GET_CODE (base) == SYMBOL_REF
- && !offset_within_block_p (base, INTVAL (offset)))
-   {
- tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
- emit_move_insn (tmp, base);
- e

Re: arm access to stack slot out of allocated area

2019-02-11 Thread Ramana Radhakrishnan
On Mon, Feb 11, 2019 at 4:48 PM Olivier Hainque  wrote:
>
> Hi Wilco,
>
> > On 8 Feb 2019, at 22:35, Wilco Dijkstra  wrote:
>
> > So I think we need to push much harder on getting rid of obsolete stuff and
> > avoid people encountering these nasty issues.
>
> Numbers I just received indicate that we can legitimately head
> in this direction for VxWorks as well (move towards VxWorks 7 only
> ports, AAPCS based).
>
> Good news :)
>

Yay !

Ramana


Re: [PATCH][ARM] Fix PR89222

2019-02-11 Thread Ramana Radhakrishnan
On Mon, Feb 11, 2019 at 5:35 PM Wilco Dijkstra  wrote:
>
> The GCC optimizer can generate symbols with non-zero offset from simple
> if-statements. Bit zero is used for the Arm/Thumb state bit, so relocations
> with offsets fail if it changes bit zero and the relocation forces bit zero
> to true.  The fix is to disable offsets on function pointer symbols.
>
> ARMv5te bootstrap OK, regression tests pass. OK for commit?

Interesting bug. armv5te-linux bootstrap ? Can you share your --target
and --with-arch flags ?

>
> ChangeLog:
> 2019-02-06  Wilco Dijkstra  
>
> gcc/
> PR target/89222
> * config/arm/arm.md (movsi): Use arm_cannot_force_const_mem
> to decide when to split off an offset from a symbol.
> * config/arm/arm.c (arm_cannot_force_const_mem): Disallow offsets
> in function symbols.
> * config/arm/arm-protos.h (arm_cannot_force_const_mem): Add.
>
> testsuite/
> PR target/89222
> * gcc.target/arm/pr89222.c: Add new test.
>
> --
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 
> 79ede0db174fcce87abe8b4d18893550d4c7e2f6..0bedbe5110853617ecf7456bbaa56b1405fb65dd
>  100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -184,6 +184,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, 
> tree, rtx, tree);
>  extern bool arm_pad_reg_upward (machine_mode, tree, int);
>  #endif
>  extern int arm_apply_result_size (void);
> +extern bool arm_cannot_force_const_mem (machine_mode, rtx);
>
>  #endif /* RTX_CODE */
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> c4c9b4a667100d81d918196713e40b01ee232ee2..ccd4211045066d8edb89dd4c23d554517639f8f6
>  100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -178,7 +178,6 @@ static void arm_internal_label (FILE *, const char *, 
> unsigned long);
>  static void arm_output_mi_thunk (FILE *, tree, HOST_WIDE_INT, HOST_WIDE_INT,
>  tree);
>  static bool arm_have_conditional_execution (void);
> -static bool arm_cannot_force_const_mem (machine_mode, rtx);
>  static bool arm_legitimate_constant_p (machine_mode, rtx);
>  static bool arm_rtx_costs (rtx, machine_mode, int, int, int *, bool);
>  static int arm_address_cost (rtx, machine_mode, addr_space_t, bool);
> @@ -8936,15 +8935,20 @@ arm_legitimate_constant_p (machine_mode mode, rtx x)
>
>  /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
>
> -static bool

Let's keep this static ...

> +bool
>  arm_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
>rtx base, offset;
> +  split_const (x, &base, &offset);
>
> -  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P)
> +  if (GET_CODE (base) == SYMBOL_REF)

Isn't there a SYMBOL_REF_P predicate for this ?

>  {
> -  split_const (x, &base, &offset);
> -  if (GET_CODE (base) == SYMBOL_REF
> +  /* Function symbols cannot have an offset due to the Thumb bit.  */
> +  if ((SYMBOL_REF_FLAGS (base) & SYMBOL_FLAG_FUNCTION)
> + && INTVAL (offset) != 0)
> +   return true;
> +

Can we look to allow anything that is a power of 2 as an offset i.e.
anything with bit 0 set to 0 ? Could you please file an enhancement
request on binutils for both gold and ld to catch the linker warning
case ? I suspect we are looking for addends which have the lower bit
set and function symbols ?


> +  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P
>   && !offset_within_block_p (base, INTVAL (offset)))
> return true;
>  }

this looks ok.

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 
> aa759624f8f617576773aa75fd6239d6e06e8a13..00fccd964a86dd814f15e4a1fdf5b47173a3ee3f
>  100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -5981,17 +5981,13 @@ (define_expand "movsi"
>  }
>  }
>
> -  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P)
> +  if (arm_cannot_force_const_mem (SImode, operands[1]))

Firstly, please use targetm.cannot_force_const_mem (...) instead of
arm_cannot_force_const_mem; then that can remain static.  Let's look
to use the targetm interface instead of direct calls here. We weren't
hitting this path for non-vxworks code , however now we do so if
arm_tls_referenced_p is true at the end of arm_cannot_force_const_mem
which means that we could well have a TLS address getting spat out or
am I mis-reading something ?

This is my main concern with this patch ..

>  {
>    split_const (operands[1], &base, &offset);

> -  if (GET_CODE (base) == SYMBOL_REF
> - && !offset_within_block_p (base, INTVAL (offset)))
> -   {
> - tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
> - emit_move_insn (tmp, base);
> - emit_insn (gen_addsi3 (operands[0], tmp, offset));
> - DONE;
> -   }
> +  tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
> +  emit_move_insn (tmp, base);
> +  emit_insn (gen_addsi3 (operands[0], tmp, offset));
> +  

Re: arm access to stack slot out of allocated area

2019-02-08 Thread Ramana Radhakrishnan
On 08/02/2019 16:19, Olivier Hainque wrote:
> Hi Wilco,
> 
>> On 8 Feb 2019, at 15:49, Wilco Dijkstra  wrote:
>>
>> Hi Olivier,
>>
>>> Below is a description of a very annoying bug we are witnessing
>>> on ARM.
>> ...
>>> compiled with -Og -mapcs
>>
>> Do you know -mapcs has been deprecated for more than 4 years now?
>> Is there a reason you are still using it? It was deprecated since -mapcs
>> is both extremely inefficient and buggy. I would strongly suggest to stop
>> using this option completely in all your software.
> 
> Sorry, I had -mapcs-frame in mind. The investigation I described was
> conducted with the default settings of the VxWorks port (no particular option)
> and thought it might be of greater potential interest if exposed with a bare
> eabi compiler. The conditions of interest are guarded with
> 
>   IS_NESTED (arm_current_func_type ())
> && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
> 
> Turns out that the arm-eabi code with -mapcs-frame is very slightly
> different than the vxworks one, which I missed on the first reading.
> 
> Let me double check again ...
> 
>> It's useful to create a bug report with the details, however it might not be
>> worth fixing if the issue cannot be reproduced without -mapcs. -mapcs
>> was never meant to support a static chain, especially not spills of the
>> static chain.
> 
> Understood. I'll start by examining the differences
> between the vxworks and the eabi+aapcs-frame configurations
> more closely, and I'll report back.

Olivier, while you are here could you also document the choices made by
the vxworks port in terms of the ABI and how it differs from EABI? It
would certainly help with patch review.


regards
Ramana




> 
> Thanks for your feedback!
> 



Re: [PATCH][wwwdocs][Arm][AArch64] Update changes with new features and flags.

2019-01-31 Thread Ramana Radhakrishnan
On Thu, 31 Jan 2019, 10:09 Tamar Christina wrote:

> Hi Gerald,
>
> Thanks I'll make the suggested changes.
>
> About the duplication, we can avoid it if we have a "general" section for
> all Arm ports first and then subsections for Arm and AArch64 specific
> things.
>
> I'm not sure how the maintainers feel about such a re-organization though.
>

Works for me

R

>
> Any opinions guys?
>
> Thanks,
> Tamar
>
> 
> From: Gerald Pfeifer 
> Sent: Thursday, January 31, 2019 12:29 AM
> To: Tamar Christina
> Cc: gcc-patches@gcc.gnu.org; nd; James Greenhalgh; Richard Earnshaw;
> Marcus Shawcroft; Ramana Radhakrishnan; ni...@redhat.com; Kyrylo Tkachov
> Subject: Re: [PATCH][wwwdocs][Arm][AArch64] Update changes with new
> features and flags.
>
> On Wed, 23 Jan 2019, Tamar Christina wrote:
> > This patch adds the documentation for Stack clash protection and
> > Armv8.3-a support to changes.html for GCC 9.
>
> Some additional notes, all minor, for consideration before you commit.
>
> +The probing interval/guard size can be set by using
> +--param stack-clash-protection-guard-size=12|16.
> +The value of this parameter must be in bytes represented as a power
> of two.
> +The only two supported values for this parameter are 12 and 16 being
> +4Kb (2^12) and 64Kb (2^16) respectively.
>
> This one keeps making me think every time I read it.  What do you
> think of changing the second and third sentences to
>
>   "The two supported values for this parameter are 12 (for a 4KiB size,
>   2^12) and 16 (for a 64KiB size, 2^16)."
>
> or something like that?  Shorter and about the same contents?  (Note,
> uppercase B or we'd refer to bits.)
>
> +The Armv8.3-A complex number instructions are now supported via
> intrinsics
> +when the option -march=armv8.3-a or equivalent is
> specified.
> +For the half-precision floating-point variants of these instructions
> use the
> +architecture extension flag +fp16, e.g.
> +-march=armv8.3-a+fp16.
> +
> +The intrinsics are defined by the ACLE specification.
>
> Note that these two visual paragraphs in HTML source will be merged into
> just one unless you add ... around the two.  Just pointing it out.
>
> +  
> +The Armv8.3-A complex number instructions are now supported via
> intrinsics
> +when the option -march=armv8.3-a or equivalent is
> specified.
> +For the half-precision floating-point variants of these instructions
> use the
> +architecture extension flag +fp16, e.g.
> +-march=armv8.3-a+fp16.
> +
> +The intrinsics are defined by the ACLE specification.
> +  
>
> I guess this duplication is hard to avoid between Arm and AArch64?
>
> Gerald
>


Re: [PATCH] PR c++/87893 - constexpr ctor ICE on ARM.

2019-01-23 Thread Ramana Radhakrishnan
On Wed, Jan 23, 2019 at 1:54 PM Jason Merrill  wrote:
>
> Since in r265788 I made cxx_eval_outermost_constant_expr more insistent that
> the returned value have the right type, it became more important that
> initialized_type be correct.  These two PRs were cases of it giving the wrong
> answer.  On ARM, a constructor returns a pointer to the object, but
> initialized_type should return the class type, not that pointer type.  And we
> need to look through COMPOUND_EXPR.
>
> I tested that this fixes one of the affected testcases on ARM, and causes no
> regressions on x86_64-pc-linux-gnu.  I haven't been able to get a cross
> toolchain to work properly; currently link tests are failing with
> undefined __sync_synchronize.  Applying to trunk.
>

rearnsha pointed this out to me: you probably need this with newlib,
and it looks like the patch fell through the cracks :(

https://sourceware.org/ml/newlib/2015/msg00653.html

I'll try and dust this off in the coming week.

Ramana



> PR c++/88293 - ICE with comma expression.
> * constexpr.c (initialized_type): Don't shortcut non-void type.
> Handle COMPOUND_EXPR.
> (cxx_eval_outermost_constant_expr): Return early for void type.
> ---
>  gcc/cp/constexpr.c| 8 +---
>  gcc/testsuite/g++.dg/cpp0x/constexpr-comma1.C | 9 +
>  gcc/cp/ChangeLog  | 8 
>  3 files changed, 22 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-comma1.C
>
> diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> index ed4bbeeb157..42681416760 100644
> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -2848,9 +2848,7 @@ initialized_type (tree t)
>if (TYPE_P (t))
>  return t;
>tree type = TREE_TYPE (t);
> -  if (!VOID_TYPE_P (type))
> -/* No need to look deeper.  */;
> -  else if (TREE_CODE (t) == CALL_EXPR)
> +  if (TREE_CODE (t) == CALL_EXPR)
>  {
>/* A constructor call has void type, so we need to look deeper.  */
>tree fn = get_function_named_in_call (t);
> @@ -2858,6 +2856,8 @@ initialized_type (tree t)
>   && DECL_CXX_CONSTRUCTOR_P (fn))
> type = DECL_CONTEXT (fn);
>  }
> +  else if (TREE_CODE (t) == COMPOUND_EXPR)
> +return initialized_type (TREE_OPERAND (t, 1));
>else if (TREE_CODE (t) == AGGR_INIT_EXPR)
>  type = TREE_TYPE (AGGR_INIT_EXPR_SLOT (t));
>return cv_unqualified (type);
> @@ -5061,6 +5061,8 @@ cxx_eval_outermost_constant_expr (tree t, bool 
> allow_non_constant,
>
>tree type = initialized_type (t);
>tree r = t;
> +  if (VOID_TYPE_P (type))
> +return t;
>if (AGGREGATE_TYPE_P (type) || VECTOR_TYPE_P (type))
>  {
>/* In C++14 an NSDMI can participate in aggregate initialization,
> diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-comma1.C 
> b/gcc/testsuite/g++.dg/cpp0x/constexpr-comma1.C
> new file mode 100644
> index 000..9dd1299ddf4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-comma1.C
> @@ -0,0 +1,9 @@
> +// PR c++/88293
> +// { dg-do compile { target c++11 } }
> +
> +struct A
> +{
> +  constexpr A () { }
> +};
> +
+const A &a = (A (), A ());
> diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
> index 111782aeaba..20a54719578 100644
> --- a/gcc/cp/ChangeLog
> +++ b/gcc/cp/ChangeLog
> @@ -1,3 +1,11 @@
> +2019-01-21  Jason Merrill  
> +
> +   PR c++/87893 - constexpr ctor ICE on ARM.
> +   PR c++/88293 - ICE with comma expression.
> +   * constexpr.c (initialized_type): Don't shortcut non-void type.
> +   Handle COMPOUND_EXPR.
> +   (cxx_eval_outermost_constant_expr): Return early for void type.
> +
>  2019-01-21  Jakub Jelinek  
>
> PR c++/88949
>
> base-commit: feb90a0dd6fc4e12786dce8338f716253d50b545
> --
> 2.20.1
>


Re: [PATCH][GCC][Arm] Rewrite arm testcase to use intrinsics

2019-01-21 Thread Ramana Radhakrishnan
On 20/01/2019 15:48, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jan 17, 2019 at 03:02:00PM +, Tamar Christina wrote:
>> This test was added back when builtins were being used instead of ACLE
>> intrinsics.  The test as far as I can tell is really testing vcombine,
>> however some of these builtins no longer exist and causes an ICE.
>>
>> This fixes the testcase by changing it to use neon intrinsics.

JFTR, I think this was a case where we were using builtins for the
implementation of the ACLE intrinsics, and the testcase was reduced to
remove the use of arm_neon.h.  Thus the test needs to go back to using
the neon intrinsics directly if possible.

Also if there are other tests like this it would be a good small cleanup
to do.

> 
> Shouldn't the ICE be fixed as well?  [ Sorry if you send a separate patch
> for that and I missed it ].
> 

Indeed but that's a separate issue to this.

regards
Ramana
> 
> Segher
> 



Re: [PATCH][GCC][Arm] Rewrite arm testcase to use intrinsics

2019-01-17 Thread Ramana Radhakrishnan
On 17/01/2019 15:02, Tamar Christina wrote:
> Hi All,
> 
> This test was added back when builtins were being used instead of ACLE
> intrinsics.  The test as far as I can tell is really testing vcombine,
> however some of these builtins no longer exist and causes an ICE.
> 
> This fixes the testcase by changing it to use neon intrinsics.
> 
> Regtested on arm-none-eabi and no issues.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-01-17  Tamar Christina  
> 
>   PR target/88850
>   * gcc.target/arm/pr51968.c: Use neon intrinsics.
> 

Ok.

Looks pretty obvious to me.

Ramana


Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Ramana Radhakrishnan
On 03/12/2018 16:39, Ard Biesheuvel wrote:
> On Mon, 3 Dec 2018 at 10:55, Ramana Radhakrishnan
>  wrote:
>>
>> For quite sometime the kernel guys, (more specifically Ard) have been
>> talking about using a system register (sp_el0) and an offset from that
>> for a canary based access. This patchset adds support for a new set of
>> command line options similar to how powerpc has done this.
>>
>> I don't intend to change the defaults in userland, we've discussed this
>> for user-land in the past and as far as glibc and userland is concerned
>> we stick to the options as currently existing. The system register
>> option is really for the kernel to use along with an offset as they
>> control their ABI and this is a decision for them to make.
>>
>> I did consider sticking this all under a mcmodel=kernel-small option but
>> thought that would be a bit too aggressive. There is very little error
>> checking I can do in terms of the system register being used and really
>> the assembler would barf quite quickly in case things go wrong. I've
>> managed to rebuild Ard's kernel tree with an additional patch that
>> I will send to him. I haven't managed to boot this kernel.
>>
>> There was an additional question asked about the performance
>> characteristics of this but it's a security feature and the kernel
>> doesn't have the luxury of a hidden symbol. Further since the kernel
>> uses sp_el0 for access everywhere and if they choose to use the same
>> register I don't think the performance characteristics would be too bad,
>> but that's a decision for the kernel folks to make when taking in the
>> feature into the kernel.
>>
>> I still need to add some tests and documentation in invoke.texi but
>> this is at the stage where it would be nice for some other folks
>> to look at this.
>>
>> The difference in code generated is as below.
>>
>> extern void bar (char *);
>> int foo (void)
>> {
>> char a[100];
>> bar ();
>> }
>>
>> $GCC -O2  -fstack-protector-strong  vs
>> -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg
>> -mstack-protector-guard-offset=1024 -fstack-protector-strong
>>
>>
>> --- tst.s   2018-12-03 09:46:21.174167443 +
>> +++ tst.s.1 2018-12-03 09:46:03.546257203 +
>> @@ -15,15 +15,14 @@
>>  mov x29, sp
>>  str x19, [sp, 16]
>>  .cfi_offset 19, -128
>> -   adrpx19, __stack_chk_guard
>> -   add x19, x19, :lo12:__stack_chk_guard
>> -   ldr x0, [x19]
>> -   str x0, [sp, 136]
>> -   mov x0,0
>> +   mrs x19, sp_el0
>>  add x0, sp, 32
>> +   ldr x1, [x19, 1024]
>> +   str x1, [sp, 136]
>> +   mov x1,0
>>  bl  bar
>>  ldr x0, [sp, 136]
>> -   ldr x1, [x19]
>> +   ldr x1, [x19, 1024]
>>  eor x1, x0, x1
>>  cbnzx1, .L5
>>
>>
>>
>>
>> I will be afk tomorrow and day after but this is to elicit some comments
>> and for Ard to try this out with his kernel patches.
>>
> 
> Thanks Ramana. I managed to build and run a complete kernel (including
> modules) on a bare metal system, and everything works as expected.
> 
> The only thing I'd like to confirm with you is the logic wrt the
> command line arguments, more specifically, if/when all 3 arguments
> have to appear, and whether they are permitted to appear if
> -fstack-protector is not set.

They are permitted to appear without -fstack-protector even though it 
doesn't make much sense ...

> 
> This is relevant given that we invoke the compiler in 3 different ways:
> - at the configure stage, we invoke the compiler with some/all of
> these options to decide whether the feature is supported, but the
> actual offset is not known, but also irrelevant
> - we invoke the compiler to build the header file that actually gives
> us the offset to pass to later invocations
> - finally, all kernel objects are built with all 3 arguments passed on
> the command line
> 
> It looks like your code permits -mstack-protector-guard-reg at any
> time, but only permits -mstack-protector-guard-offset if
> -mstack-protector-guard is set to sysreg (and thus set explicitly,
> since the default is global). Is that intentional? Can we expect this
> to remain like that?

It doesn't make sense to permit an offset if the stack protector guard 
is a global variable.


If the default changes to sysreg, which I doubt, then I would expect 
-mstack-protector-guard-offset to be usable without 
-mstack-protector-guard=sysreg. However, changing the default is not 
something I'm sure we have the appetite for yet in userland. The 
decision was made in 2015 that for userland the stack protector guard 
would be a hidden symbol, and I expect there to be quite a lot of 
protracted discussion before changing this.
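The option-compatibility rule being discussed can be sketched as a small validation function. This is a hedged illustration only: the enum and function names are made up, and the actual check lives in aarch64_override_options_internal (see the patch at the end of this thread).

```c
#include <stddef.h>

/* Illustrative stand-in for the guard kinds in the patch. */
enum guard_kind { GUARD_GLOBAL, GUARD_SYSREG };

/* -mstack-protector-guard-offset= only makes sense when the guard is
   read through a system register, so guard=global combined with an
   explicit offset string is rejected. */
static int guard_options_valid(enum guard_kind guard, const char *offset_str)
{
    if (guard == GUARD_GLOBAL && offset_str != NULL)
        return 0;   /* incompatible combination */
    return 1;
}
```

Under this rule, a configure-time probe passing only -mstack-protector-guard-reg remains accepted, matching the kernel's three-stage invocation described above.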


regards
Ramana


> 



Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Ramana Radhakrishnan
On 10/01/2019 15:49, James Greenhalgh wrote:
> On Mon, Dec 03, 2018 at 03:55:36AM -0600, Ramana Radhakrishnan wrote:
>> For quite some time the kernel guys (more specifically Ard) have been
>> talking about using a system register (sp_el0) and an offset from that
>> for a canary based access. This patchset adds support for a new set of
>> command line options similar to how powerpc has done this.
>>
>> I don't intend to change the defaults in userland, we've discussed this
>> for user-land in the past and as far as glibc and userland is concerned
>> we stick to the options as currently existing. The system register
>> option is really for the kernel to use along with an offset as they
>> control their ABI and this is a decision for them to make.
>>
>> I did consider sticking this all under a mcmodel=kernel-small option but
>> thought that would be a bit too aggressive. There is very little error
>> checking I can do in terms of the system register being used and really
>> the assembler would barf quite quickly in case things go wrong. I've
>> managed to rebuild Ard's kernel tree with an additional patch that
>> I will send to him. I haven't managed to boot this kernel.
>>
>> There was an additional question asked about the performance
>> characteristics of this but it's a security feature and the kernel
>> doesn't have the luxury of a hidden symbol. Further since the kernel
>> uses sp_el0 for access everywhere and if they choose to use the same
>> register I don't think the performance characteristics would be too bad,
>> but that's a decision for the kernel folks to make when taking in the
>> feature into the kernel.
>>
>> I still need to add some tests and documentation in invoke.texi but
>> this is at the stage where it would be nice for some other folks
>> to look at this.
>>
>> The difference in code generated is as below.
>>
>> extern void bar (char *);
>> int foo (void)
>> {
>> char a[100];
>> bar ();
>> }
>>
>> $GCC -O2  -fstack-protector-strong  vs
>> -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg
>> -mstack-protector-guard-offset=1024 -fstack-protector-strong
>>
>>  
>> --- tst.s2018-12-03 09:46:21.174167443 +
>> +++ tst.s.1  2018-12-03 09:46:03.546257203 +
>> @@ -15,15 +15,14 @@
>>  mov x29, sp
>>  str x19, [sp, 16]
>>  .cfi_offset 19, -128
>> -adrpx19, __stack_chk_guard
>> -add x19, x19, :lo12:__stack_chk_guard
>> -ldr x0, [x19]
>> -str x0, [sp, 136]
>> -mov x0,0
>> +mrs x19, sp_el0
>>  add x0, sp, 32
>> +ldr x1, [x19, 1024]
>> +str x1, [sp, 136]
>> +mov x1,0
>>  bl  bar
>>  ldr x0, [sp, 136]
>> -ldr x1, [x19]
>> +ldr x1, [x19, 1024]
>>  eor x1, x0, x1
>>  cbnzx1, .L5
>>
>>
>>
>>
>> I will be afk tomorrow and day after but this is to elicit some comments
>> and for Ard to try this out with his kernel patches.
>>
>> Thoughts ?
> 
> I didn't see an answer on list to Ard's questions about the command-line logic.

Ah I must have missed that - will take that up separately.

> Remember to also fix up the error message concerns Florian raised.
> 


> That said, if Jakub is happy with this in Stage 4, I am too.
> 
> My biggest concern is the -mstack-protector-guard-reg interface, which
> is unchecked user input and so opens up nasty ways to force the compiler
> towards out of bounds accesses (e.g.
> -mstack-protector-guard-reg="What memory is at %10")
> 

-mstack-protector-guard-reg is fine - it's a system register; if the 
assembler doesn't recognize it, it will barf.

-mstack-protector-guard-offset= is, I assume, what you are 
concerned about. I don't have a good answer to that one and am going to 
chicken out and say this is the same interface as x86 and power; while 
I accept it allows an access to any location, the user can already do 
that with a C program and any arbitrary inline asm :-/



regards
Ramana

> Thanks,
> James
> 
>>
>> regards
>> Ramana
>>
>> gcc/ChangeLog:
>>
>> 2018-11-23  Ramana Radhakrishnan  
>>
>>   * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
>>   * config/aarch64/aarch64.c (aarch64_override_options_internal):
>> Handle
>>   and put in error checks for stack protector guard options.
>>   (aarch64_stack_protect_guard): New.
>>   (TARGET_STACK_PROTECT_GUARD): Define.
>>   * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
>>   (reg_stack_protect_address): New.
>>   (stack_protect_set): Adjust for SSP_GLOBAL.
>>   (stack_protect_test): Likewise.
>>   * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
>>   (-mstack-protector-guard): Likewise.
>>   (-mstack-protector-guard-offset): Likewise.
>>   * doc/invoke.texi: Document new AArch64 options.
> 



Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Ramana Radhakrishnan
On Thu, Jan 10, 2019 at 11:05 AM Jakub Jelinek  wrote:
>
> On Thu, Jan 10, 2019 at 10:53:32AM +, Ramana Radhakrishnan wrote:
> > > 2018-11-23  Ramana Radhakrishnan  
> > >
> > >  * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
> > >  * config/aarch64/aarch64.c (aarch64_override_options_internal):
> > > Handle
> > >  and put in error checks for stack protector guard options.
> > >  (aarch64_stack_protect_guard): New.
> > >  (TARGET_STACK_PROTECT_GUARD): Define.
> > >  * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
> > >  (reg_stack_protect_address): New.
> > >  (stack_protect_set): Adjust for SSP_GLOBAL.
> > >  (stack_protect_test): Likewise.
> > >  * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
> > >  (-mstack-protector-guard): Likewise.
> > >  (-mstack-protector-guard-offset): Likewise.
> > >  * doc/invoke.texi: Document new AArch64 options.
> >
> > Any further thoughts or is it just Jakub's comments that I need to
> > address on this patch ? It looks like the kernel folks have queued
> > this for the next kernel release and given this is helping the kernel
> > with a security feature, can we move this forward ?
>
> From RM POV this is ok in stage4 if you commit it RSN.
> Both x86 and powerpc have -mstack-protector-guard{,-reg,-offset}= options,
> x86 even has -mstack-protector-guard-symbol=.  So it would be nice if the
> aarch64 options are compatible with those other arches.
>

Thanks Jakub. I haven't added -mstack-protector-guard-symbol, as
there is no requirement for it now and I don't want to add an option
that isn't being used. IIRC, the other options are in sync with
x86 and powerpc.

> Please make sure you don't regress non-glibc SSP support (don't repeat
> PR85644/PR86832).
>

That should be OK, as I'm not changing any defaults. I would expect
non-glibc-based libraries that support SSP to mimic the glibc approach,
using the global symbol, as there is nothing special in the backend for
this today. I could look at FreeBSD as a non-glibc target, or at musl,
but I don't expect this to be an issue.

I'll wait until tomorrow to respin just to see if I can get any
further feedback.

regards
Ramana



> Jakub


Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Ramana Radhakrishnan
On Mon, Dec 3, 2018 at 9:55 AM Ramana Radhakrishnan
 wrote:
>
> For quite some time the kernel guys (more specifically Ard) have been
> talking about using a system register (sp_el0) and an offset from that
> for a canary based access. This patchset adds support for a new set of
> command line options similar to how powerpc has done this.
>
> I don't intend to change the defaults in userland, we've discussed this
> for user-land in the past and as far as glibc and userland is concerned
> we stick to the options as currently existing. The system register
> option is really for the kernel to use along with an offset as they
> control their ABI and this is a decision for them to make.
>
> I did consider sticking this all under a mcmodel=kernel-small option but
> thought that would be a bit too aggressive. There is very little error
> checking I can do in terms of the system register being used and really
> the assembler would barf quite quickly in case things go wrong. I've
> managed to rebuild Ard's kernel tree with an additional patch that
> I will send to him. I haven't managed to boot this kernel.
>
> There was an additional question asked about the performance
> characteristics of this but it's a security feature and the kernel
> doesn't have the luxury of a hidden symbol. Further since the kernel
> uses sp_el0 for access everywhere and if they choose to use the same
> register I don't think the performance characteristics would be too bad,
> but that's a decision for the kernel folks to make when taking in the
> feature into the kernel.
>
> I still need to add some tests and documentation in invoke.texi but
> this is at the stage where it would be nice for some other folks
> to look at this.
>
> The difference in code generated is as below.
>
> extern void bar (char *);
> int foo (void)
> {
>char a[100];
>bar ();
> }
>
> $GCC -O2  -fstack-protector-strong  vs
> -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg
> -mstack-protector-guard-offset=1024 -fstack-protector-strong
>
>
> --- tst.s   2018-12-03 09:46:21.174167443 +
> +++ tst.s.1 2018-12-03 09:46:03.546257203 +
> @@ -15,15 +15,14 @@
> mov x29, sp
> str x19, [sp, 16]
> .cfi_offset 19, -128
> -   adrpx19, __stack_chk_guard
> -   add x19, x19, :lo12:__stack_chk_guard
> -   ldr x0, [x19]
> -   str x0, [sp, 136]
> -   mov x0,0
> +   mrs x19, sp_el0
> add x0, sp, 32
> +   ldr x1, [x19, 1024]
> +   str x1, [sp, 136]
> +   mov x1,0
> bl  bar
> ldr x0, [sp, 136]
> -   ldr x1, [x19]
> +   ldr x1, [x19, 1024]
> eor x1, x0, x1
> cbnzx1, .L5
>
>
>
>
> I will be afk tomorrow and day after but this is to elicit some comments
> and for Ard to try this out with his kernel patches.
>
> Thoughts ?
>
> regards
> Ramana
>
> gcc/ChangeLog:
>
> 2018-11-23  Ramana Radhakrishnan  
>
>  * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
>  * config/aarch64/aarch64.c (aarch64_override_options_internal):
> Handle
>  and put in error checks for stack protector guard options.
>  (aarch64_stack_protect_guard): New.
>  (TARGET_STACK_PROTECT_GUARD): Define.
>  * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
>  (reg_stack_protect_address): New.
>  (stack_protect_set): Adjust for SSP_GLOBAL.
>  (stack_protect_test): Likewise.
>  * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
>  (-mstack-protector-guard): Likewise.
>  (-mstack-protector-guard-offset): Likewise.
>  * doc/invoke.texi: Document new AArch64 options.

Any further thoughts or is it just Jakub's comments that I need to
address on this patch ? It looks like the kernel folks have queued
this for the next kernel release and given this is helping the kernel
with a security feature, can we move this forward ?

Ramana


Re: [Committed] XFAIL gfortran.dg/ieee/ieee_9.f90

2018-12-27 Thread Ramana Radhakrishnan
Still on holiday, but this may be because long double is 64-bit on arm32.
Real128 may end up being mapped to long double for Fortran on armhf?

Ramana

From: Sudakshina Das
Sent: Thursday, December 27, 2018 8:46:29 AM
To: s...@troutmask.apl.washington.edu; Janne Blomqvist
Cc: Fortran List; GCC Patches; nd; kyrylo.tkac...@foss.arm.com; Ramana 
Radhakrishnan; Richard Earnshaw
Subject: Re: [Committed] XFAIL gfortran.dg/ieee/ieee_9.f90

Hi

On 25/12/18 5:13 PM, Steve Kargl wrote:
> On Tue, Dec 25, 2018 at 09:51:03AM +0200, Janne Blomqvist wrote:
>> On Mon, Dec 24, 2018 at 9:42 PM Steve Kargl <
>> s...@troutmask.apl.washington.edu> wrote:
>>
>>> On Mon, Dec 24, 2018 at 09:29:50PM +0200, Janne Blomqvist wrote:
>>>> On Mon, Dec 24, 2018 at 8:05 PM Steve Kargl <
>>>> s...@troutmask.apl.washington.edu> wrote:
>>>>
>>>>> I've added the following patch to a recently committed testcase.
>>>>>
>>>>> Index: gcc/testsuite/gfortran.dg/ieee/ieee_9.f90
>>>>> ===
>>>>> --- gcc/testsuite/gfortran.dg/ieee/ieee_9.f90   (revision 267413)
>>>>> +++ gcc/testsuite/gfortran.dg/ieee/ieee_9.f90   (working copy)
>>>>> @@ -1,4 +1,4 @@
>>>>> -! { dg-do run }
>>>>> +! { dg-do run { xfail arm*-*-gnueabi arm*-*-gnueabihf } }
>>>>>   program foo
>>>>>  use ieee_arithmetic
>>>>>  use iso_fortran_env
>>>>>
>>>> The problem seems to be that GFortran says the real128 kind value is > 0
>>>> (i.e. that the target supports quad precision floating point (with
>>> software
>>>> emulation, presumably)), but then trying to use it fails.
>>>>
>>>> Would be nice if somebody who cares about arm-none-linux-gnueabihf could
>>>> help figure out the proper resolution instead of papering over it with
>>>> XFAIL.
>>>>
>>>> But I guess XFAIL is good enough until said somebody turns up.
>>>>
>>> Thanks for chasing down the details.  I have no access to arm*-*-*.
>>>
>>> It's a shame the real128 is defined, and arm*-*-* doesn't
>>> actually use it.  I certainly have no time or interest in
>>> fixing this.
>>>
>> I think there are arm systems on the compile farm, but I haven't actually
>> checked myself, just going by the error messages Sudi Das reported.
>>
>> That being said, having slept over it, I actually think there is a problem
>> with the testcase, and not with arm*. So the errors in the testcase occur
>> in code like
>>
>> if (real128 > 0) then
>>  p = int(ieee_scalb(real(x, real128), int(i, int8)))
>>  if (p /= 64) stop 3
>> end if
>>
>> So if real128 is negative, as it should be if the target doesn't support
>> quad precision float, the branch will never be taken, but the frontend will
>> still generate code for it (though it will later be optimized away as
>> unreachable), and that's where the error occurs. So the testcase would need
>> something like
>>
>> integer, parameter :: large_real = max (real64, real128)
>> ! ...
>> if (real128 > 0) then
>>  p = int(ieee_scalb(real(x, large_real), int(i, int8)))
>>  if (p /= 64) stop 3
>> end if
>>
>> If you concur, please consider a patch fixing the testcase and removing the
>> xfail pre-approved.
>>
> Indeed, you are probably correct that gfortran will generate
> intermediate code and then garbage collect it.  This then will
> give an error for real(..., real128) in the statement for p
> if real128 /= 4, 8, 10, or 16.  I'll fix the testcase.
>
> Do you know if we can get gfortran to pre-define macros for cpp?
> That is, it would be nice if gfortran would recognize, say,
> HAVE_GFC_REAL_10 and HAVE_GFC_REAL_16 if the target supports those
> types.  Then the testcase could be copied to ieee_9.F90, and
> modified to
>
> #ifdef HAVE_REAL_16
>  p = int(ieee_scalb(real(x, 16), int(i, int8)))
>  if (p /= 64) stop 3
> #endif
>
Thanks for looking into this. Sorry I was on holiday for Christmas.
CC'ing Arm maintainers in case they have something to add.

Thanks

Sudi
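The root cause Janne identifies transposes directly to C: a branch guarded by a compile-time-false condition is still parsed and type-checked by the frontend before the optimizer deletes it, so everything inside it must be valid. A minimal C illustration of this point (all names here are made up for the example):

```c
/* Like gfortran checking real(x, real128) inside "if (real128 > 0)"
   on a target where real128 is not a supported kind: the dead branch
   can never execute, but the frontend must still fully compile it. */
enum { have_feature = 0 };   /* compile-time constant, like real128 > 0 */

static int use_feature(void) { return 64; }

static int compute(void)
{
    if (have_feature)
        return use_feature();   /* dead, but still compiled and checked */
    return 7;
}
```

This is why the proposed fix substitutes a kind that is always valid (`max (real64, real128)`) inside the guarded branch rather than relying on the guard alone.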



Re: add taishanv110 pipeline scheduling

2018-12-14 Thread Ramana Radhakrishnan
Hi Wuyuan,

On 06/12/2018 01:31, wuyuan (E) wrote:
> Hi ARM maintainers:
>  The taishanv110 core uses generic pipeline scheduling, which 
> restricted the performance of taishanv110 core. By adding the pipeline 
> scheduling of taishanv110 core in GCC,The performance of taishanv110 has been 
> improved.
>  The patch  as follows, please join.

Who is looking to fix the architectural version of the tsv110, as in the 
LLVM description here: https://reviews.llvm.org/D53908 ?

The GCC implementation considers this to be an Armv8.4-A part, while 
based on the link above it really appears to be an Armv8.2-A part with 
some optional extensions.

We are in the run-up to the GCC 9 release, so it would be good to get 
this fixed up before then.

regards
Ramana

> 
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> old mode 100644
> new mode 100755
> index c4ec556..d6cf1d3
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2018-12-05  wuyuan  
> +
> +   * config/aarch64/aarch64-cores.def: New CPU.
> +   * config/aarch64/aarch64.md : Add "tsv110.md"
> +   * gcc/config/aarch64/tsv110.md : pipeline description
> +
> 2018-11-26  David Malcolm  
> 
>* dump-context.h (dump_context::dump_loc): Convert 1st param from
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index 74be5db..8e84844 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -99,7 +99,7 @@ AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  
> AARCH64_FL_FOR_ARCH8_2 | AARCH64_F
> /* ARMv8.4-A Architecture Processors.  */
> 
> /* HiSilicon ('H') cores. */
> -AARCH64_CORE("tsv110", tsv110,cortexa57,8_4A, 
> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES 
> | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
> +AARCH64_CORE("tsv110", tsv110,tsv110,8_4A, 
> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES 
> | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
> 
> /* Qualcomm ('Q') cores. */
> AARCH64_CORE("saphira", saphira,saphira,8_4A,  
> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
> 0x51, 0xC01, -1)
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 82af4d4..5278d6b 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -348,7 +348,7 @@
> (include "thunderx.md")
> (include "../arm/xgene1.md")
> (include "thunderx2t99.md")
> -
> +(include "tsv110.md")
> ;; ---
> ;; Jumps and other miscellaneous insns
> ;; ---
> diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
> new file mode 100644
> index 000..e912447
> --- /dev/null
> +++ b/gcc/config/aarch64/tsv110.md
> @@ -0,0 +1,708 @@
> +;; tsv110 pipeline description
> +;; Copyright (C) 2014-2016 Free Software Foundation, Inc.
> +;;
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but
> +;; WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;; General Public License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; .
> +
> +(define_automaton "tsv110")
> +
> +(define_attr "tsv110_neon_type"
> +  "neon_arith_acc, neon_arith_acc_q,
> +   neon_arith_basic, neon_arith_complex,
> +   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
> +   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
> +   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
> +   neon_shift_imm_complex,
> +   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
> +   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
> +   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
> +   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
> +   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
> +   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
> +   neon_bitops, neon_bitops_q, neon_from_gp,
> +   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
> +   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
> +   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
> +   unknown"
> +  (cond [
> + (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
> +  neon_reduc_add_acc_q")
> +   (const_string "neon_arith_acc")
> +

Re: [PATCH][AArch64][doc] Clarify -msve-vector-bits=128 behaviour

2018-12-13 Thread Ramana Radhakrishnan
On Thu, Dec 13, 2018 at 10:15 AM Richard Sandiford
 wrote:
>
> Thanks for doing this.
>
> "Kyrill Tkachov"  writes:
> > @@ -15716,16 +15716,19 @@ an effect when SVE is enabled.
> >
> >  GCC supports two forms of SVE code generation: ``vector-length
> >  agnostic'' output that works with any size of vector register and
> > -``vector-length specific'' output that only works when the vector
> > -registers are a particular size.  Replacing @var{bits} with
> > -@samp{scalable} selects vector-length agnostic output while
> > -replacing it with a number selects vector-length specific output.
> > -The possible lengths in the latter case are: 128, 256, 512, 1024
> > -and 2048.  @samp{scalable} is the default.
> > -
> > -At present, @samp{-msve-vector-bits=128} produces the same output
> > -as @samp{-msve-vector-bits=scalable}.
> > -
> > +``vector-length specific'' output that allows GCC to make assumptions
> > +about the vector length when it is useful for optimization reasons.
> > +The possible values of @samp{bits} are: @samp{scalable}, @samp{128},
> > +@samp{256}, @samp{512}, @samp{1024} and @samp{2048}.
> > +Specifying @samp{scalable} selects vector-length agnostic
> > +output.  At present @samp{-msve-vector-bits=128} also generates 
> > vector-length
> > +agnostic output.  All other values generate vector-length specific code.
> > +The behavior of these values may change in future releases and no value 
> > except
> > +@samp{scalable} should be relied on for producing code that is portable 
> > across
> > +different hardware SVE vector lengths.
> > +
> > +The default is @samp{-msve-vector-bits=scalable} which produces
> > +vector-length agnostic code.
>
> Think we should have a comma before "which" in the last sentence.
> OK with that change.

And backport to GCC 8?

ramana

>
> Richard
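What "vector-length specific" output buys the compiler can be sketched in plain C, without real SVE types (this is illustrative only; genuine SVE code would use ACLE SVE types and be built with -msve-vector-bits, and all names below are made up):

```c
/* When the element count per vector is a compile-time constant, as it
   effectively is under -msve-vector-bits=256, the compiler sees a
   constant trip count it can fully unroll and fold; with scalable
   (length-agnostic) vectors the count is only known at run time. */
#define ELEMS_PER_VECTOR 8   /* e.g. a 256-bit vector of 32-bit ints */

static int sum_one_vector(const int *p)
{
    int s = 0;
    for (int i = 0; i < ELEMS_PER_VECTOR; i++)  /* constant trip count */
        s += p[i];
    return s;
}
```

This is also why only `scalable` is portable across hardware vector lengths: baking `ELEMS_PER_VECTOR` into the code ties it to one vector size.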


Re: [RFC][AArch64] Add support for system register based stack protector canary access

2018-12-07 Thread Ramana Radhakrishnan
On Tue, Dec 4, 2018 at 12:58 PM Florian Weimer  wrote:
>
> * Wilco Dijkstra:
>
> >> For userland, I would like to eventually copy the OpenBSD approach for
> >> architectures which have some form of PC-relative addressing: we can
> >> have multiple random canaries in (RELRO) .rodata in sufficiently close
> >> to the code that needs them (assuming that we have split .rodata).

On AArch64 as well we've split .rodata. I think I did this with GCC 5.

All the addressing of global data is through PC-relative access, and in
the small code model, which is the default in Linux userland, I think
we'll be fine.

>  At
> >> least for x86-64, I expect this to be a small win.  It's also a slight
> >> hardening improvement if the reference canary is not stored in writable
> >> memory.
> >
> > On AArch64 hardware pointer signing already provides a free and more robust
> > implementation of stack canaries, so we could change -fstack-protector to
> > use that when pointer signing is enabled.
>
> I expected to use both because not all AArch64 implementations support
> pointer signing, and we'd use the stack protector to get some coverage
> for the legacy implementations.

Indeed. Until the default goes up to Armv8.3-A, it's going to default to this.


regards
Ramana

>
> Thanks,
> Florian


Re: [RFC][AArch64] Add support for system register based stack protector canary access

2018-12-03 Thread Ramana Radhakrishnan
On 03/12/2018 09:59, Jakub Jelinek wrote:
> On Mon, Dec 03, 2018 at 09:55:36AM +0000, Ramana Radhakrishnan wrote:
>> +  if (aarch64_stack_protector_guard == SSP_GLOBAL
>> +  && opts->x_aarch64_stack_protector_guard_offset_str)
>> +{
>> +  error ("Incompatible options -mstack-protector-guard=global and"
> 
> Diagnostic messages shouldn't start with capital letters (3 times in the
> patch), unless it is something that is capitalized always, even in the
> middle of sentences (say GNU etc.).
> 
>   Jakub
> 
Sorry - will fix in next iteration.

Ramana


[RFC][AArch64] Add support for system register based stack protector canary access

2018-12-03 Thread Ramana Radhakrishnan
For quite some time the kernel guys (more specifically Ard) have been 
talking about using a system register (sp_el0) and an offset from it 
for canary-based access. This patchset adds support for a new set of
command line options similar to how powerpc has done this.

I don't intend to change the defaults in userland, we've discussed this 
for user-land in the past and as far as glibc and userland is concerned 
we stick to the options as currently existing. The system register 
option is really for the kernel to use along with an offset as they 
control their ABI and this is a decision for them to make.

I did consider sticking this all under a mcmodel=kernel-small option but
thought that would be a bit too aggressive. There is very little error
checking I can do in terms of the system register being used and really
the assembler would barf quite quickly in case things go wrong. I've
managed to rebuild Ard's kernel tree with an additional patch that
I will send to him. I haven't managed to boot this kernel.

There was an additional question asked about the performance 
characteristics of this, but it's a security feature and the kernel 
doesn't have the luxury of a hidden symbol. Further, since the kernel 
uses sp_el0 for access everywhere, if they choose to use the same 
register I don't think the performance characteristics would be too bad; 
but that's a decision for the kernel folks to make when taking the 
feature into the kernel.

I still need to add some tests and documentation in invoke.texi but
this is at the stage where it would be nice for some other folks
to look at this.

The difference in code generated is as below.

extern void bar (char *);
int foo (void)
{
   char a[100];
   bar ();
}

$GCC -O2  -fstack-protector-strong  vs 
-mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg 
-mstack-protector-guard-offset=1024 -fstack-protector-strong


--- tst.s   2018-12-03 09:46:21.174167443 +
+++ tst.s.1 2018-12-03 09:46:03.546257203 +
@@ -15,15 +15,14 @@
mov x29, sp
str x19, [sp, 16]
.cfi_offset 19, -128
-   adrpx19, __stack_chk_guard
-   add x19, x19, :lo12:__stack_chk_guard
-   ldr x0, [x19]
-   str x0, [sp, 136]
-   mov x0,0
+   mrs x19, sp_el0
add x0, sp, 32
+   ldr x1, [x19, 1024]
+   str x1, [sp, 136]
+   mov x1,0
bl  bar
ldr x0, [sp, 136]
-   ldr x1, [x19]
+   ldr x1, [x19, 1024]
eor x1, x0, x1
cbnzx1, .L5
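The assembly difference above amounts to swapping how the reference canary is located. A plain-C sketch of the two lookup schemes follows; the names, the struct, and the canary value are made up for illustration, and the `mrs` read of sp_el0 is modelled by an ordinary pointer:

```c
#include <stdint.h>

/* Global scheme: the canary lives behind a symbol, as glibc userland
   does today with __stack_chk_guard. */
static uintptr_t stack_chk_guard_sim = 0x5eed;

/* Sysreg scheme: the canary lives at a fixed offset from a per-thread
   base register; this struct and pointer stand in for the kernel's
   task structure and sp_el0. */
struct task_sim { char pad[1024]; uintptr_t canary; };
static struct task_sim current = { .canary = 0x5eed };
static struct task_sim *sp_el0_sim = &current;

static uintptr_t load_canary_global(void)
{
    /* corresponds to: adrp/add x19, __stack_chk_guard ; ldr x1, [x19] */
    return stack_chk_guard_sim;
}

static uintptr_t load_canary_sysreg(void)
{
    /* corresponds to: mrs x19, sp_el0 ; ldr x1, [x19, 1024] */
    return *(uintptr_t *)((char *)sp_el0_sim + 1024);
}
```

Either way, the prologue stashes the loaded value in the frame and the epilogue reloads and compares it, branching to the failure path on mismatch, which is exactly the eor/cbnz pair in both assembly listings.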




I will be afk tomorrow and day after but this is to elicit some comments 
and for Ard to try this out with his kernel patches.

Thoughts ?

regards
Ramana

gcc/ChangeLog:

2018-11-23  Ramana Radhakrishnan  

 * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
 * config/aarch64/aarch64.c (aarch64_override_options_internal): 
Handle
 and put in error checks for stack protector guard options.
 (aarch64_stack_protect_guard): New.
 (TARGET_STACK_PROTECT_GUARD): Define.
 * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
 (reg_stack_protect_address): New.
 (stack_protect_set): Adjust for SSP_GLOBAL.
 (stack_protect_test): Likewise.
 * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
 (-mstack-protector-guard): Likewise.
 (-mstack-protector-guard-offset): Likewise.
 * doc/invoke.texi: Document new AArch64 options.
commit 9febaa23c114e598ddc9a2406ad96d8fa3ebe0c6
Author: Ramana Radhakrishnan 
Date:   Mon Nov 19 10:12:12 2018 +


diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 7a5c6d7664f..2f06f3e0e5a 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -91,4 +91,10 @@ enum aarch64_sve_vector_bits_enum {
   SVE_2048 = 2048
 };
 
+/* Where to get the canary for the stack protector.  */
+enum stack_protector_guard {
+  SSP_SYSREG,  /* per-thread canary in special system register 
*/
+  SSP_GLOBAL   /* global canary */
+};
+
 #endif
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0d89ba27e4a..a56b2166542 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10955,6 +10955,41 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
   if (opts->x_flag_strict_volatile_bitfields < 0 && abi_version_at_least (2))
 opts->x_flag_strict_volatile_bitfields = 1;
 
+  if (aarch64_stack_protector_guard == SSP_GLOBAL
+  && opts->x_aarch64_stack_protector_guard_offset_str)
+{
+  error ("Incompatible options -mstack-protector-guard=global and "
+"-mstack-protector-guard-offset=%qs",
+aarch64_stack_protector_guard_offset_str);
+}
+
+  if (aarch64_stac

Re: [PATCH] PR libstdc++/67843 set shared_ptr lock policy at build-time

2018-11-30 Thread Ramana Radhakrishnan


Firstly thanks a lot for working on this.

On 28/11/2018 12:49, Jonathan Wakely wrote:
> On 28/11/18 12:30 +, Jonathan Wakely wrote:
>> 3) We could change libstdc++'s configure to automatically override
>> --with-libstdcxx-lock-policy for arm-linux so it always uses "atomic"
>> (because we know the kernel helpers are available). That causes an ABI
>> change for a GCC built for --target=armv5t-*-linux* so I didn't do
>> that by default. Packagers who want that can use the --with option
>> explicitly to build a libstdc++ with atomic refcounting that can be
>> used on any armv5t or later CPU, allowing users to choose a newer
>> -march for their own code. (Until my patch that didn't work, because
> 
> [...]
> 
>> Option 3) is not my call to make. If the ARM maintainers want to
>> change the default older arm-linux targets I have no objections.

We could change the defaults when (if) we next bump the libstdc++ ABI 
in general :-/ and if we remember to do so. I don't feel comfortable 
changing the defaults silently, and I don't view this as something we 
can decide by ourselves as the Arm maintainers, since it really falls 
to the folks distributing Linux who default to versions of the 
architecture that result in the use of mutexes rather than atomics.

It's also not clear to me if we can use any other magic tricks to make 
this co-exist and whether that is worth it.
> 
> Another way to approach option 3) that we discussed at Cauldron, and
> which I want to start a separate discussion about on g...@gcc.gnu.org,
> is to change the value of __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 et al
> when we have kernel helpers to implement those operations.
> 
> The shared_ptr code doesn't care if an atomic CAS comes from the CPU
> or a kernel helper in libgcc. If the atomic CAS is supported via
> kernel helpers, let's use it! But those predefined macros only tell
> you about native CPU support (for the current target selected by
> -march).
> 
> 
> It's worth noting that all this shared_ptr code predates libatomic, so
> we couldn't just say "always use atomics, and link to libatomic if
> needed". If __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 was not defined, we had
> to assume no CAS *at all*. Not native, not provided by libatomic,
> nothing. I'd love to simply rely on std::atomic in std::shared_ptr,
> but that would be an ABI break for targets currently using a mutex,
> and would add new dependencies on libatomic for some targets. (It
> might also pessimize some single-threaded programs on targets that do
> use atomic ref-counts, because currently some updates are non-atomic
> when __gthread_active_p() == false).

Yep, though you want it to go to the kernel helpers or indeed libatomic.

regards
Ramana

> 
> 
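As a concrete illustration of what the lock policy governs, here is a minimal sketch (not from the patch) of the refcount traffic under discussion; the function name is made up for the example:

```cpp
#include <memory>

// Copying a shared_ptr bumps the control block's refcount; whether that
// bump is an atomic RMW or a mutex-protected increment is exactly the
// lock-policy choice being fixed at libstdc++ build time here.  The
// observable use_count is the same either way.
long use_count_after_copy()
{
    auto p = std::make_shared<int>(42);
    auto q = p;              // refcount 1 -> 2
    return q.use_count();    // 2, regardless of lock policy
}
```

The performance and ABI questions in the thread are about how that increment is implemented on pre-ARMv6 CPUs, not about its semantics.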



Re: [PATCH, arm] Backport -- Fix ICE during thunk generation with -mlong-calls

2018-11-08 Thread Ramana Radhakrishnan
On 07/11/2018 17:49, Mihail Ionescu wrote:
> Hi All,
> 
> This is a backport from trunk for GCC 8 and 7.
> 
> SVN revision: r264595.
> 
> Regression tested on arm-none-eabi.
> 
> 
> gcc/ChangeLog
> 
> 2018-11-02  Mihail Ionescu  
> 
>   Backport from mainline
>   2018-09-26  Eric Botcazou  
> 
>   * config/arm/arm.c (arm_reorg): Skip Thumb reorg pass for thunks.
>   (arm32_output_mi_thunk): Deal with long calls.
> 
> gcc/testsuite/ChangeLog
> 
> 2018-11-02  Mihail Ionescu  
> 
>   Backport from mainline
>   2018-09-17  Eric Botcazou  
> 
>   * g++.dg/other/thunk2a.C: New test.
>   * g++.dg/other/thunk2b.C: Likewise.
> 
> 
> If everything is ok, could someone commit it on my behalf?
> 
> Best regards,
>  Mihail
> 

It is a regression since my rewrite of this code.

Ok to backport to the release branches, it's been on trunk for a while 
and not shown any issues - please give the release managers a day or so 
to object.

regards
Ramana


Re: [ARM] Implement division using vrecpe, vrecps

2018-11-05 Thread Ramana Radhakrishnan
On 26/10/2018 06:04, Prathamesh Kulkarni wrote:
> Hi,
> This is a rebased version of patch that adds a pattern to neon.md for
> implementing division with multiplication by reciprocal using
> vrecpe/vrecps with -funsafe-math-optimizations excluding -Os.
> The newly added test-cases are not vectorized on armeb target with
> -O2. I posted the analysis for that here:
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01765.html
> 
> Briefly, the difference between little and big-endian vectorizer is in
> arm_builtin_support_vector_misalignment() which calls
> default_builtin_support_vector_misalignment() for big-endian case, and
> that returns false because
> movmisalign_optab does not exist for V2SF mode. This isn't observed
> with -O3 because loop peeling for alignment gets enabled.
> 
> It seems that the test cases in patch appear unsupported on armeb,
> after r221677 thus this patch requires no changes to
> target-supports.exp to adjust for armeb (unlike last time which
> stalled the patch).
> 
> Bootstrap+tested on arm-linux-gnueabihf.
> Cross-tested on arm*-*-* variants.
> OK for trunk ?
> 
> Thanks,
> Prathamesh
> 
> 
> tcwg-319-3.txt
> 
> 2018-10-26  Prathamesh Kulkarni
> 
>   * config/arm/neon.md (div3): New pattern.
> 
> testsuite/
>   * gcc.target/arm/neon-vect-div-1.c: New test.
>   * gcc.target/arm/neon-vect-div-2.c: Likewise.
> 
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 5aeee4b08c1..25ed45d381a 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -620,6 +620,38 @@
>   (const_string "neon_mul_")))]
>   )
>   
> +/* Perform division using multiply-by-reciprocal.
> +   Reciprocal is calculated using Newton-Raphson method.
> +   Enabled with -funsafe-math-optimizations -freciprocal-math
> +   and disabled for -Os since it increases code size.  */
> +
> +(define_expand "div<mode>3"
> +  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
> +(div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
> +   (match_operand:VCVTF 2 "s_register_operand" "w")))]
> +  "TARGET_NEON && !optimize_size
> +   && flag_unsafe_math_optimizations && flag_reciprocal_math"

I would prefer this to be more granular than 
flag_unsafe_math_optimizations && flag_reciprocal_math, which really 
reduces to flag_reciprocal_math, since that is turned on by default by 
-funsafe-math-optimizations.

I think this should really be just flag_reciprocal_math.


Otherwise ok.

regards
Ramana




> +  {
> +rtx rec = gen_reg_rtx (<MODE>mode);
> +rtx vrecps_temp = gen_reg_rtx (<MODE>mode);
> +
> +/* Reciprocal estimate.  */
> +emit_insn (gen_neon_vrecpe<mode> (rec, operands[2]));
> +
> +/* Perform 2 iterations of newton-raphson method.  */
> +for (int i = 0; i < 2; i++)
> +  {
> + emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2]));
> + emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp));
> +  }
> +
> +/* We now have reciprocal in rec, perform operands[0] = operands[1] * 
> rec.  */
> +emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec));
> +DONE;
> +  }
> +)
> +
> +
>   (define_insn "mul<mode>3add<mode>_neon"
> [(set (match_operand:VDQW 0 "s_register_operand" "=w")
>   (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" 
> "w")
> diff --git a/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c 
> b/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c
> new file mode 100644
> index 000..50d04b4175b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/neon-vect-div-1.c
> @@ -0,0 +1,16 @@
> +/* Test pattern div<mode>3.  */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-require-effective-target vect_hw_misalign } */
> +/* { dg-options "-O2 -ftree-vectorize -funsafe-math-optimizations 
> -fdump-tree-vect-details" } */
> +/* { dg-add-options arm_neon } */
> +
> +void
> +foo (int len, float * __restrict p, float *__restrict x)
> +{
> +  len = len & ~31;
> +  for (int i = 0; i < len; i++)
> +p[i] = p[i] / x[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c 
> b/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c
> new file mode 100644
> index 000..606f54b4e0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/neon-vect-div-2.c
> @@ -0,0 +1,16 @@
> +/* Test pattern div<mode>3.  */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-require-effective-target vect_hw_misalign } */
> +/* { dg-options "-O3 -ftree-vectorize -funsafe-math-optimizations 
> -fdump-tree-vect-details -fno-reciprocal-math" } */
> +/* { dg-add-options arm_neon } */
> +
> +void
> +foo (int len, float * __restrict p, float *__restrict x)
> +{
> +  len = len & ~31;
> +  for (int i = 0; i < len; i++)
> +p[i] = p[i] / x[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-not "vectorized 1 loops" "vect" } } */
> 
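For readers unfamiliar with the vrecpe/vrecps idiom, the expander's strategy can be sketched in scalar code. This is an assumed model (a crude power-of-two seed stands in for the real VRECPE estimate, which gives roughly 8 good bits), not the actual instruction semantics:

```cpp
#include <cmath>

// One vrecps step: returns the Newton-Raphson correction factor 2 - d*x.
float vrecps_model(float d, float x) { return 2.0f - d * x; }

// Scalar sketch of the expander above: start from a rough reciprocal
// estimate, refine it twice (each iteration roughly doubles the number of
// correct bits), then multiply by the numerator.
float fdiv_by_recip(float n, float d)
{
    // Crude power-of-two seed standing in for VRECPE's estimate.
    float x = std::ldexp(1.0f, -(std::ilogb(d) + 1));
    for (int i = 0; i < 2; ++i)           // two refinement iterations
        x = x * vrecps_model(d, x);
    return n * x;                          // n / d ~= n * (1/d)
}
```

With VRECPE's ~8-bit seed, two iterations reach roughly single-precision accuracy, which is why the pattern emits exactly two vrecps/mul pairs.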



Re: [PATCH, GCC/ARM] Fix PR87374: ICE with -mslow-flash-data and -mword-relocations

2018-10-30 Thread Ramana Radhakrishnan
On Fri, Oct 5, 2018 at 5:50 PM Thomas Preudhomme
 wrote:
>
> Hi Ramana and Kyrill,
>
> I've reworked the patch to add some documentation of the option
> conflict and reworked the -mword-relocation logic slightly to set the
> variable explicitely in PIC mode rather than test for PIC and word
> relocation everywhere.

Ok.

Thanks,
Ramana

>
> ChangeLog entries are now as follows:
>
> *** gcc/ChangeLog ***
>
> 2018-10-02  Thomas Preud'homme  
>
> PR target/87374
> * config/arm/arm.c (arm_option_check_internal): Disable the combined
> use of -mslow-flash-data and -mword-relocations.
> (arm_option_override): Enable -mword-relocations if -fpic or -fPIC.
> * config/arm/arm.md (SYMBOL_REF MOVT splitter): Stop checking for
> flag_pic.
> * doc/invoke.texi (-mword-relocations): Mention conflict with
> -mslow-flash-data.
> (-mslow-flash-data): Reciprocally.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-09-25  Thomas Preud'homme  
>
> PR target/87374
> * gcc.target/arm/movdi_movt.c: Skip if both -mslow-flash-data and
> -mword-relocations would be passed when compiling the test.
> * gcc.target/arm/movsi_movt.c: Likewise.
> * gcc.target/arm/pr81863.c: Likewise.
> * gcc.target/arm/thumb2-slow-flash-data-1.c: Likewise.
> * gcc.target/arm/thumb2-slow-flash-data-2.c: Likewise.
> * gcc.target/arm/thumb2-slow-flash-data-3.c: Likewise.
> * gcc.target/arm/thumb2-slow-flash-data-4.c: Likewise.
> * gcc.target/arm/thumb2-slow-flash-data-5.c: Likewise.
> * gcc.target/arm/tls-disable-literal-pool.c: Likewise.
>
> Is this ok for trunk?
>
> Best regards,
>
> Thomas
>
> On Tue, 2 Oct 2018 at 13:39, Ramana Radhakrishnan
>  wrote:
> >
> > On 02/10/2018 11:42, Thomas Preudhomme wrote:
> > > Hi Ramana,
> > >
> > > On Thu, 27 Sep 2018 at 11:14, Ramana Radhakrishnan
> > >  wrote:
> > >>
> > >> On 27/09/2018 09:26, Kyrill Tkachov wrote:
> > >>> Hi Thomas,
> > >>>
> > >>> On 26/09/18 18:39, Thomas Preudhomme wrote:
> > >>>> Hi,
> > >>>>
> > >>>> GCC ICEs under -mslow-flash-data and -mword-relocations because there
> > >>>> is no way to load an address, both literal pools and MOVW/MOVT being
> > >>>> forbidden. This patch gives an error message when both options are
> > >>>> specified by the user and adds the according dg-skip-if directives for
> > >>>> tests that use either of these options.
> > >>>>
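To make the conflict concrete, here is a minimal sketch (not from the patch) of code that must materialise a symbol address; the names are made up for the example:

```cpp
// On ARM, taking the address of 'counter' needs either a literal-pool
// load or a MOVW/MOVT pair.  -mslow-flash-data forbids the former and
// -mword-relocations forbids the latter, leaving the compiler with no
// way to emit this function, hence the ICE the patch turns into an
// up-front error.
int counter;

int *counter_address(void)
{
    return &counter;   // address must be materialised somehow
}
```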
> > >>>> ChangeLog entries are as follows:
> > >>>>
> > >>>> *** gcc/ChangeLog ***
> > >>>>
> > >>>> 2018-09-25  Thomas Preud'homme  
> > >>>>
> > >>>>PR target/87374
> > >>>>* config/arm/arm.c (arm_option_check_internal): Disable the 
> > >>>> combined
> > >>>>use of -mslow-flash-data and -mword-relocations.
> > >>>>
> > >>>> *** gcc/testsuite/ChangeLog ***
> > >>>>
> > >>>> 2018-09-25  Thomas Preud'homme  
> > >>>>
> > >>>>PR target/87374
> > >>>>* gcc.target/arm/movdi_movt.c: Skip if both -mslow-flash-data 
> > >>>> and
> > >>>>-mword-relocations would be passed when compiling the test.
> > >>>>* gcc.target/arm/movsi_movt.c: Likewise.
> > >>>>* gcc.target/arm/pr81863.c: Likewise.
> > >>>>* gcc.target/arm/thumb2-slow-flash-data-1.c: Likewise.
> > >>>>* gcc.target/arm/thumb2-slow-flash-data-2.c: Likewise.
> > >>>>* gcc.target/arm/thumb2-slow-flash-data-3.c: Likewise.
> > >>>>* gcc.target/arm/thumb2-slow-flash-data-4.c: Likewise.
> > >>>>* gcc.target/arm/thumb2-slow-flash-data-5.c: Likewise.
> > >>>>* gcc.target/arm/tls-disable-literal-pool.c: Likewise.
> > >>>>
> > >>>>
> > >>>> Testing: Bootstrapped in Thumb-2 mode. No testsuite regression when
> > >>>> targeting arm-none-eabi. Modified tests get skipped as expected when
> > >>>> running the testsuite with -mslow-flash-data (pr81863.c) or
> > >>>> -mword-relocations (all the others).
> > >>>>
> > >>>>
> > >>>> Is this ok for trunk? I'd also appreciate guidance on whether
