Hi,

Is this ok to backport to gcc-5-branch and gcc-6-branch? Patch applies cleanly (patches attached for reference).


2016-11-17  Thomas Preud'homme  <thomas.preudho...@arm.com>

    Backport from mainline
    2016-11-17  Thomas Preud'homme  <thomas.preudho...@arm.com>

    gcc/
    PR target/77933
    * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
    being live in the function and lr needing to be saved.  Distinguish
    between already saved pushable registers and registers to push.
    Check for LR being an available pushable register.

    gcc/testsuite/
    PR target/77933
    * gcc.target/arm/pr77933-1.c: New test.
    * gcc.target/arm/pr77933-2.c: Likewise.


Best regards,

Thomas


On 17/11/16 20:15, Thomas Preudhomme wrote:
Hi Kyrill,

I've committed the following updated patch, in which the test is restricted to
Thumb execution mode and skipped when that mode is not available, since
-mtpcs-leaf-frame is only available in Thumb mode. I've considered the change
obvious.

*** gcc/ChangeLog ***

2016-11-08  Thomas Preud'homme  <thomas.preudho...@arm.com>

        PR target/77933
        * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
        being live in the function and lr needing to be saved.  Distinguish
        between already saved pushable registers and registers to push.
        Check for LR being an available pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-08  Thomas Preud'homme  <thomas.preudho...@arm.com>

        PR target/77933
        * gcc.target/arm/pr77933-1.c: New test.
        * gcc.target/arm/pr77933-2.c: Likewise.

Best regards,

Thomas

On 17/11/16 10:04, Kyrill Tkachov wrote:

On 09/11/16 16:41, Thomas Preudhomme wrote:
I've reworked the patch following comments from Wilco [1] (sorry, I could not
find it in my MUA for some reason).

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00317.html


== Context ==

When saving registers, function thumb1_expand_prologue () aims at minimizing
the number of push instructions. One of the optimizations it performs is to
push LR alongside high register(s) (after having moved them to low
register(s)) when there is no low register to save. This is implemented by
adding LR to the pushable_regs mask, if it is live, just before pushing the
registers in that mask. The mask of live pushable registers, which is used to
detect whether LR needs to be saved, is then cleared to ensure LR is only
saved once.
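
The masks discussed here are plain bit sets indexed by register number. As a
small illustration, the sketch below mirrors two expressions that appear in
thumb1_expand_prologue (the 0x40ff filter and the LR test). The helper names
thumb1_pushable and lr_is_live are made up for this example and do not exist
in arm.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define LR_REGNUM 14

/* Registers a Thumb-1 push can encode: r0-r7 (bits 0-7) plus LR (bit 14).
   This is the same constant used to compute l_mask in the prologue.  */
#define THUMB1_PUSHABLE_MASK 0x40ffu

/* Hypothetical helper: keep only the live registers that can be given
   directly to a Thumb-1 push.  */
uint32_t
thumb1_pushable (uint32_t live_regs_mask)
{
  return live_regs_mask & THUMB1_PUSHABLE_MASK;
}

/* Hypothetical helper: true if LR is part of the live register mask.  */
bool
lr_is_live (uint32_t live_regs_mask)
{
  return (live_regs_mask & (1u << LR_REGNUM)) != 0;
}
```

For a function that only clobbers r8, r9 and lr, thumb1_pushable returns a
mask containing LR alone, which is the situation in which LR gets combined
with the push of the high registers.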


== Problem ==

However, beyond deciding which registers to push, pushable_regs is also used
to track which pushable registers can receive a high register before it is
pushed, hence the name. That mask is cleared when all high registers have been
assigned a low register, but the clearing assumes the high registers were
assigned to the highest-numbered registers in that mask. This is not the case
because LR is not considered when looking for a register in that mask.
Furthermore, LR might have been saved in the TARGET_BACKTRACE path above, yet
the mask of live pushable registers is not cleared in that case.
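
To make the failure concrete, here is a toy model of the pre-patch loop. It is
a sketch, not the GCC code: it only counts the words pushed and ignores the
moves and the RTL. The function name old_words_pushed and the choice of r3 as
work register in the note below are assumptions made for illustration.

```c
#include <stdint.h>

#define LR_REGNUM 14
#define LAST_LO_REGNUM 7

/* Number of set bits in MASK, i.e. words written by one push.  */
static unsigned
count_regs (uint32_t mask)
{
  unsigned n = 0;
  for (; mask != 0; mask >>= 1)
    n += mask & 1u;
  return n;
}

/* Toy model of the pre-patch loop.  Returns the total number of words
   pushed while stashing HIGH_REGS high registers.  PUSHABLE_REGS must
   contain at least one low register, as it does in the real prologue.  */
unsigned
old_words_pushed (uint32_t pushable_regs, uint32_t l_mask, unsigned high_regs)
{
  unsigned words = 0;

  while (high_regs > 0)
    {
      /* Only low registers are considered as stash registers.  */
      for (int regno = LAST_LO_REGNUM; regno >= 0; regno--)
	if (pushable_regs & (1u << regno))
	  {
	    high_regs--;
	    if (high_regs == 0)
	      {
		/* Keeps every bit at or above REGNO, including a stale
		   LR bit that holds no high register.  */
		pushable_regs &= ~((1u << regno) - 1);
		break;
	      }
	  }

      /* LR is added to the same mask that is reused by later pushes.  */
      if (l_mask == (1u << LR_REGNUM))
	{
	  pushable_regs |= l_mask;
	  l_mask = 0;
	}

      words += count_regs (pushable_regs);
    }
  return words;
}
```

With pushable_regs = 1 << 3, l_mask = 1 << LR_REGNUM and two high registers
(the r8, r9, lr case), the model pushes four words although only three
registers need saving. The extra word is a second copy of LR. With a single
high register the model pushes two words, so the problem only shows up when
the loop needs more than one push.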


== Solution ==

This patch changes the loop to iterate over registers from LR down to r0 so as
to both fix the stack corruption reported in PR77933 and reuse LR to push a
high register when possible. This patch also introduces a new variable,
lr_needs_saving, to record whether LR (still) needs to be saved at a given
point in the code, and sets the variable accordingly throughout the code, thus
fixing the second issue. Finally, this patch creates a new push_mask variable
to distinguish between the mask of registers to push and the mask of live
pushable registers.
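
The reworked loop can be sketched with a toy model that only counts pushed
words. As before, this is an illustration under simplifying assumptions rather
than the code in arm.c: the function name new_words_pushed is made up, and the
moves of high registers into stash registers are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define LR_REGNUM 14

/* Number of set bits in MASK, i.e. words written by one push.  */
static unsigned
count_regs (uint32_t mask)
{
  unsigned n = 0;
  for (; mask != 0; mask >>= 1)
    n += mask & 1u;
  return n;
}

/* Toy model of the patched loop.  Returns the total number of words
   pushed while stashing HIGH_REGS high registers.  PUSHABLE_REGS must
   be non-zero once LR has been filtered out.  */
unsigned
new_words_pushed (uint32_t pushable_regs, bool lr_needs_saving,
		  unsigned high_regs)
{
  unsigned words = 0;

  /* LR cannot be used as a stash register while it still holds the
     return address that has to be saved.  */
  if (lr_needs_saving)
    pushable_regs &= ~(1u << LR_REGNUM);

  while (high_regs > 0)
    {
      /* Registers actually pushed by this push, kept separate from the
	 set of registers available for stashing.  */
      uint32_t push_mask = 0;

      for (int regno = LR_REGNUM; regno >= 0; regno--)
	if (pushable_regs & (1u << regno))
	  {
	    high_regs--;
	    push_mask |= 1u << regno;
	    if (high_regs == 0)
	      break;
	  }

      /* Save LR exactly once, alongside the first batch.  */
      if (lr_needs_saving)
	{
	  push_mask |= 1u << LR_REGNUM;
	  lr_needs_saving = false;
	}

      words += count_regs (push_mask);
    }
  return words;
}
```

With pushable_regs = 1 << 3, lr_needs_saving set and two high registers, the
model pushes three words: one per high register plus a single copy of LR. If
LR has already been saved and is present in pushable_regs alongside r3, both
high registers go out in a single two-word push, which is the reuse of LR
mentioned above.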


== Note ==

Other bits could have been improved but have been left out to allow the patch
to be backported to stable branches:

(1) using argument registers that are not holding an argument
(2) using push_mask consistently instead of l_mask (in TARGET_BACKTRACE), mask
(low register push) and push_mask
(3) improving the !l_mask case in TARGET_BACKTRACE since offset == 0
(4) renaming l_mask to a more appropriate name (live_pushable_regs_mask?)

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2016-11-08  Thomas Preud'homme  <thomas.preudho...@arm.com>

        PR target/77933
        * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
        being live in the function and lr needing to be saved. Distinguish
        between already saved pushable registers and registers to push.
        Check for LR being an available pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-08  Thomas Preud'homme  <thomas.preudho...@arm.com>

        PR target/77933
        * gcc.target/arm/pr77933-1.c: New test.
        * gcc.target/arm/pr77933-2.c: Likewise.


Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0

Is this ok for trunk?


Ok.
Thanks,
Kyrill

Best regards,

Thomas

On 02/11/16 17:08, Thomas Preudhomme wrote:
Hi,

When saving registers, function thumb1_expand_prologue () aims at minimizing
the number of push instructions. One of the optimizations it performs is to
push lr alongside high register(s) (after having moved them to low
register(s)) when there is no low register to save. This is implemented by
adding lr to the list of registers that can be pushed just before the push
happens. This pushes lr and allows it to be used for further pushes if there
are not enough registers to push all the high registers at once.

However, the logic that decides which register to move high registers to
before they are pushed only looks at low registers (see the for loop
initialization). This means not only that lr is not used for pushing high
registers but also that lr is not removed from the list of registers to be
pushed when it is not used. This extra lr push is not popped in the epilogue,
leading to stack corruption.

This patch changes the loop to iterate over register r0 to lr so as to both fix
the stack corruption and reuse lr to push some high register when possible.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2016-11-01  Thomas Preud'homme <thomas.preudho...@arm.com>

        PR target/77933
        * config/arm/arm.c (thumb1_expand_prologue): Also check for lr being a
        pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-01  Thomas Preud'homme <thomas.preudho...@arm.com>

        PR target/77933
        * gcc.target/arm/pr77933.c: New test.


Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0

Is this ok for trunk?

Best regards,

Thomas

diff --git a/gcc/config.gcc b/gcc/config.gcc
index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3739,6 +3739,16 @@ case "${target}" in
 				# pragmatic.
 				tmake_profile_file="arm/t-aprofile"
 				;;
+			rmprofile)
+				# Note that arm/t-rmprofile is a
+				# stand-alone make file fragment to be
+				# used only with itself.  We do not
+				# specifically use the
+				# TM_MULTILIB_OPTION framework because
+				# this shorthand is more
+				# pragmatic.
+				tmake_profile_file="arm/t-rmprofile"
+				;;
 			default)
 				;;
 			*)
@@ -3748,9 +3758,10 @@ case "${target}" in
 			esac
 
 			if test "x${tmake_profile_file}" != x ; then
-				# arm/t-aprofile is only designed to work
-				# without any with-cpu, with-arch, with-mode,
-				# with-fpu or with-float options.
+				# arm/t-aprofile and arm/t-rmprofile are only
+				# designed to work without any with-cpu,
+				# with-arch, with-mode, with-fpu or with-float
+				# options.
 				if test "x$with_arch" != x \
 				    || test "x$with_cpu" != x \
 				    || test "x$with_float" != x \
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
new file mode 100644
index 0000000000000000000000000000000000000000..c8b5c9cbd03694eea69855e20372afa3e97d6b4c
--- /dev/null
+++ b/gcc/config/arm/t-rmprofile
@@ -0,0 +1,174 @@
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# This is a target makefile fragment that attempts to get
+# multilibs built for the range of CPU's, FPU's and ABI's that
+# are relevant for the ARM architecture.  It should not be used in
+# conjunction with another make file fragment and assumes --with-arch,
+# --with-cpu, --with-fpu, --with-float, --with-mode have their default
+# values during the configure step.  We enforce this during the
+# top-level configury.
+
+MULTILIB_OPTIONS     =
+MULTILIB_DIRNAMES    =
+MULTILIB_EXCEPTIONS  =
+MULTILIB_MATCHES     =
+MULTILIB_REUSE       =
+
+# We have the following hierachy:
+#   ISA: A32 (.) or T16/T32 (thumb).
+#   Architecture: ARMv6S-M (v6-m), ARMv7-M (v7-m), ARMv7E-M (v7e-m),
+#                 ARMv8-M Baseline (v8-m.base) or ARMv8-M Mainline (v8-m.main).
+#   FPU: VFPv3-D16 (fpv3), FPV4-SP-D16 (fpv4-sp), FPV5-SP-D16 (fpv5-sp),
+#        VFPv5-D16 (fpv5), or None (.).
+#   Float-abi: Soft (.), softfp (softfp), or hard (hardfp).
+
+# Options to build libraries with
+
+MULTILIB_OPTIONS       += mthumb
+MULTILIB_DIRNAMES      += thumb
+
+MULTILIB_OPTIONS       += march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7/march=armv8-m.base/march=armv8-m.main
+MULTILIB_DIRNAMES      += v6-m v7-m v7e-m v7-ar v8-m.base v8-m.main
+
+MULTILIB_OPTIONS       += mfpu=vfpv3-d16/mfpu=fpv4-sp-d16/mfpu=fpv5-sp-d16/mfpu=fpv5-d16
+MULTILIB_DIRNAMES      += fpv3 fpv4-sp fpv5-sp fpv5
+
+MULTILIB_OPTIONS       += mfloat-abi=softfp/mfloat-abi=hard
+MULTILIB_DIRNAMES      += softfp hard
+
+
+# Option combinations to build library with
+
+# Default CPU/Arch
+MULTILIB_REQUIRED      += mthumb
+MULTILIB_REQUIRED      += mfloat-abi=hard
+
+# ARMv6-M
+MULTILIB_REQUIRED      += mthumb/march=armv6s-m
+
+# ARMv8-M Baseline
+MULTILIB_REQUIRED      += mthumb/march=armv8-m.base
+
+# ARMv7-M
+MULTILIB_REQUIRED      += mthumb/march=armv7-m
+
+# ARMv7E-M
+MULTILIB_REQUIRED      += mthumb/march=armv7e-m
+MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv4-sp-d16/mfloat-abi=softfp
+MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv4-sp-d16/mfloat-abi=hard
+MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv5-d16/mfloat-abi=softfp
+MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv5-d16/mfloat-abi=hard
+MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv5-sp-d16/mfloat-abi=softfp
+MULTILIB_REQUIRED      += mthumb/march=armv7e-m/mfpu=fpv5-sp-d16/mfloat-abi=hard
+
+# ARMv8-M Mainline
+MULTILIB_REQUIRED      += mthumb/march=armv8-m.main
+MULTILIB_REQUIRED      += mthumb/march=armv8-m.main/mfpu=fpv5-d16/mfloat-abi=softfp
+MULTILIB_REQUIRED      += mthumb/march=armv8-m.main/mfpu=fpv5-d16/mfloat-abi=hard
+MULTILIB_REQUIRED      += mthumb/march=armv8-m.main/mfpu=fpv5-sp-d16/mfloat-abi=softfp
+MULTILIB_REQUIRED      += mthumb/march=armv8-m.main/mfpu=fpv5-sp-d16/mfloat-abi=hard
+
+# ARMv7-R as well as ARMv7-A and ARMv8-A if aprofile was not specified
+MULTILIB_REQUIRED      += mthumb/march=armv7
+MULTILIB_REQUIRED      += mthumb/march=armv7/mfpu=vfpv3-d16/mfloat-abi=softfp
+MULTILIB_REQUIRED      += mthumb/march=armv7/mfpu=vfpv3-d16/mfloat-abi=hard
+
+
+# Matches
+
+# CPU Matches
+MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m0
+MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m0.small-multiply
+MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m0plus
+MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m0plus.small-multiply
+MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m1
+MULTILIB_MATCHES       += march?armv6s-m=mcpu?cortex-m1.small-multiply
+MULTILIB_MATCHES       += march?armv7-m=mcpu?cortex-m3
+MULTILIB_MATCHES       += march?armv7e-m=mcpu?cortex-m4
+MULTILIB_MATCHES       += march?armv7e-m=mcpu?cortex-m7
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r4
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r4f
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r5
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r7
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-r8
+MULTILIB_MATCHES       += march?armv7=mcpu?marvell-pj4
+MULTILIB_MATCHES       += march?armv7=mcpu?generic-armv7-a
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a8
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a9
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a5
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a7
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a15
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a12
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a17
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a15.cortex-a7
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a17.cortex-a7
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a32
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a35
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a53
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a57
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a57.cortex-a53
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a72
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a72.cortex-a53
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a73
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a73.cortex-a35
+MULTILIB_MATCHES       += march?armv7=mcpu?cortex-a73.cortex-a53
+MULTILIB_MATCHES       += march?armv7=mcpu?exynos-m1
+MULTILIB_MATCHES       += march?armv7=mcpu?qdf24xx
+MULTILIB_MATCHES       += march?armv7=mcpu?xgene1
+
+# Arch Matches
+MULTILIB_MATCHES       += march?armv6s-m=march?armv6-m
+MULTILIB_MATCHES       += march?armv8-m.main=march?armv8-m.main+dsp
+MULTILIB_MATCHES       += march?armv7=march?armv7-r
+ifeq (,$(HAS_APROFILE))
+MULTILIB_MATCHES       += march?armv7=march?armv7-a
+MULTILIB_MATCHES       += march?armv7=march?armv7ve
+MULTILIB_MATCHES       += march?armv7=march?armv8-a
+MULTILIB_MATCHES       += march?armv7=march?armv8-a+crc
+MULTILIB_MATCHES       += march?armv7=march?armv8.1-a
+MULTILIB_MATCHES       += march?armv7=march?armv8.1-a+crc
+MULTILIB_MATCHES       += march?armv7=march?armv8.2-a
+MULTILIB_MATCHES       += march?armv7=march?armv8.2-a+fp16
+endif
+
+# FPU matches
+ifeq (,$(HAS_APROFILE))
+MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv3
+MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv3-fp16
+MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv3-d16-fp16
+MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?neon
+MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?neon-fp16
+MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv4
+MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?vfpv4-d16
+MULTILIB_MATCHES       += mfpu?vfpv3-d16=mfpu?neon-vfpv4
+MULTILIB_MATCHES       += mfpu?fpv5-d16=mfpu?fp-armv8
+MULTILIB_MATCHES       += mfpu?fpv5-d16=mfpu?neon-fp-armv8
+MULTILIB_MATCHES       += mfpu?fpv5-d16=mfpu?crypto-neon-fp-armv8
+endif
+
+
+# We map all requests for ARMv7-R or ARMv7-A in ARM mode to Thumb mode and
+# any FPU to VFPv3-d16 if possible.
+MULTILIB_REUSE         += mthumb/march.armv7=march.armv7
+MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.softfp=march.armv7/mfpu.vfpv3-d16/mfloat-abi.softfp
+MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.hard=march.armv7/mfpu.vfpv3-d16/mfloat-abi.hard
+MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.softfp=march.armv7/mfpu.fpv5-d16/mfloat-abi.softfp
+MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.hard=march.armv7/mfpu.fpv5-d16/mfloat-abi.hard
+MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.softfp=mthumb/march.armv7/mfpu.fpv5-d16/mfloat-abi.softfp
+MULTILIB_REUSE         += mthumb/march.armv7/mfpu.vfpv3-d16/mfloat-abi.hard=mthumb/march.armv7/mfpu.fpv5-d16/mfloat-abi.hard
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index e4c686e60c7f479ca3ea71e94c4bb6ad52373085..0b94bc1931a226e58d06a7ed5a726454142c006a 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1107,19 +1107,59 @@ sysv, aix.
 
 @item --with-multilib-list=@var{list}
 @itemx --without-multilib-list
-Specify what multilibs to build.
-Currently only implemented for arm*-*-*, sh*-*-* and x86-64-*-linux*.
+Specify what multilibs to build.  @var{list} is a comma separated list of
+values, possibly consisting of a single value.  Currently only implemented
+for arm*-*-*, sh*-*-* and x86-64-*-linux*.  The accepted values and meaning
+for each target is given below.
 
 @table @code
 @item arm*-*-*
-@var{list} is either @code{default} or @code{aprofile}.  Specifying
-@code{default} is equivalent to omitting this option while specifying
-@code{aprofile} builds multilibs for each combination of ISA (@code{-marm} or
-@code{-mthumb}), architecture (@code{-march=armv7-a}, @code{-march=armv7ve},
-or @code{-march=armv8-a}), FPU available (none, @code{-mfpu=vfpv3-d16},
-@code{-mfpu=neon}, @code{-mfpu=vfpv4-d16}, @code{-mfpu=neon-vfpv4} or
-@code{-mfpu=neon-fp-armv8} depending on architecture) and floating-point ABI
-(@code{-mfloat-abi=softfp} or @code{-mfloat-abi=hard}).
+@var{list} is one of@code{default}, @code{aprofile} or @code{rmprofile}.
+Specifying @code{default} is equivalent to omitting this option, ie. only the
+default runtime library will be enabled.  Specifying @code{aprofile} or
+@code{rmprofile} builds multilibs for a combination of ISA, architecture,
+FPU available and floating-point ABI.
+
+The table below gives the combination of ISAs, architectures, FPUs and
+floating-point ABIs for which multilibs are built for each accepted value.
+
+@multitable @columnfractions .15 .28 .30
+@item Option @tab aprofile @tab rmprofile
+@item ISAs
+@tab @code{-marm} and @code{-mthumb}
+@tab @code{-mthumb}
+@item Architectures@*@*@*@*@*@*
+@tab default architecture@*
+@code{-march=armv7-a}@*
+@code{-march=armv7ve}@*
+@code{-march=armv8-a}@*@*@*
+@tab default architecture@*
+@code{-march=armv6s-m}@*
+@code{-march=armv7-m}@*
+@code{-march=armv7e-m}@*
+@code{-march=armv8-m.base}@*
+@code{-march=armv8-m.main}@*
+@code{-march=armv7}
+@item FPUs@*@*@*@*@*
+@tab none@*
+@code{-mfpu=vfpv3-d16}@*
+@code{-mfpu=neon}@*
+@code{-mfpu=vfpv4-d16}@*
+@code{-mfpu=neon-vfpv4}@*
+@code{-mfpu=neon-fp-armv8}
+@tab none@*
+@code{-mfpu=vfpv3-d16}@*
+@code{-mfpu=fpv4-sp-d16}@*
+@code{-mfpu=fpv5-sp-d16}@*
+@code{-mfpu=fpv5-d16}@*
+@item floating-point@/ ABIs@*@*
+@tab @code{-mfloat-abi=soft}@*
+@code{-mfloat-abi=softfp}@*
+@code{-mfloat-abi=hard}
+@tab @code{-mfloat-abi=soft}@*
+@code{-mfloat-abi=softfp}@*
+@code{-mfloat-abi=hard}
+@end multitable
 
 @item sh*-*-*
 @var{list} is a comma separated list of CPU names.  These must be of the
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c01a3c878968f6e6f07358b0686e4a59e34f56b7..5c975625bfa25d2c71c27db348cd3e70fe44a951 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24457,6 +24457,7 @@ thumb1_expand_prologue (void)
   unsigned long live_regs_mask;
   unsigned long l_mask;
   unsigned high_regs_pushed = 0;
+  bool lr_needs_saving;
 
   func_type = arm_current_func_type ();
 
@@ -24479,6 +24480,7 @@ thumb1_expand_prologue (void)
 
   offsets = arm_get_frame_offsets ();
   live_regs_mask = offsets->saved_regs_mask;
+  lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
 
   /* Extract a mask of the ones we can give to the Thumb's push instruction.  */
   l_mask = live_regs_mask & 0x40ff;
@@ -24545,6 +24547,7 @@ thumb1_expand_prologue (void)
 	{
 	  insn = thumb1_emit_multi_reg_push (l_mask, l_mask);
 	  RTX_FRAME_RELATED_P (insn) = 1;
+	  lr_needs_saving = false;
 
 	  offset = bit_count (l_mask) * UNITS_PER_WORD;
 	}
@@ -24609,12 +24612,13 @@ thumb1_expand_prologue (void)
      be a push of LR and we can combine it with the push of the first high
      register.  */
   else if ((l_mask & 0xff) != 0
-	   || (high_regs_pushed == 0 && l_mask))
+	   || (high_regs_pushed == 0 && lr_needs_saving))
     {
       unsigned long mask = l_mask;
       mask |= (1 << thumb1_extra_regs_pushed (offsets, true)) - 1;
       insn = thumb1_emit_multi_reg_push (mask, mask);
       RTX_FRAME_RELATED_P (insn) = 1;
+      lr_needs_saving = false;
     }
 
   if (high_regs_pushed)
@@ -24632,7 +24636,9 @@ thumb1_expand_prologue (void)
       /* Here we need to mask out registers used for passing arguments
 	 even if they can be pushed.  This is to avoid using them to stash the high
 	 registers.  Such kind of stash may clobber the use of arguments.  */
-      pushable_regs = l_mask & (~arg_regs_mask) & 0xff;
+      pushable_regs = l_mask & (~arg_regs_mask);
+      if (lr_needs_saving)
+	pushable_regs &= ~(1 << LR_REGNUM);
 
       if (pushable_regs == 0)
 	pushable_regs = 1 << thumb_find_work_register (live_regs_mask);
@@ -24640,8 +24646,9 @@ thumb1_expand_prologue (void)
       while (high_regs_pushed > 0)
 	{
 	  unsigned long real_regs_mask = 0;
+	  unsigned long push_mask = 0;
 
-	  for (regno = LAST_LO_REGNUM; regno >= 0; regno --)
+	  for (regno = LR_REGNUM; regno >= 0; regno --)
 	    {
 	      if (pushable_regs & (1 << regno))
 		{
@@ -24650,6 +24657,7 @@ thumb1_expand_prologue (void)
 
 		  high_regs_pushed --;
 		  real_regs_mask |= (1 << next_hi_reg);
+		  push_mask |= (1 << regno);
 
 		  if (high_regs_pushed)
 		    {
@@ -24659,23 +24667,20 @@ thumb1_expand_prologue (void)
 			  break;
 		    }
 		  else
-		    {
-		      pushable_regs &= ~((1 << regno) - 1);
-		      break;
-		    }
+		    break;
 		}
 	    }
 
 	  /* If we had to find a work register and we have not yet
 	     saved the LR then add it to the list of regs to push.  */
-	  if (l_mask == (1 << LR_REGNUM))
+	  if (lr_needs_saving)
 	    {
-	      pushable_regs |= l_mask;
-	      real_regs_mask |= l_mask;
-	      l_mask = 0;
+	      push_mask |= 1 << LR_REGNUM;
+	      real_regs_mask |= 1 << LR_REGNUM;
+	      lr_needs_saving = false;
 	    }
 
-	  insn = thumb1_emit_multi_reg_push (pushable_regs, real_regs_mask);
+	  insn = thumb1_emit_multi_reg_push (push_mask, real_regs_mask);
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	}
     }
diff --git a/gcc/testsuite/gcc.target/arm/pr77933-1.c b/gcc/testsuite/gcc.target/arm/pr77933-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..95cf68ea7531bcc453371f493a05bd40caa5541b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77933-1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_lr_and_highregs (void)
+{
+  __asm__ volatile ("" : : : "r8", "r9", "lr");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+		  "mov\tr5, #0xf5\n\t"
+		  "mov\tr6, #0xf6\n\t"
+		  "mov\tr7, #0xf7\n\t"
+		  "mov\tr0, #0xf8\n\t"
+		  "mov\tr8, r0\n\t"
+		  "mov\tr0, #0xfa\n\t"
+		  "mov\tr10, r0"
+		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+
+  clobber_lr_and_highregs ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr5, #0xf5\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr6, #0xf6\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr7, #0xf7\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r8\n\t"
+		  "cmp\tr0, #0xf8\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r10\n\t"
+		  "cmp\tr0, #0xfa\n\t"
+		  "bne\tfail\n\t"
+		  "mov\t%0, #1\n"
+		  "fail:\n\t"
+		  "sub\tr0, #1"
+		  : "=r" (ret) : :);
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.target/arm/pr77933-2.c b/gcc/testsuite/gcc.target/arm/pr77933-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9028c4fcab4229591fa057f15c641d2b5597cd1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77933-2.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
+/* { dg-options "-mthumb -O2 -mtpcs-leaf-frame" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_lr_and_highregs (void)
+{
+  __asm__ volatile ("" : : : "r8", "r9", "lr");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+		  "mov\tr5, #0xf5\n\t"
+		  "mov\tr6, #0xf6\n\t"
+		  "mov\tr7, #0xf7\n\t"
+		  "mov\tr0, #0xf8\n\t"
+		  "mov\tr8, r0\n\t"
+		  "mov\tr0, #0xfa\n\t"
+		  "mov\tr10, r0"
+		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+
+  clobber_lr_and_highregs ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr5, #0xf5\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr6, #0xf6\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr7, #0xf7\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r8\n\t"
+		  "cmp\tr0, #0xf8\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r10\n\t"
+		  "cmp\tr0, #0xfa\n\t"
+		  "bne\tfail\n\t"
+		  "mov\t%0, #1\n"
+		  "fail:\n\t"
+		  "sub\tr0, #1"
+		  : "=r" (ret) : :);
+  return ret;
+}
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 83cb13d1195beb19d6301f5c83a7eb544a91d877..1dba035c62c97a5f723d02208636c92108427379 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24710,6 +24710,7 @@ thumb1_expand_prologue (void)
   unsigned long live_regs_mask;
   unsigned long l_mask;
   unsigned high_regs_pushed = 0;
+  bool lr_needs_saving;
 
   func_type = arm_current_func_type ();
 
@@ -24732,6 +24733,7 @@ thumb1_expand_prologue (void)
 
   offsets = arm_get_frame_offsets ();
   live_regs_mask = offsets->saved_regs_mask;
+  lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
 
   /* Extract a mask of the ones we can give to the Thumb's push instruction.  */
   l_mask = live_regs_mask & 0x40ff;
@@ -24798,6 +24800,7 @@ thumb1_expand_prologue (void)
 	{
 	  insn = thumb1_emit_multi_reg_push (l_mask, l_mask);
 	  RTX_FRAME_RELATED_P (insn) = 1;
+	  lr_needs_saving = false;
 
 	  offset = bit_count (l_mask) * UNITS_PER_WORD;
 	}
@@ -24862,12 +24865,13 @@ thumb1_expand_prologue (void)
      be a push of LR and we can combine it with the push of the first high
      register.  */
   else if ((l_mask & 0xff) != 0
-	   || (high_regs_pushed == 0 && l_mask))
+	   || (high_regs_pushed == 0 && lr_needs_saving))
     {
       unsigned long mask = l_mask;
       mask |= (1 << thumb1_extra_regs_pushed (offsets, true)) - 1;
       insn = thumb1_emit_multi_reg_push (mask, mask);
       RTX_FRAME_RELATED_P (insn) = 1;
+      lr_needs_saving = false;
     }
 
   if (high_regs_pushed)
@@ -24885,7 +24889,9 @@ thumb1_expand_prologue (void)
       /* Here we need to mask out registers used for passing arguments
 	 even if they can be pushed.  This is to avoid using them to stash the high
 	 registers.  Such kind of stash may clobber the use of arguments.  */
-      pushable_regs = l_mask & (~arg_regs_mask) & 0xff;
+      pushable_regs = l_mask & (~arg_regs_mask);
+      if (lr_needs_saving)
+	pushable_regs &= ~(1 << LR_REGNUM);
 
       if (pushable_regs == 0)
 	pushable_regs = 1 << thumb_find_work_register (live_regs_mask);
@@ -24893,8 +24899,9 @@ thumb1_expand_prologue (void)
       while (high_regs_pushed > 0)
 	{
 	  unsigned long real_regs_mask = 0;
+	  unsigned long push_mask = 0;
 
-	  for (regno = LAST_LO_REGNUM; regno >= 0; regno --)
+	  for (regno = LR_REGNUM; regno >= 0; regno --)
 	    {
 	      if (pushable_regs & (1 << regno))
 		{
@@ -24903,6 +24910,7 @@ thumb1_expand_prologue (void)
 
 		  high_regs_pushed --;
 		  real_regs_mask |= (1 << next_hi_reg);
+		  push_mask |= (1 << regno);
 
 		  if (high_regs_pushed)
 		    {
@@ -24912,23 +24920,20 @@ thumb1_expand_prologue (void)
 			  break;
 		    }
 		  else
-		    {
-		      pushable_regs &= ~((1 << regno) - 1);
-		      break;
-		    }
+		    break;
 		}
 	    }
 
 	  /* If we had to find a work register and we have not yet
 	     saved the LR then add it to the list of regs to push.  */
-	  if (l_mask == (1 << LR_REGNUM))
+	  if (lr_needs_saving)
 	    {
-	      pushable_regs |= l_mask;
-	      real_regs_mask |= l_mask;
-	      l_mask = 0;
+	      push_mask |= 1 << LR_REGNUM;
+	      real_regs_mask |= 1 << LR_REGNUM;
+	      lr_needs_saving = false;
 	    }
 
-	  insn = thumb1_emit_multi_reg_push (pushable_regs, real_regs_mask);
+	  insn = thumb1_emit_multi_reg_push (push_mask, real_regs_mask);
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	}
     }
diff --git a/gcc/testsuite/gcc.target/arm/pr77933-1.c b/gcc/testsuite/gcc.target/arm/pr77933-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..95cf68ea7531bcc453371f493a05bd40caa5541b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77933-1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_lr_and_highregs (void)
+{
+  __asm__ volatile ("" : : : "r8", "r9", "lr");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+		  "mov\tr5, #0xf5\n\t"
+		  "mov\tr6, #0xf6\n\t"
+		  "mov\tr7, #0xf7\n\t"
+		  "mov\tr0, #0xf8\n\t"
+		  "mov\tr8, r0\n\t"
+		  "mov\tr0, #0xfa\n\t"
+		  "mov\tr10, r0"
+		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+
+  clobber_lr_and_highregs ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr5, #0xf5\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr6, #0xf6\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr7, #0xf7\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r8\n\t"
+		  "cmp\tr0, #0xf8\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r10\n\t"
+		  "cmp\tr0, #0xfa\n\t"
+		  "bne\tfail\n\t"
+		  "mov\t%0, #1\n"
+		  "fail:\n\t"
+		  "sub\tr0, #1"
+		  : "=r" (ret) : :);
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.target/arm/pr77933-2.c b/gcc/testsuite/gcc.target/arm/pr77933-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9028c4fcab4229591fa057f15c641d2b5597cd1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77933-2.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
+/* { dg-options "-mthumb -O2 -mtpcs-leaf-frame" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_lr_and_highregs (void)
+{
+  __asm__ volatile ("" : : : "r8", "r9", "lr");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+		  "mov\tr5, #0xf5\n\t"
+		  "mov\tr6, #0xf6\n\t"
+		  "mov\tr7, #0xf7\n\t"
+		  "mov\tr0, #0xf8\n\t"
+		  "mov\tr8, r0\n\t"
+		  "mov\tr0, #0xfa\n\t"
+		  "mov\tr10, r0"
+		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+
+  clobber_lr_and_highregs ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr5, #0xf5\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr6, #0xf6\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr7, #0xf7\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r8\n\t"
+		  "cmp\tr0, #0xf8\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r10\n\t"
+		  "cmp\tr0, #0xfa\n\t"
+		  "bne\tfail\n\t"
+		  "mov\t%0, #1\n"
+		  "fail:\n\t"
+		  "sub\tr0, #1"
+		  : "=r" (ret) : :);
+  return ret;
+}
