Hello everyone,
I've created the patch which sets the frame pointer to the predictable 
location in the stack frame for arm with THUMB2 and non-leaf functions.

Issue:
At this moment GCC emits frame layout for arm with THUMB2 mode,
"-fno-omit-frame-pointer" and AAPCS, and does not set frame pointer to
predictable location at the function frame for non-leaf functions. And
this is right thing to do regarding to answer on my
question https://gcc.gnu.org/ml/gcc-help/2018-07/msg00093.html.
But we still have an issue with performance, when we are using default
unwinder, which uses unwind tables. It could be up to 10 times faster to 
use frame based stack unwinder instead "default unwinder". As we know, 
the frame based stack unwinder works good for arm with ARM32 mode 
("-marm" compile-time option), because fp points to the lr on the stack, 
but the binary size could be about 30%-40% larger than with THUMB2 mode 
("-mthumb" compile time option). So, I think it
could be good to have an ability to use frame based stack unwinder for
thumb mode too. In this case AddressSanitizer, which uses frame based
stack unwinder, by default for every allocation/deallocation, could
benefit too.

Unresolved issue related to AddressSanitizer/LeakSanitzer:
https://github.com/google/sanitizers/issues/640

Problems:
By default arm with THUMB2 mode uses r7 register as frame pointer
register and it could be hard to set frame pointer to the predictable
location in the frame for some functions and some level of optimization.
For example interceptor for malloc in libasan built with
("-mthumb -fno-omit-frame-pointer -O2"):

000aa118 <__interceptor_malloc>:
    stmdb   sp!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}
    subw    sp, sp, #1076
    add     r7, sp, #16
    ...
    function body
    ...
    addw    r7, r7, #1060
    mov     sp, r7
    ldmia.w sp!, {r4, r5, r6, r7, r8, r9, r10, r11, pc}

r7 points to sp + 16 after allocating 1076 bytes on the stack and
it's impossible to find the lr at the runtime.

In other way registers should be pushed in ascending order and we cannot 
change function prologue to this:

   push {r4, r5, r6, r8, r9, sl, r11, r7, lr }

And seems to me, the right solution is to make multiple push and 
multiple pop in the same order (clang works that way):

   push {r4, r5, r6, r7, lr}
   push {r8, r9, sl, r11 }
   add r7, sp, #32
   subw sp, sp #1076
   ...
   function body
   ...
   subw r7, #32
   mov sp, r7
   pop {r8, r9, sl, r11 }
   pop {r4, r5, r6, r7, pc }

So the r7 points to lr on the stack after function prologue and we can
find previous frame pointer by

subw r7, r7, #4
ldr r7, [r7]

at the runtime.

Verification steps:
1. I've build full Tizen OS distribution which consists of about ~1000
packages with that patch "-mthumb -fno-omit-frame-pointer -mthumb-fp 
-O2" and AddressSanitizer.
2. I've built the image from those packages and verified it on real
device with armv7 ISA, and it seems works. Also stack based frame
unwinder works.
3. It's hard to write correct test case, because gcc pushes {r8... r11}
registers only on some level of optimization and the source file should
have some amount of function. So, it's kinda hard to find the small test 
case. I have many preprocessed files with size about ~1kk bytes.
4. The frame layout for malloc interceptor in libasan which was built
with that patch and ("-fno-omit-frame-poiniter -O2 -mthumb -mthumb-fp")
looks like this:

000bb1f4 <__interceptor_malloc>:
   push    {r4, r5, r6, r7, lr}
   ldr     r6, [pc, #340]  ; (bb34c <__interceptor_malloc+0x158>)
   ldr     r3, [pc, #340]  ; (bb350 <__interceptor_malloc+0x15c>)
   add     r6, pc
   stmdb   sp!, {r8, r9, sl, fp}
   add     r7, sp, #32
   subw    sp, sp, #1076   ; 0x434
   ...
   function body
   ...
   subs    r7, #32
   mov     sp, r7
   ldmia.w sp!, {r8, r9, sl, fp}
   pop     {r4, r5, r6, r7, pc}

5. I've faced only one build fail related to this patch - glibc package,
but this is related to option name, some build machinery does not
recognize the "-mthumb-fp" option, and I'm not sure that glibc could be
build with frame layout like this.

It would be nice to get review for this patch by some experts. I could
miss a lot of corner cases.
Thanks.
From: Denis Khalikov <d.khali...@partner.samsung.com>
Date: Thu, 23 Aug 2018 16:33:58 +0300
Subject: [PATCH] Frame pointer for arm with THUMB2 mode.

Set frame pointer to the predictable location in the stack frame
for arm with THUMB2 mode.

2018-08-23  Denis Khalikov  <d.khali...@partner.samsung.com>

  * config/arm/arm.c (arm_emit_multi_reg_pop_no_return): New function.
  (arm_compute_initial_elimination_offset): Add support for
  TARGET_THUMB_STACK_UNWIND.
  (arm_expand_prologue): Emit function prologue related to
  TARGET_THUMB_STACK_UNWIND.
  (thumb2_expand_return): Emit function epilogue related to
  TARGET_THUMB_STACK_UNWIND.
  (arm_expand_epilogue): Emit function epilogue related to
  TARGET_THUMB_STACK_UNWIND.
  * config/arm/arm.h (TARGET_THUMB_STACK_UNWIND): New define.
  (INITIAL_ELIMINATION_OFFSET): Add support for
  TARGET_THUMB_STACK_UNWIND.
  * config/arm/arm.opt: Add compile-time option THUMB_FP.
---
 gcc/config/arm/arm.c   | 206 +++++++++++++++++++++++++++++++++++++++++++------
 gcc/config/arm/arm.h   |  10 ++-
 gcc/config/arm/arm.opt |   4 +
 3 files changed, 191 insertions(+), 29 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c081216d..344b715 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -20583,6 +20583,70 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
 				 stack_pointer_rtx, stack_pointer_rtx);
 }
 
+/* This function hepls to handle situation when we need to make
+   multiple pop, but does not have lr or pc registers inside
+   saved_regs_mask and can not add cfa_notes.
+   This function is based on arm_emit_multi_reg_pop.  */
+static void
+arm_emit_multi_reg_pop_no_return (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg;
+  int emit_update = 1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* If SP is in reglist, then we don't emit SP update insn.  */
+  emit_update = (saved_regs_mask & (1 << SP_REGNUM)) ? 0 : 1;
+
+  /* The parallel needs to hold num_regs SETs
+     and one SET for the stack update.  */
+  par = gen_rtx_PARALLEL (VOIDmode,
+			  rtvec_alloc (num_regs + emit_update));
+  if (emit_update)
+    {
+      /* Increment the stack pointer, based on there being
+	 num_regs 4-byte registers to restore.  */
+      tmp
+	= gen_rtx_SET (stack_pointer_rtx,
+		       plus_constant (Pmode, stack_pointer_rtx, 4 * num_regs));
+      RTX_FRAME_RELATED_P (tmp) = 1;
+      XVECEXP (par, 0, 0) = tmp;
+    }
+
+  for (j = 0, i = 0; j < num_regs; i++)
+    if (saved_regs_mask & (1 << i))
+      {
+	reg = gen_rtx_REG (SImode, i);
+	if ((num_regs == 1) && emit_update)
+	  {
+	    /* Emit single load with writeback.  */
+	    tmp = gen_frame_mem (SImode,
+				 gen_rtx_POST_INC (Pmode, stack_pointer_rtx));
+	    tmp = emit_insn (gen_rtx_SET (reg, tmp));
+	    REG_NOTES (tmp) = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+	    return;
+	  }
+
+	tmp = gen_rtx_SET (
+	  reg, gen_frame_mem (SImode,
+			      plus_constant (Pmode, stack_pointer_rtx, 4 * j)));
+	RTX_FRAME_RELATED_P (tmp) = 1;
+	XVECEXP (par, 0, j + emit_update) = tmp;
+	dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+	j++;
+      }
+
+  par = emit_insn (par);
+}
+
 /* Generate and emit an insn pattern that we will recognize as a pop_multi
    of NUM_REGS consecutive VFP regs, starting at FIRST_REG.
 
@@ -21233,6 +21297,10 @@ arm_compute_initial_elimination_offset (unsigned int from, unsigned int to)
       switch (to)
 	{
 	case THUMB_HARD_FRAME_POINTER_REGNUM:
+	  if (TARGET_THUMB_STACK_UNWIND)
+	    /* In this case the hard frame pointer points to the lr in the stack
+	       frame. So offset is frame - saved_args.  */
+	    return offsets->frame - offsets->saved_args;
 	  return 0;
 
 	case FRAME_POINTER_REGNUM:
@@ -21259,6 +21327,11 @@ arm_compute_initial_elimination_offset (unsigned int from, unsigned int to)
       switch (to)
 	{
 	case THUMB_HARD_FRAME_POINTER_REGNUM:
+	  if (TARGET_THUMB_STACK_UNWIND)
+	    /*  The hard frame_pointer points to the top entry in the
+		stack frame. The soft frame pointer points to the bottom
+		entry in the stack.  */
+	    return offsets->frame - offsets->soft_frame;
 	  return 0;
 
 	case ARM_HARD_FRAME_POINTER_REGNUM:
@@ -21832,7 +21905,7 @@ arm_expand_prologue (void)
   if ((func_type == ARM_FT_ISR || func_type == ARM_FT_FIQ)
       && (live_regs_mask & (1 << LR_REGNUM)) != 0
       && !(frame_pointer_needed && TARGET_APCS_FRAME)
-      && TARGET_ARM)
+      && (TARGET_ARM || TARGET_THUMB_STACK_UNWIND))
     {
       rtx lr = gen_rtx_REG (SImode, LR_REGNUM);
 
@@ -21879,20 +21952,57 @@ arm_expand_prologue (void)
           else
             {
 	      insn = emit_multi_reg_push (live_regs_mask, live_regs_mask);
-              RTX_FRAME_RELATED_P (insn) = 1;
-            }
-        }
+	       RTX_FRAME_RELATED_P (insn) = 1;
+	     }
+	 }
       else
-        {
-	  insn = emit_multi_reg_push (live_regs_mask, dwarf_regs_mask);
-          RTX_FRAME_RELATED_P (insn) = 1;
-        }
+	{
+	  /* In case we want to have frame pointer register to follow
+	     lr register on the stack, we need to modify stack layout.
+	     For example we have to save on the stack following registers
+	     {r4, ... r11, lr}, but with thumb mode frame pointer
+	     register is r7, so this layout is not correct. We will modify it
+	     from
+	     push {r4, ... r11, lr}
+	     to
+	     push {r4, ... r7, lr}
+	     push {r8, ..., r11}
+	     Registers should be pushed in ascending order.  */
+	  if (TARGET_THUMB_STACK_UNWIND && (live_regs_mask & (1 << 8))
+	      /* Check for non-leaf fucntion.  */
+	      && (live_regs_mask & (1 << LR_REGNUM)))
+	    {
+	      unsigned long first_regs_mask, dwarf_first_regs_mask,
+		second_regs_mask, dwarf_second_regs_mask;
+
+	      /* Save info about first 8 registers from r0 to r7 and
+		 add lr for this mask.  */
+	      first_regs_mask = dwarf_first_regs_mask
+		= (live_regs_mask & 0xff) | (1 << LR_REGNUM);
+	      /* Add all left registers from r8 and delete lr.  */
+	      second_regs_mask = dwarf_second_regs_mask
+		= (live_regs_mask & ~0xff) & ~(1 << LR_REGNUM);
+	      insn
+		= emit_multi_reg_push (first_regs_mask, dwarf_first_regs_mask);
+	      RTX_FRAME_RELATED_P (insn) = 1;
+	      insn = emit_multi_reg_push (second_regs_mask,
+					  dwarf_second_regs_mask);
+	      RTX_FRAME_RELATED_P (insn) = 1;
+	    }
+	  else
+	    {
+	      insn = emit_multi_reg_push (live_regs_mask, dwarf_regs_mask);
+	      RTX_FRAME_RELATED_P (insn) = 1;
+	    }
+	}
     }
 
   if (! IS_VOLATILE (func_type))
     saved_regs += arm_save_coproc_regs ();
 
-  if (frame_pointer_needed && TARGET_ARM)
+  /* Set frame pointer register (r7) to lr on the stack
+     for TARGET_THUMB_STACK_UNWIND.  */
+  if (frame_pointer_needed && (TARGET_ARM || TARGET_THUMB_STACK_UNWIND))
     {
       /* Create the new frame pointer.  */
       if (TARGET_APCS_FRAME)
@@ -21983,7 +22093,7 @@ arm_expand_prologue (void)
     }
 
 
-  if (frame_pointer_needed && TARGET_THUMB2)
+  if (frame_pointer_needed && (TARGET_THUMB2 && !TARGET_THUMB_FP))
     thumb_set_frame_pointer (offsets);
 
   if (flag_pic && arm_pic_register != INVALID_REGNUM)
@@ -25441,11 +25551,26 @@ thumb2_expand_return (bool simple_return)
           emit_jump_insn (par);
         }
       else
-        {
-          saved_regs_mask &= ~ (1 << LR_REGNUM);
-          saved_regs_mask |=   (1 << PC_REGNUM);
-          arm_emit_multi_reg_pop (saved_regs_mask);
-        }
+	{
+	  /* We have made multiple push for thumb, when frame pointer
+	     is needed. So, make multiple pop in the same order.  */
+	  if (TARGET_THUMB_STACK_UNWIND && (saved_regs_mask & (1 << 8)))
+	    {
+	      unsigned long first_regs_mask = saved_regs_mask & 0xff;
+	      unsigned long second_regs_mask = saved_regs_mask & ~0xff;
+
+	      first_regs_mask |= (1 << PC_REGNUM);
+	      second_regs_mask &= ~((1 << PC_REGNUM) | (1 << LR_REGNUM));
+	      arm_emit_multi_reg_pop_no_return (second_regs_mask);
+	      arm_emit_multi_reg_pop (first_regs_mask);
+	    }
+	  else
+	    {
+	      saved_regs_mask &= ~(1 << LR_REGNUM);
+	      saved_regs_mask |= (1 << PC_REGNUM);
+	      arm_emit_multi_reg_pop (saved_regs_mask);
+	    }
+	}
     }
   else
     {
@@ -25724,7 +25849,7 @@ arm_expand_epilogue (bool really_return)
       rtx_insn *insn;
       /* Restore stack pointer if necessary.  */
       if (TARGET_ARM)
-        {
+	{
           /* In ARM mode, frame pointer points to first saved register.
              Restore stack pointer to last saved register.  */
           amount = offsets->frame - offsets->saved_regs;
@@ -25745,9 +25870,15 @@ arm_expand_epilogue (bool really_return)
         }
       else
         {
-          /* In Thumb-2 mode, the frame pointer points to the last saved
-             register.  */
-	  amount = offsets->locals_base - offsets->saved_regs;
+	  /* For TARGET_THUMB_STACK_UNWIND the frame pointer points to the
+	     bottom of the stack.  */
+	  if (TARGET_THUMB_STACK_UNWIND)
+	    amount = offsets->frame - offsets->saved_regs;
+	  else
+	    /* In Thumb-2 mode, the frame pointer points to the last saved
+	        register.  */
+	    amount = offsets->locals_base - offsets->saved_regs;
+
 	  if (amount)
 	    {
 	      insn = emit_insn (gen_addsi3 (hard_frame_pointer_rtx,
@@ -25896,10 +26027,10 @@ arm_expand_epilogue (bool really_return)
         }
       else
         {
-          if (TARGET_LDRD
+	   if (TARGET_LDRD
 	      && current_tune->prefer_ldrd_strd
-              && !optimize_function_for_size_p (cfun))
-            {
+	       && !optimize_function_for_size_p (cfun))
+	     {
               if (TARGET_THUMB2)
                 thumb2_emit_ldrd_pop (saved_regs_mask);
               else if (TARGET_ARM && !IS_INTERRUPT (func_type))
@@ -25907,9 +26038,34 @@ arm_expand_epilogue (bool really_return)
               else
                 arm_emit_multi_reg_pop (saved_regs_mask);
             }
-          else
-            arm_emit_multi_reg_pop (saved_regs_mask);
-        }
+	  else
+	    {
+	      /* Make multiple pop in the same order as we done multiple push.
+	       */
+	      if (TARGET_THUMB_STACK_UNWIND && (saved_regs_mask & (1 << 8))
+		  && ((saved_regs_mask & (1 << LR_REGNUM))
+		      || (saved_regs_mask & (1 << PC_REGNUM))))
+		{
+		  unsigned long first_regs_mask = saved_regs_mask & 0xff;
+		  unsigned long second_regs_mask = saved_regs_mask & ~0xff;
+
+		  if (saved_regs_mask & (1 << PC_REGNUM))
+		    {
+		      first_regs_mask |= (1 << PC_REGNUM);
+		      second_regs_mask &= ~(1 << PC_REGNUM);
+		    }
+		  else
+		    {
+		      first_regs_mask |= (1 << LR_REGNUM);
+		      second_regs_mask &= ~(1 << LR_REGNUM);
+		    }
+		  arm_emit_multi_reg_pop_no_return (second_regs_mask);
+		  arm_emit_multi_reg_pop (first_regs_mask);
+		}
+	      else
+		arm_emit_multi_reg_pop (saved_regs_mask);
+	    }
+	}
 
       if (return_in_pc)
         return;
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 34894c0..3e87064 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -117,6 +117,8 @@ extern tree arm_fp16_type_node;
 #define TARGET_THUMB1_P(flags) (TARGET_THUMB_P (flags) && !arm_arch_thumb2)
 #define TARGET_THUMB2_P(flags) (TARGET_THUMB_P (flags) && arm_arch_thumb2)
 #define TARGET_32BIT_P(flags)  (TARGET_ARM_P (flags) || TARGET_THUMB2_P (flags))
+#define TARGET_THUMB_STACK_UNWIND                                              \
+  (TARGET_THUMB2 && TARGET_THUMB_FP && frame_pointer_needed && arm_arch7)
 
 /* Run-time Target Specification.  */
 /* Use hardware floating point instructions. */
@@ -1564,10 +1566,10 @@ typedef struct
 
 /* Define the offset between two registers, one to be eliminated, and the
    other its replacement, at the start of a routine.  */
-#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET)			\
-  if (TARGET_ARM)							\
-    (OFFSET) = arm_compute_initial_elimination_offset (FROM, TO);	\
-  else									\
+#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET)                           \
+  if (TARGET_ARM || TARGET_THUMB_STACK_UNWIND)                                 \
+    (OFFSET) = arm_compute_initial_elimination_offset (FROM, TO);              \
+  else                                                                         \
     (OFFSET) = thumb_compute_initial_elimination_offset (FROM, TO)
 
 /* Special case handling of the location of arguments passed on the stack.  */
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index a1286a4..49e217f 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -198,6 +198,10 @@ mthumb
 Target Report RejectNegative Negative(marm) Mask(THUMB) Save
 Generate code for Thumb state.
 
+mthumb-fp
+Target Report Mask(THUMB_FP)
+Generate frame pointer which points to the lr register on the stack.
+
 mthumb-interwork
 Target Report Mask(INTERWORK)
 Support calls between Thumb and ARM instruction sets.
-- 
1.9.1

Reply via email to