Hello everyone, I've created the patch which sets the frame pointer to the predictable location in the stack frame for arm with THUMB2 and non-leaf functions.
Issue: At this moment GCC emits frame layout for arm with THUMB2 mode, "-fno-omit-frame-pointer" and AAPCS, and does not set frame pointer to predictable location at the function frame for non-leaf functions. And this is right thing to do regarding to answer on my question https://gcc.gnu.org/ml/gcc-help/2018-07/msg00093.html. But we still have an issue with performance, when we are using default unwinder, which uses unwind tables. It could be up to 10 times faster to use frame based stack unwinder instead "default unwinder". As we know, the frame based stack unwinder works good for arm with ARM32 mode ("-marm" compile-time option), because fp points to the lr on the stack, but the binary size could be about 30%-40% larger than with THUMB2 mode ("-mthumb" compile time option). So, I think it could be good to have an ability to use frame based stack unwinder for thumb mode too. In this case AddressSanitizer, which uses frame based stack unwinder, by default for every allocation/deallocation, could benefit too. Unresolved issue related to AddressSanitizer/LeakSanitzer: https://github.com/google/sanitizers/issues/640 Problems: By default arm with THUMB2 mode uses r7 register as frame pointer register and it could be hard to set frame pointer to the predictable location in the frame for some functions and some level of optimization. For example interceptor for malloc in libasan built with ("-mthumb -fno-omit-frame-pointer -O2"): 000aa118 <__interceptor_malloc>: stmdb sp!, {r4, r5, r6, r7, r8, r9, r10, r11, lr} subw sp, sp, #1076 add r7, sp, #16 ... function body ... addw r7, r7, #1060 mov sp, r7 ldmia.w sp!, {r4, r5, r6, r7, r8, r9, r10, r11, pc} r7 points to sp + 16 after allocating 1076 bytes on the stack and it's impossible to find the lr at the runtime. In other way registers should be pushed in ascending order and we cannot change function prologue to this: push {r4, r5, r6, r8, r9, sl, r11, r7, lr } And seems to me, the right solution is to make multiple push and multiple pop in the same order (clang works that way): push {r4, r5, r6, r7, lr} push {r8, r9, sl, r11 } add r7, sp, #32 subw sp, sp #1076 ... function body ... subw r7, #32 mov sp, r7 pop {r8, r9, sl, r11 } pop {r4, r5, r6, r7, pc } So the r7 points to lr on the stack after function prologue and we can find previous frame pointer by subw r7, r7, #4 ldr r7, [r7] at the runtime. Verification steps: 1. I've build full Tizen OS distribution which consists of about ~1000 packages with that patch "-mthumb -fno-omit-frame-pointer -mthumb-fp -O2" and AddressSanitizer. 2. I've built the image from those packages and verified it on real device with armv7 ISA, and it seems works. Also stack based frame unwinder works. 3. It's hard to write correct test case, because gcc pushes {r8... r11} registers only on some level of optimization and the source file should have some amount of function. So, it's kinda hard to find the small test case. I have many preprocessed files with size about ~1kk bytes. 4. The frame layout for malloc interceptor in libasan which was built with that patch and ("-fno-omit-frame-poiniter -O2 -mthumb -mthumb-fp") looks like this: 000bb1f4 <__interceptor_malloc>: push {r4, r5, r6, r7, lr} ldr r6, [pc, #340] ; (bb34c <__interceptor_malloc+0x158>) ldr r3, [pc, #340] ; (bb350 <__interceptor_malloc+0x15c>) add r6, pc stmdb sp!, {r8, r9, sl, fp} add r7, sp, #32 subw sp, sp, #1076 ; 0x434 ... function body ... subs r7, #32 mov sp, r7 ldmia.w sp!, {r8, r9, sl, fp} pop {r4, r5, r6, r7, pc} 5. I've faced only one build fail related to this patch - glibc package, but this is related to option name, some build machinery does not recognize the "-mthumb-fp" option, and I'm not sure that glibc could be build with frame layout like this. It would be nice to get review for this patch by some experts. I could miss a lot of corner cases. Thanks.
From: Denis Khalikov <d.khali...@partner.samsung.com> Date: Thu, 23 Aug 2018 16:33:58 +0300 Subject: [PATCH] Frame pointer for arm with THUMB2 mode. Set frame pointer to the predictable location in the stack frame for arm with THUMB2 mode. 2018-08-23 Denis Khalikov <d.khali...@partner.samsung.com> * config/arm/arm.c (arm_emit_multi_reg_pop_no_return): New function. (arm_compute_initial_elimination_offset): Add support for TARGET_THUMB_STACK_UNWIND. (arm_expand_prologue): Emit function prologue related to TARGET_THUMB_STACK_UNWIND. (thumb2_expand_return): Emit function epilogue related to TARGET_THUMB_STACK_UNWIND. (arm_expand_epilogue): Emit function epilogue related to TARGET_THUMB_STACK_UNWIND. * config/arm/arm.h (TARGET_THUMB_STACK_UNWIND): New define. (INITIAL_ELIMINATION_OFFSET): Add support for TARGET_THUMB_STACK_UNWIND. * config/arm/arm.opt: Add compile-time option THUMB_FP. --- gcc/config/arm/arm.c | 206 +++++++++++++++++++++++++++++++++++++++++++------ gcc/config/arm/arm.h | 10 ++- gcc/config/arm/arm.opt | 4 + 3 files changed, 191 insertions(+), 29 deletions(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c081216d..344b715 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -20583,6 +20583,70 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask) stack_pointer_rtx, stack_pointer_rtx); } +/* This function hepls to handle situation when we need to make + multiple pop, but does not have lr or pc registers inside + saved_regs_mask and can not add cfa_notes. + This function is based on arm_emit_multi_reg_pop. */ +static void +arm_emit_multi_reg_pop_no_return (unsigned long saved_regs_mask) +{ + int num_regs = 0; + int i, j; + rtx par; + rtx dwarf = NULL_RTX; + rtx tmp, reg; + int emit_update = 1; + + for (i = 0; i <= LAST_ARM_REGNUM; i++) + if (saved_regs_mask & (1 << i)) + num_regs++; + + gcc_assert (num_regs && num_regs <= 16); + + /* If SP is in reglist, then we don't emit SP update insn. */ + emit_update = (saved_regs_mask & (1 << SP_REGNUM)) ? 0 : 1; + + /* The parallel needs to hold num_regs SETs + and one SET for the stack update. */ + par = gen_rtx_PARALLEL (VOIDmode, + rtvec_alloc (num_regs + emit_update)); + if (emit_update) + { + /* Increment the stack pointer, based on there being + num_regs 4-byte registers to restore. */ + tmp + = gen_rtx_SET (stack_pointer_rtx, + plus_constant (Pmode, stack_pointer_rtx, 4 * num_regs)); + RTX_FRAME_RELATED_P (tmp) = 1; + XVECEXP (par, 0, 0) = tmp; + } + + for (j = 0, i = 0; j < num_regs; i++) + if (saved_regs_mask & (1 << i)) + { + reg = gen_rtx_REG (SImode, i); + if ((num_regs == 1) && emit_update) + { + /* Emit single load with writeback. */ + tmp = gen_frame_mem (SImode, + gen_rtx_POST_INC (Pmode, stack_pointer_rtx)); + tmp = emit_insn (gen_rtx_SET (reg, tmp)); + REG_NOTES (tmp) = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf); + return; + } + + tmp = gen_rtx_SET ( + reg, gen_frame_mem (SImode, + plus_constant (Pmode, stack_pointer_rtx, 4 * j))); + RTX_FRAME_RELATED_P (tmp) = 1; + XVECEXP (par, 0, j + emit_update) = tmp; + dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf); + j++; + } + + par = emit_insn (par); +} + /* Generate and emit an insn pattern that we will recognize as a pop_multi of NUM_REGS consecutive VFP regs, starting at FIRST_REG. @@ -21233,6 +21297,10 @@ arm_compute_initial_elimination_offset (unsigned int from, unsigned int to) switch (to) { case THUMB_HARD_FRAME_POINTER_REGNUM: + if (TARGET_THUMB_STACK_UNWIND) + /* In this case the hard frame pointer points to the lr in the stack + frame. So offset is frame - saved_args. */ + return offsets->frame - offsets->saved_args; return 0; case FRAME_POINTER_REGNUM: @@ -21259,6 +21327,11 @@ arm_compute_initial_elimination_offset (unsigned int from, unsigned int to) switch (to) { case THUMB_HARD_FRAME_POINTER_REGNUM: + if (TARGET_THUMB_STACK_UNWIND) + /* The hard frame_pointer points to the top entry in the + stack frame. The soft frame pointer points to the bottom + entry in the stack. */ + return offsets->frame - offsets->soft_frame; return 0; case ARM_HARD_FRAME_POINTER_REGNUM: @@ -21832,7 +21905,7 @@ arm_expand_prologue (void) if ((func_type == ARM_FT_ISR || func_type == ARM_FT_FIQ) && (live_regs_mask & (1 << LR_REGNUM)) != 0 && !(frame_pointer_needed && TARGET_APCS_FRAME) - && TARGET_ARM) + && (TARGET_ARM || TARGET_THUMB_STACK_UNWIND)) { rtx lr = gen_rtx_REG (SImode, LR_REGNUM); @@ -21879,20 +21952,57 @@ arm_expand_prologue (void) else { insn = emit_multi_reg_push (live_regs_mask, live_regs_mask); - RTX_FRAME_RELATED_P (insn) = 1; - } - } + RTX_FRAME_RELATED_P (insn) = 1; + } + } else - { - insn = emit_multi_reg_push (live_regs_mask, dwarf_regs_mask); - RTX_FRAME_RELATED_P (insn) = 1; - } + { + /* In case we want to have frame pointer register to follow + lr register on the stack, we need to modify stack layout. + For example we have to save on the stack following registers + {r4, ... r11, lr}, but with thumb mode frame pointer + register is r7, so this layout is not correct. We will modify it + from + push {r4, ... r11, lr} + to + push {r4, ... r7, lr} + push {r8, ..., r11} + Registers should be pushed in ascending order. */ + if (TARGET_THUMB_STACK_UNWIND && (live_regs_mask & (1 << 8)) + /* Check for non-leaf fucntion. */ + && (live_regs_mask & (1 << LR_REGNUM))) + { + unsigned long first_regs_mask, dwarf_first_regs_mask, + second_regs_mask, dwarf_second_regs_mask; + + /* Save info about first 8 registers from r0 to r7 and + add lr for this mask. */ + first_regs_mask = dwarf_first_regs_mask + = (live_regs_mask & 0xff) | (1 << LR_REGNUM); + /* Add all left registers from r8 and delete lr. */ + second_regs_mask = dwarf_second_regs_mask + = (live_regs_mask & ~0xff) & ~(1 << LR_REGNUM); + insn + = emit_multi_reg_push (first_regs_mask, dwarf_first_regs_mask); + RTX_FRAME_RELATED_P (insn) = 1; + insn = emit_multi_reg_push (second_regs_mask, + dwarf_second_regs_mask); + RTX_FRAME_RELATED_P (insn) = 1; + } + else + { + insn = emit_multi_reg_push (live_regs_mask, dwarf_regs_mask); + RTX_FRAME_RELATED_P (insn) = 1; + } + } } if (! IS_VOLATILE (func_type)) saved_regs += arm_save_coproc_regs (); - if (frame_pointer_needed && TARGET_ARM) + /* Set frame pointer register (r7) to lr on the stack + for TARGET_THUMB_STACK_UNWIND. */ + if (frame_pointer_needed && (TARGET_ARM || TARGET_THUMB_STACK_UNWIND)) { /* Create the new frame pointer. */ if (TARGET_APCS_FRAME) @@ -21983,7 +22093,7 @@ arm_expand_prologue (void) } - if (frame_pointer_needed && TARGET_THUMB2) + if (frame_pointer_needed && (TARGET_THUMB2 && !TARGET_THUMB_FP)) thumb_set_frame_pointer (offsets); if (flag_pic && arm_pic_register != INVALID_REGNUM) @@ -25441,11 +25551,26 @@ thumb2_expand_return (bool simple_return) emit_jump_insn (par); } else - { - saved_regs_mask &= ~ (1 << LR_REGNUM); - saved_regs_mask |= (1 << PC_REGNUM); - arm_emit_multi_reg_pop (saved_regs_mask); - } + { + /* We have made multiple push for thumb, when frame pointer + is needed. So, make multiple pop in the same order. */ + if (TARGET_THUMB_STACK_UNWIND && (saved_regs_mask & (1 << 8))) + { + unsigned long first_regs_mask = saved_regs_mask & 0xff; + unsigned long second_regs_mask = saved_regs_mask & ~0xff; + + first_regs_mask |= (1 << PC_REGNUM); + second_regs_mask &= ~((1 << PC_REGNUM) | (1 << LR_REGNUM)); + arm_emit_multi_reg_pop_no_return (second_regs_mask); + arm_emit_multi_reg_pop (first_regs_mask); + } + else + { + saved_regs_mask &= ~(1 << LR_REGNUM); + saved_regs_mask |= (1 << PC_REGNUM); + arm_emit_multi_reg_pop (saved_regs_mask); + } + } } else { @@ -25724,7 +25849,7 @@ arm_expand_epilogue (bool really_return) rtx_insn *insn; /* Restore stack pointer if necessary. */ if (TARGET_ARM) - { + { /* In ARM mode, frame pointer points to first saved register. Restore stack pointer to last saved register. */ amount = offsets->frame - offsets->saved_regs; @@ -25745,9 +25870,15 @@ arm_expand_epilogue (bool really_return) } else { - /* In Thumb-2 mode, the frame pointer points to the last saved - register. */ - amount = offsets->locals_base - offsets->saved_regs; + /* For TARGET_THUMB_STACK_UNWIND the frame pointer points to the + bottom of the stack. */ + if (TARGET_THUMB_STACK_UNWIND) + amount = offsets->frame - offsets->saved_regs; + else + /* In Thumb-2 mode, the frame pointer points to the last saved + register. */ + amount = offsets->locals_base - offsets->saved_regs; + if (amount) { insn = emit_insn (gen_addsi3 (hard_frame_pointer_rtx, @@ -25896,10 +26027,10 @@ arm_expand_epilogue (bool really_return) } else { - if (TARGET_LDRD + if (TARGET_LDRD && current_tune->prefer_ldrd_strd - && !optimize_function_for_size_p (cfun)) - { + && !optimize_function_for_size_p (cfun)) + { if (TARGET_THUMB2) thumb2_emit_ldrd_pop (saved_regs_mask); else if (TARGET_ARM && !IS_INTERRUPT (func_type)) @@ -25907,9 +26038,34 @@ arm_expand_epilogue (bool really_return) else arm_emit_multi_reg_pop (saved_regs_mask); } - else - arm_emit_multi_reg_pop (saved_regs_mask); - } + else + { + /* Make multiple pop in the same order as we done multiple push. + */ + if (TARGET_THUMB_STACK_UNWIND && (saved_regs_mask & (1 << 8)) + && ((saved_regs_mask & (1 << LR_REGNUM)) + || (saved_regs_mask & (1 << PC_REGNUM)))) + { + unsigned long first_regs_mask = saved_regs_mask & 0xff; + unsigned long second_regs_mask = saved_regs_mask & ~0xff; + + if (saved_regs_mask & (1 << PC_REGNUM)) + { + first_regs_mask |= (1 << PC_REGNUM); + second_regs_mask &= ~(1 << PC_REGNUM); + } + else + { + first_regs_mask |= (1 << LR_REGNUM); + second_regs_mask &= ~(1 << LR_REGNUM); + } + arm_emit_multi_reg_pop_no_return (second_regs_mask); + arm_emit_multi_reg_pop (first_regs_mask); + } + else + arm_emit_multi_reg_pop (saved_regs_mask); + } + } if (return_in_pc) return; diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 34894c0..3e87064 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -117,6 +117,8 @@ extern tree arm_fp16_type_node; #define TARGET_THUMB1_P(flags) (TARGET_THUMB_P (flags) && !arm_arch_thumb2) #define TARGET_THUMB2_P(flags) (TARGET_THUMB_P (flags) && arm_arch_thumb2) #define TARGET_32BIT_P(flags) (TARGET_ARM_P (flags) || TARGET_THUMB2_P (flags)) +#define TARGET_THUMB_STACK_UNWIND \ + (TARGET_THUMB2 && TARGET_THUMB_FP && frame_pointer_needed && arm_arch7) /* Run-time Target Specification. */ /* Use hardware floating point instructions. */ @@ -1564,10 +1566,10 @@ typedef struct /* Define the offset between two registers, one to be eliminated, and the other its replacement, at the start of a routine. */ -#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \ - if (TARGET_ARM) \ - (OFFSET) = arm_compute_initial_elimination_offset (FROM, TO); \ - else \ +#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \ + if (TARGET_ARM || TARGET_THUMB_STACK_UNWIND) \ + (OFFSET) = arm_compute_initial_elimination_offset (FROM, TO); \ + else \ (OFFSET) = thumb_compute_initial_elimination_offset (FROM, TO) /* Special case handling of the location of arguments passed on the stack. */ diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt index a1286a4..49e217f 100644 --- a/gcc/config/arm/arm.opt +++ b/gcc/config/arm/arm.opt @@ -198,6 +198,10 @@ mthumb Target Report RejectNegative Negative(marm) Mask(THUMB) Save Generate code for Thumb state. +mthumb-fp +Target Report Mask(THUMB_FP) +Generate frame pointer which points to the lr register on the stack. + mthumb-interwork Target Report Mask(INTERWORK) Support calls between Thumb and ARM instruction sets. -- 1.9.1