Christophe Lyon <christophe.l...@st.com> writes:
> The FDPIC register is hard-coded to r9, as defined in the ABI.
>
> We have to disable tailcall optimizations if we don't know if the
> target function is in the same module. If not, we have to set r9 to
> the value associated with the target module.
>
> When generating a symbol address, we have to take into account whether
> it is a pointer to data or to a function, because different
> relocations are needed.
>
> 2019-XX-XX  Christophe Lyon  <christophe.l...@st.com>
>             Mickaël Guêné <mickael.gu...@st.com>
>
>         * config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
>         in FDPIC mode.
>         * config/arm/arm-protos.h (arm_load_function_descriptor): Declare
>         new function.
>         * config/arm/arm.c (arm_option_override): Define pic register to
>         FDPIC_REGNUM.
>         (arm_function_ok_for_sibcall): Disable sibcall optimization if we
>         have no decl or go through PLT.
>         (arm_load_pic_register): Handle TARGET_FDPIC.
>         (arm_is_segment_info_known): New function.
>         (arm_pic_static_addr): Add support for FDPIC.
>         (arm_load_function_descriptor): New function.
>         (arm_assemble_integer): Add support for FDPIC.
>         * config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
>         Define. (FDPIC_REGNUM): New define.
>         * config/arm/arm.md (call): Add support for FDPIC.
>         (call_value): Likewise.
>         (*restore_pic_register_after_call): New pattern.
>         (untyped_call): Disable if FDPIC.
>         (untyped_return): Likewise.
>         * config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.
>
> Change-Id: I8fb1a6b85ace672184013568c5d28fbda2f7fda4
>
> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
> index 6e256ee..34695fa 100644
> --- a/gcc/config/arm/arm-c.c
> +++ b/gcc/config/arm/arm-c.c
> @@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
>        builtin_define ("__ARM_EABI__");
>      }
>
> +  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
> +
>    def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
>    def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 485bc68..272968a 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);
>  extern int arm_const_double_inline_cost (rtx);
>  extern bool arm_const_double_by_parts (rtx);
>  extern bool arm_const_double_by_immediates (rtx);
> +extern rtx arm_load_function_descriptor (rtx funcdesc);
>  extern void arm_emit_call_insn (rtx, rtx, bool);
>  bool detect_cmse_nonsecure_call (tree);
>  extern const char *output_call (rtx *);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 45abcd8..d9397b5 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3485,6 +3485,15 @@ arm_option_override (void)
>    if (flag_pic && TARGET_VXWORKS_RTP)
>      arm_pic_register = 9;
>
> +  /* If in FDPIC mode then force arm_pic_register to be r9. */
> +  if (TARGET_FDPIC)
> +    {
> +      arm_pic_register = FDPIC_REGNUM;
> +      if (! TARGET_ARM && ! TARGET_THUMB2)
> +        sorry ("FDPIC mode is supported on architecture versions that "
> +               "support ARM or Thumb-2 only.");
> +    }
> +
>    if (arm_pic_register_string != NULL)
>      {
>        int pic_register = decode_reg_name (arm_pic_register_string);

Isn't this equivalent to rejecting Thumb-1? I think that would be
clearer in both the condition and the error message.

How does this interact with arm_pic_data_is_text_relative? Are both
values supported?

> @@ -7295,6 +7304,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>    if (cfun->machine->sibcall_blocked)
>      return false;
>
> +  if (TARGET_FDPIC)
> +    {
> +      /* In FDPIC, never tailcall something for which we have no decl:
> +         the target function could be in a different module, requiring
> +         a different FDPIC register value. */
> +      if (decl == NULL)
> +        return false;
> +
> +      /* Don't tailcall if we go through the PLT since the FDPIC
> +         register is then corrupted and we don't restore it after
> +         static function calls. */
> +      if (!targetm.binds_local_p (decl))
> +        return false;
> +    }
> +
>    /* Never tailcall something if we are generating code for Thumb-1. */
>    if (TARGET_THUMB1)
>      return false;
> @@ -7711,7 +7735,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
>  {
>    rtx l1, labelno, pic_tmp, pic_rtx;
>
> -  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
> +  if (crtl->uses_pic_offset_table == 0
> +      || TARGET_SINGLE_PIC_BASE
> +      || TARGET_FDPIC)
>      return;
>
>    gcc_assert (flag_pic);
> @@ -7780,28 +7806,142 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
>    emit_use (pic_reg);
>  }
>
> +/* Try to determine whether an object, referenced via ORIG, will be
> +   placed in the text or data segment. This is used in FDPIC mode, to
> +   decide which relocations to use when accessing ORIG. IS_READONLY
> +   is set to true if ORIG is a read-only location, false otherwise.
> +   Return true if we could determine the location of ORIG, false
> +   otherwise. IS_READONLY is valid only when we return true. */

Maybe *IS_READONLY in both cases?

> +static bool
> +arm_is_segment_info_known (rtx orig, bool *is_readonly)
> +{
> +  bool res = false;
> +
> +  *is_readonly = false;
> +
> +  if (GET_CODE (orig) == LABEL_REF)
> +    {
> +      res = true;
> +      *is_readonly = true;
> +    }

Think this function would be easier to read with early returns.

> +  else if (SYMBOL_REF_P (orig))

...so "if" rather than "else if" here.

> +    {
> +      if (CONSTANT_POOL_ADDRESS_P (orig))
> +        {
> +          res = true;
> +          *is_readonly = true;
> +        }
> +      else if (SYMBOL_REF_LOCAL_P (orig)
> +               && !SYMBOL_REF_EXTERNAL_P (orig)
> +               && SYMBOL_REF_DECL (orig)
> +               && (!DECL_P (SYMBOL_REF_DECL (orig))
> +                   || !DECL_COMMON (SYMBOL_REF_DECL (orig))))
> +        {
> +          tree decl = SYMBOL_REF_DECL (orig);
> +          tree init = (TREE_CODE (decl) == VAR_DECL)
> +            ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)
> +            ? decl : 0;
> +          int reloc = 0;
> +          bool named_section, readonly;
> +
> +          if (init && init != error_mark_node)
> +            reloc = compute_reloc_for_constant (init);
> +
> +          named_section = TREE_CODE (decl) == VAR_DECL
> +            && lookup_attribute ("section", DECL_ATTRIBUTES (decl));

Here too I think it would be better to return false early.

How much variation do you support here for named sections?
E.g. can a linker script really put SECTION_WRITE sections in the
text segment? Seems like there are some cases that could be handled.
(Just asking, not suggesting you should change anything.)

> +          readonly = decl_readonly_section (decl, reloc);
> +
> +          /* We don't know where the link script will put a named
> +             section, so return false in such a case. */
> +          res = !named_section;
> +
> +          if (!named_section)
> +            *is_readonly = readonly;
> +        }
> +      else
> +        {
> +          /* We don't know. */
> +          res = false;
> +        }
> +    }
> +  else
> +    gcc_unreachable ();
> +
> +  return res;
> +}
> +
>  /* Generate code to load the address of a static var when flag_pic is set.
> */
>  static rtx_insn *
>  arm_pic_static_addr (rtx orig, rtx reg)
>  {
>    rtx l1, labelno, offset_rtx;
> +  rtx_insn *insn;
>
>    gcc_assert (flag_pic);
>
> -  /* We use an UNSPEC rather than a LABEL_REF because this label
> -     never appears in the code stream. */
> -  labelno = GEN_INT (pic_labelno++);
> -  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
> -  l1 = gen_rtx_CONST (VOIDmode, l1);
> +  bool is_readonly = false;
> +  bool info_known = false;
>
> -  /* On the ARM the PC register contains 'dot + 8' at the time of the
> -     addition, on the Thumb it is 'dot + 4'. */
> -  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
> -  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
> -                               UNSPEC_SYMBOL_OFFSET);
> -  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
> +  if (TARGET_FDPIC
> +      && SYMBOL_REF_P (orig)
> +      && !SYMBOL_REF_FUNCTION_P (orig))
> +      info_known = arm_is_segment_info_known (orig, &is_readonly);

Excess indentation. Feels like it might be slightly simpler to handle
SYMBOL_REF_FUNCTION_P in arm_is_segment_info_known, but I guess the idea
is that it might not then be clear whether the caller is asking about
a descriptor or the function itself.

>
> -  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));
> +  if (TARGET_FDPIC
> +      && SYMBOL_REF_P (orig)
> +      && !SYMBOL_REF_FUNCTION_P (orig)
> +      && !info_known)
> +    {
> +      /* We don't know where orig is stored, so we have be
> +         pessimistic and use a GOT relocation. */
> +      rtx pat;
> +      rtx mem;
> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +
> +      pat = gen_calculate_pic_address (reg, pic_reg, orig);
> +
> +      /* Make the MEM as close to a constant as possible. */
> +      mem = SET_SRC (pat);
> +      gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
> +      MEM_READONLY_P (mem) = 1;
> +      MEM_NOTRAP_P (mem) = 1;
> +
> +      insn = emit_insn (pat);

Think "pat = ..."
onwards should be split out into a helper, since it's a cut-&-paste
of the code in legitimize_pic_address.

> +    }
> +  else if (TARGET_FDPIC
> +           && SYMBOL_REF_P (orig)
> +           && (SYMBOL_REF_FUNCTION_P (orig)
> +               || (info_known && !is_readonly)))
> +    {
> +      /* We use the GOTOFF relocation. */
> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +
> +      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);
> +      emit_insn (gen_movsi (reg, l1));
> +      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));
> +    }
> +  else
> +    {
> +      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use
> +         PC-relative access. */
> +      /* We use an UNSPEC rather than a LABEL_REF because this label
> +         never appears in the code stream. */
> +      labelno = GEN_INT (pic_labelno++);
> +      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
> +      l1 = gen_rtx_CONST (VOIDmode, l1);
> +
> +      /* On the ARM the PC register contains 'dot + 8' at the time of the
> +         addition, on the Thumb it is 'dot + 4'. */
> +      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
> +      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
> +                                   UNSPEC_SYMBOL_OFFSET);
> +      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
> +
> +      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,
> +                                                   labelno));
> +    }
> +
> +  return insn;
>  }
>
>  /* Return nonzero if X is valid as an ARM state addressing register. */
> @@ -16112,9 +16252,36 @@ get_jump_table_size (rtx_jump_table_data *insn)
>    return 0;
>  }
>
> +/* Emit insns to load the function address from FUNCDESC (an FDPIC
> +   function descriptor) into a register and the GOT address into the
> +   FDPIC register, returning an rtx for the register holding the
> +   function address. */
> +
> +rtx
> +arm_load_function_descriptor (rtx funcdesc)
> +{
> +  rtx fnaddr_reg = gen_reg_rtx (Pmode);
> +  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);
> +  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));
> +  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
> +
> +  emit_move_insn (fnaddr_reg, fnaddr);
> +  /* The ABI requires the entry point address to be loaded first, so
> +     prevent the load from being moved after that of the GOT
> +     address. */

Do you mean that the move insn above has to come before the pattern below?
If so, I think that should be enforced by making this...

> +  XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
> +                                        gen_rtvec (2, pic_reg, gotaddr),
> +                                        UNSPEC_PIC_RESTORE);
> +  XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, gotaddr);
> +  XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, pic_reg);
> +  emit_insn (par);
> +
> +  return fnaddr_reg;
> +}
> +

...use fnaddr_reg.

Does the instruction actually use pic_reg? We only get here for
non-symbolic addresses after all. It seems simpler to make
*restore_pic_register_after_call a named pattern and use
gen_restore_pic_register_after_call instead.

>  /* Return the maximum amount of padding that will be inserted before
>     label LABEL. */
> -
>  static HOST_WIDE_INT
>  get_label_padding (rtx label)
>  {
> @@ -23069,9 +23236,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
>            && (!SYMBOL_REF_LOCAL_P (x)
>                || (SYMBOL_REF_DECL (x)
>                    ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))
> -        fputs ("(GOT)", asm_out_file);
> +        {
> +          if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
> +            fputs ("(GOTFUNCDESC)", asm_out_file);
> +          else
> +            fputs ("(GOT)", asm_out_file);
> +        }
>        else
> -        fputs ("(GOTOFF)", asm_out_file);
> +        {
> +          if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
> +            fputs ("(GOTOFFFUNCDESC)", asm_out_file);
> +          else
> +            {
> +              bool is_readonly;
> +
> +              if (arm_is_segment_info_known (x, &is_readonly))
> +                fputs ("(GOTOFF)", asm_out_file);
> +              else
> +                fputs ("(GOT)", asm_out_file);
> +            }
> +        }
> +    }
> +
> +  /* For FDPIC we also have to mark symbol for .data section. */
> +  if (TARGET_FDPIC
> +      && NEED_GOT_RELOC
> +      && flag_pic
> +      && !making_const_table
> +      && SYMBOL_REF_P (x))
> +    {
> +      if (SYMBOL_REF_FUNCTION_P (x))
> +        fputs ("(FUNCDESC)", asm_out_file);
>      }
>    fputc ('\n', asm_out_file);
>    return true;

Do you expect to reach here for LABEL_REFs with TARGET_FDPIC?
The second block of code tests for SYMBOL_REF_P but the first
tests SYMBOL_REF_FUNCTION_P without checking SYMBOL_REF_P first.

Can NEED_GOT_RELOC or flag_pic be false for TARGET_FDPIC?
Is !flag_pic TARGET_FDPIC supported?

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 0aecd03..9036255 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -8127,6 +8127,23 @@
>      rtx callee, pat;
>      tree addr = MEM_EXPR (operands[0]);
>
> +    /* Force FDPIC register (r9) before call. */
> +    if (TARGET_FDPIC)
> +      {
> +        /* No need to update r9 if calling a static function.
> +           In other words: set r9 for indirect or non-local calls. */
> +        callee = XEXP (operands[0], 0);
> +        if (!SYMBOL_REF_P (callee)
> +            || !SYMBOL_REF_LOCAL_P (callee)
> +            || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

IMO it would be better to calculate this once rather than repeat it below.

> +          {
> +            emit_insn (gen_blockage ());

Why's the blockage needed? Seems worth a comment.

> +            rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +            emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));
> +            emit_insn (gen_rtx_USE (VOIDmode, pic_reg));

Is this use keeping the register live for the call? If so,
I think it'd be better to attach it to the CALL_INSN_FUNCTION_USAGE
instead.

> +          }
> +      }
> +
>      /* In an untyped call, we can get NULL for operand 2. */
>      if (operands[2] == NULL_RTX)
>        operands[2] = const0_rtx;
> @@ -8140,6 +8157,13 @@
>          : !REG_P (callee))
>        XEXP (operands[0], 0) = force_reg (Pmode, callee);
>
> +    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))
> +      {
> +        /* Indirect call: set r9 with FDPIC value of callee. */
> +        XEXP (operands[0], 0)
> +          = arm_load_function_descriptor (XEXP (operands[0], 0));
> +      }
> +
>      if (detect_cmse_nonsecure_call (addr))
>        {
>          pat = gen_nonsecure_call_internal (operands[0], operands[1],

Redundant braces.

> @@ -8151,10 +8175,38 @@
>          pat = gen_call_internal (operands[0], operands[1], operands[2]);
>          arm_emit_call_insn (pat, XEXP (operands[0], 0), false);
>        }
> +
> +    /* Restore FDPIC register (r9) after call. */
> +    if (TARGET_FDPIC)
> +      {
> +        /* No need to update r9 if calling a static function. */
> +        if (!SYMBOL_REF_P (callee)
> +            || !SYMBOL_REF_LOCAL_P (callee)
> +            || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))
> +          {
> +            rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +            emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));
> +            emit_insn (gen_rtx_USE (VOIDmode, pic_reg));
> +            emit_insn (gen_blockage ());
> +          }
> +      }
>      DONE;
>    }"
>  )

What's the general assumption about the validity of r9? Seems odd that
we need to load this value both before and after the call.

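To illustrate the "calculate this once" point about the callee test,
which appears both before the call and again in the restore block quoted
above, here is a small standalone C model. It is only a sketch under
stated assumptions: struct callee_info and fdpic_call_needs_r9_update
are hypothetical names, and the three flags stand in for SYMBOL_REF_P,
SYMBOL_REF_LOCAL_P and arm_is_long_call_p, so none of this is actual
GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the facts the expander reads from the
   callee rtx; this is not a GCC type. */
struct callee_info
{
  bool is_symbol_ref; /* models SYMBOL_REF_P (callee) */
  bool binds_local;   /* models SYMBOL_REF_LOCAL_P (callee) */
  bool is_long_call;  /* models arm_is_long_call_p (SYMBOL_REF_DECL (callee)) */
};

/* Hypothetical helper mirroring the condition repeated in the patch:
   return true for indirect, non-local or long calls, i.e. the cases
   in which r9 is set before the call and restored after it. */
static bool
fdpic_call_needs_r9_update (const struct callee_info *callee)
{
  assert (callee != NULL);
  return (!callee->is_symbol_ref
          || !callee->binds_local
          || callee->is_long_call);
}
```

The expander would then evaluate the test once, keep the result in a
local, and check that local both before and after emitting the call.
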
>
> +(define_insn "*restore_pic_register_after_call"
> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")
> +                       (match_operand:SI 1 "nonimmediate_operand" "r,m")]
> +               UNSPEC_PIC_RESTORE)
> +              (use (match_dup 1))
> +              (clobber (match_dup 0))])
> +  ]
> +  ""
> +  "@
> +  mov\t%0, %1
> +  ldr\t%0, %1"
> +)
> +
>  (define_expand "call_internal"
>    [(parallel [(call (match_operand 0 "memory_operand" "")
>                      (match_operand 1 "general_operand" ""))

Since operand 0 is significant after the instruction, I think this
should be:

(define_insn "*restore_pic_register_after_call"
  [(set (match_operand:SI 0 "s_register_operand" "+r,r")
        (unspec:SI [(match_dup 0)
                    (match_operand:SI 1 "nonimmediate_operand" "r,m")]
                   UNSPEC_PIC_RESTORE))]
  ...

The (use (match_dup 1)) looks redundant, since the unspec itself
uses operand 1.

> @@ -8215,6 +8267,30 @@
>      rtx pat, callee;
>      tree addr = MEM_EXPR (operands[1]);
>
> +    /* Force FDPIC register (r9) before call. */
> +    if (TARGET_FDPIC)
> +      {
> +        /* No need to update the FDPIC register (r9) if calling a static function.
> +           In other words: set r9 for indirect or non-local calls. */
> +        callee = XEXP (operands[1], 0);
> +        if (!SYMBOL_REF_P (callee)
> +            || !SYMBOL_REF_LOCAL_P (callee)
> +            || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))
> +          {
> +            rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
> +            rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
> +            rtx initial_fdpic_reg =
> +              get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
> +
> +            XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
> +                                                  gen_rtvec (2, fdpic_reg, initial_fdpic_reg),
> +                                                  UNSPEC_PIC_RESTORE);
> +            XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);
> +            XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);
> +            emit_insn (par);
> +          }
> +      }
> +

It's not obvious why this code is different from the call-without-value
case above, which doesn't use UNSPEC_PIC_RESTORE.
I think it should be split out into a helper function that's used
for both call and call_value. I think it would also be good to have
more comments about what conditions the UNSPEC_PIC_RESTORE pattern
is enforcing.

Thanks,
Richard