On 01/07/2018 03:59 PM, H.J. Lu wrote:
> Add -mindirect-branch= option to convert indirect call and jump to call
> and return thunks.  The default is 'keep', which keeps indirect call and
> jump unmodified.  'thunk' converts indirect call and jump to call and
> return thunk.  'thunk-inline' converts indirect call and jump to inlined
> call and return thunk.  'thunk-extern' converts indirect call and jump to
> external call and return thunk provided in a separate object file.  You
> can control this behavior for a specific function by using the function
> attribute indirect_branch.
> 
> 2 kinds of thunks are geneated.  Memory thunk where the function address
> is at the top of the stack:
> 
> __x86_indirect_thunk:
>       call L2
> L1:
>       lfence
>       jmp L1
> L2:
>       lea 8(%rsp), %rsp|lea 4(%esp), %esp
>       ret
> 
> Indirect jmp via memory, "jmp mem", is converted to
> 
>       push memory
>       jmp __x86_indirect_thunk
> 
> Indirect call via memory, "call mem", is converted to
> 
>       jmp L2
> L1:
>       push [mem]
>       jmp __x86_indirect_thunk
> L2:
>       call L1
> 
> Register thunk where the function address is in a register, reg:
> 
> __x86_indirect_thunk_reg:
>       call    L2
> L1:
>       lfence
>       jmp     L1
> L2:
>       movq    %reg, (%rsp)|movl    %reg, (%esp)
>       ret
> 
> where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di,
> (r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15.
> 
> Indirect jmp via register, "jmp reg", is converted to
> 
>       jmp __x86_indirect_thunk_reg
> 
> Indirect call via register, "call reg", is converted to
> 
>       call __x86_indirect_thunk_reg
> 
> gcc/
> 
>       * config/i386/i386-opts.h (indirect_branch): New.
>       * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
>       * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
>       with local indirect jump when converting indirect call and jump.
>       (ix86_set_indirect_branch_type): New.
>       (ix86_set_current_function): Call ix86_set_indirect_branch_type.
>       (indirectlabelno): New.
>       (indirect_thunk_needed): Likewise.
>       (indirect_thunk_bnd_needed): Likewise.
>       (indirect_thunks_used): Likewise.
>       (indirect_thunks_bnd_used): Likewise.
>       (INDIRECT_LABEL): Likewise.
>       (indirect_thunk_name): Likewise.
>       (output_indirect_thunk): Likewise.
>       (output_indirect_thunk_function): Likewise.
>       (ix86_output_indirect_branch): Likewise.
>       (ix86_output_indirect_jmp): Likewise.
>       (ix86_code_end): Call output_indirect_thunk_function if needed.
>       (ix86_output_call_insn): Call ix86_output_indirect_branch if
>       needed.
>       (ix86_handle_fndecl_attribute): Handle indirect_branch.
>       (ix86_attribute_table): Add indirect_branch.
>       * config/i386/i386.h (machine_function): Add indirect_branch_type
>       and has_local_indirect_jump.
>       * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
>       to true.
>       (tablejump): Likewise.
>       (*indirect_jump): Use ix86_output_indirect_jmp.
>       (*tablejump_1): Likewise.
>       (simple_return_indirect_internal): Likewise.
>       * config/i386/i386.opt (mindirect-branch=): New option.
>       (indirect_branch): New.
>       (keep): Likewise.
>       (thunk): Likewise.
>       (thunk-inline): Likewise.
>       (thunk-extern): Likewise.
>       * doc/extend.texi: Document indirect_branch function attribute.
>       * doc/invoke.texi: Document -mindirect-branch= option.
Note I'm expecting Uros to chime in.  So please do not consider this
ack'd until you hear from Uros.

At a high level is there really that much value in having thunks in the
object file?  Why not put the full set of thunks into libgcc and just
allow selection between inline sequences and external thunks
(thunk-inline and thunk-external)?  It's not a huge simplification, but
if there isn't a compelling reason, let's drop the in-object-file thunks.

> +
> +/* Fills in the label name that should be used for the indirect thunk.  */
> +
> +static void
> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
Please document each argument in the function's comment.



> +
> +static void
> +output_indirect_thunk (bool need_bnd_p, int regno)
Needs a function comment.



> +
> +static void
> +output_indirect_thunk_function (bool need_bnd_p, int regno)
Needs a function comment.



> @@ -28119,12 +28357,182 @@ ix86_nopic_noplt_attribute_p (rtx call_op)
>    return false;
>  }
>  
> +static void
> +ix86_output_indirect_branch (rtx call_op, const char *xasm,
> +                          bool sibcall_p)
Needs a function comment.


I'd probably break this into a few smaller functions.  It's a lot of
inlined code.







> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 3f587806407..a7573c468ae 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -12313,12 +12313,13 @@
>  {
>    if (TARGET_X32)
>      operands[0] = convert_memory_address (word_mode, operands[0]);
> +  cfun->machine->has_local_indirect_jump = true;
Note this is not ideal in that it's set at expansion time and thus would
not be accurate if the RTL optimizers were able to simply things enough
such that the indirect jump has a known target.

But I wouldn't expect that to happen much in the RTL optimizers much as
the gimple optimizers are likely much better at doing that kind of
thing.  So I won't object to doing things this way as long as they
gracefully handle this case.


> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 09aaa97c2fc..22c806206e4 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1021,3 +1021,23 @@ indirect jump.
>  mforce-indirect-call
>  Target Report Var(flag_force_indirect_call) Init(0)
>  Make all function calls indirect.
> +
> +mindirect-branch=
> +Target Report RejectNegative Joined Enum(indirect_branch) 
> Var(ix86_indirect_branch) Init(indirect_branch_keep)
> +Convert indirect call and jump.
Convert to what?  I realize there's an enum of the choices below, but
this doesn't read well.

Do you want to mention that CET and retpolines are inherently
incompatible?  Should an attempt to use them together generate a
compile-time error?

Jeff

Reply via email to