On Thu, 4 Jan 2024, Jan Hubicka wrote: > Hi, > this patch adds new option -falign-all-functions which works like > -falign-functions, but applies to all functions including those in cold > regions. As discussed in the PR log, this is needed for atomically > patching function entries in the kernel. > > An option would be to make -falign-function mandatory, but I think it is not a > good idea, since original purpose of -falign-funtions is optimization of > instruction decode and cache size. Having -falign-all-functions is > backwards compatible. Richi also suggested extending syntax of the > -falign-functions parameters (which is already non-trivial) but it seems > to me that having separate flag is more readable. > > Bootstrapped/regtested x86_64-linux, OK for master and later > backports to release branches? > > gcc/ChangeLog: > > PR middle-end/88345 > * common.opt: Add -falign-all-functions > * doc/invoke.texi: Add -falign-all-functions. > (-falign-functions, -falign-labels, -falign-loops): Document > that alignment is ignored in cold code. > * flags.h (align_loops): Reindent. > (align_jumps): Reindent. > (align_labels): Reindent. > (align_functions): Reindent. > (align_all_functions): New macro. > * opts.cc (common_handle_option): Handle -falign-all-functions. > * toplev.cc (parse_alignment_opts): Likewise. > * varasm.cc (assemble_start_function): Likewise. > > diff --git a/gcc/common.opt b/gcc/common.opt > index d263a959df3..fea2c855fcf 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -1033,6 +1033,13 @@ faggressive-loop-optimizations > Common Var(flag_aggressive_loop_optimizations) Optimization Init(1) > Aggressively optimize loops using language constraints. > > +falign-all-functions > +Common Var(flag_align_all_functions) Optimization > +Align the start of functions.
all functions or maybe "of every function."? > + > +falign-all-functions= > +Common RejectNegative Joined Var(str_align_all_functions) Optimization > + > falign-functions > Common Var(flag_align_functions) Optimization > Align the start of functions. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index d272b9228dd..ad3d75d310c 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}. > @xref{Optimize Options,,Options that Control Optimization}. > @gccoptlist{-faggressive-loop-optimizations > -falign-functions[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] > +-falign-all-functions=[@var{n}] > -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] > -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] > -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] > @@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two > greater than or > equal to @var{n}, skipping up to @var{m}-1 bytes. This ensures that at > least the first @var{m} bytes of the function can be fetched by the CPU > without crossing an @var{n}-byte alignment boundary. > +This is an optimization of code performance and alignment is ignored for > +functions considered cold. If alignment is required for all functions, > +use @option{-falign-all-functions}. > > If @var{m} is not specified, it defaults to @var{n}. > > @@ -14210,6 +14214,12 @@ overaligning functions. It attempts to instruct the > assembler to align > by the amount specified by @option{-falign-functions}, but not to > skip more bytes than the size of the function. > > +@opindex falign-all-functions=@var{n} > +@item -falign-all-functions > +Specify minimal alignment for function entry. Unlike > @option{-falign-functions} > +this alignment is applied also to all functions (even those considered cold). > +The alignment is also not affected by @option{-flimit-function-alignment} > + For functions with two entries (like on powerpc), which entry does this apply to? I suppose the external ABI entry, not the local one? But how does this then help to align the patchable entry (the common local entry should be aligned?). Should we align _both_ entries? > @opindex falign-labels > @item -falign-labels > @itemx -falign-labels=@var{n} > @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}. > Align loops to a power-of-two boundary. If the loops are executed > many times, this makes up for any execution of the dummy padding > instructions. > +This is an optimization of code performance and alignment is ignored for > +loops considered cold. > > If @option{-falign-labels} is greater than this value, then its value > is used instead. > @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}. > Align branch targets to a power-of-two boundary, for branch targets > where the targets can only be reached by jumping. In this case, > no dummy operations need be executed. > +This is an optimization of code performance and alignment is ignored for > +jumps considered cold. > > If @option{-falign-labels} is greater than this value, then its value > is used instead. > @@ -14371,7 +14385,7 @@ To use the link-time optimizer, @option{-flto} and > optimization > options should be specified at compile time and during the final link. > It is recommended that you compile all the files participating in the > same link with the same options and also specify those options at > -link time. > +link time. > For example: > > @smallexample > diff --git a/gcc/flags.h b/gcc/flags.h > index e4bafa310d6..ecf4fb9e846 100644 > --- a/gcc/flags.h > +++ b/gcc/flags.h > @@ -89,6 +89,7 @@ public: > align_flags x_align_jumps; > align_flags x_align_labels; > align_flags x_align_functions; > + align_flags x_align_all_functions; > }; > > extern class target_flag_state default_target_flag_state; > @@ -98,10 +99,11 @@ extern class target_flag_state *this_target_flag_state; > #define this_target_flag_state (&default_target_flag_state) > #endif > > -#define align_loops (this_target_flag_state->x_align_loops) > -#define align_jumps (this_target_flag_state->x_align_jumps) > -#define align_labels (this_target_flag_state->x_align_labels) > -#define align_functions (this_target_flag_state->x_align_functions) > +#define align_loops (this_target_flag_state->x_align_loops) > +#define align_jumps (this_target_flag_state->x_align_jumps) > +#define align_labels (this_target_flag_state->x_align_labels) > +#define align_functions (this_target_flag_state->x_align_functions) > +#define align_all_functions (this_target_flag_state->x_align_all_functions) > > /* Returns TRUE if generated code should match ABI version N or > greater is in use. */ > diff --git a/gcc/opts.cc b/gcc/opts.cc > index 7a3830caaa3..3fa521501ff 100644 > --- a/gcc/opts.cc > +++ b/gcc/opts.cc > @@ -3342,6 +3342,12 @@ common_handle_option (struct gcc_options *opts, > &opts->x_str_align_functions); > break; > > + case OPT_falign_all_functions_: > + check_alignment_argument (loc, arg, "all-functions", > + &opts->x_flag_align_all_functions, > + &opts->x_str_align_all_functions); > + break; > + > case OPT_ftabstop_: > /* It is documented that we silently ignore silly values. */ > if (value >= 1 && value <= 100) > diff --git a/gcc/toplev.cc b/gcc/toplev.cc > index 85450d97a1a..3dd6f4e1ef7 100644 > --- a/gcc/toplev.cc > +++ b/gcc/toplev.cc > @@ -1219,6 +1219,7 @@ parse_alignment_opts (void) > parse_N_M (str_align_jumps, align_jumps); > parse_N_M (str_align_labels, align_labels); > parse_N_M (str_align_functions, align_functions); > + parse_N_M (str_align_all_functions, align_all_functions); > } > > /* Process the options that have been parsed. */ > diff --git a/gcc/varasm.cc b/gcc/varasm.cc > index 69f8f8ee018..ddb8536a337 100644 > --- a/gcc/varasm.cc > +++ b/gcc/varasm.cc > @@ -1919,6 +1919,37 @@ assemble_start_function (tree decl, const char *fnname) > ASM_OUTPUT_ALIGN (asm_out_file, align); > } > > + /* Handle forced alignment. This really ought to apply to all functions, > + since it is used by patchable entries. */ > + if (align_all_functions.levels[0].log > align) > + { > +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN > + int align_log = align_all_functions.levels[0].log; > +#endif > + int max_skip = align_all_functions.levels[0].maxskip; > + if (flag_limit_function_alignment && crtl->max_insn_address > 0 > + && max_skip >= crtl->max_insn_address) > + max_skip = crtl->max_insn_address - 1; > + > +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN > + ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip); > + if (max_skip >= (1 << align_log) - 1) > + align = align_functions.levels[0].log; > + if (max_skip == align_all_functions.levels[0].maxskip) > + { > + ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, > + align_all_functions.levels[1].log, > + align_all_functions.levels[1].maxskip); > + if (align_all_functions.levels[1].maxskip > + >= (1 << align_all_functions.levels[1].log) - 1) > + align = align_all_functions.levels[1].log; > + } > +#else > + ASM_OUTPUT_ALIGN (asm_out_file, align_all_functions.levels[0].log); > + align = align_all_functions.levels[0].log; > +#endif This looks close to the handling of user-specified alignment (factor sth out?), but I wonder if for -falign-all-functions we should only allow a hard alignment (no max_skip) and also not allow (but diagnose?) conflicts with limit-function-alignment? The interaction with the other flags also doesn't seem to be well documented? The main docs suggest it should be -fmin-function-alignment= which to me then suggests -flimit-function-alignment should not have an effect on it and even very small functions should be aligned. Richard. > + } > + > /* Handle a user-specified function alignment. > Note that we still need to align to DECL_ALIGN, as above, > because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all. */ > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)