> -----Original Message-----
> From: Jan Hubicka <[email protected]>
> Sent: 11 December 2025 20:56
> To: Prathamesh Kulkarni <[email protected]>
> Cc: [email protected]
> Subject: Re: [RFC] Enable time profile function reordering with AutoFDO
>
>
> > Hi Honza,
> > Thanks for the review, please find responses inline below.
> >
> > > I assume that autofdo assigns timestamps to offline function
> > > instances, so this could also fit into the function_instance
> > > data structure, but I suppose you want to keep it separate so that
> > > it is optional?
> > function_instance does store the timestamp, and uses it as a key into
> > timestamp_info_map.
> > My intent in keeping a separate data structure, timestamp_info_map, was
> > to sort the 64-bit timestamps in ascending order, map the sorted
> > timestamp values to (1..N), and assign the mapped value to
> > node->tp_first_run.
> > So we don't need to widen node->tp_first_run to 64 bits (and I
> > guess we won't need the full 64-bit range here?)
>
> Normal time profiling works in a way that time is increased only when
> a new function is executed, so 2^32 is the limit on the number of executed
> functions in the program, which is probably still large enough: if one
> does LTO with more than 2^32 symbols we will run out of DECL_UIDs and
> other things, though it may be possible to link such a large program.
>
> At runtime, all computations and streaming happen in 64 bits, so
> widening tp_first_run to 64 bits is also an alternative, but I guess the
> compression is a reasonable approach until we need to update
> other data structures as well.
> > > Also, does the feature work with normal partitioning? It should make
> > > those functions with nonzero tp_first_run go into early partitions,
> > > but then we will likely get a relatively large number of calls to
> > > functions with zero counts just because we lost track of them?
> > The results I reported have been with balanced partitioning. I am
> > seeing roughly the same numbers on top of locality partitioning too.
>
> Great, happy that this works on both settings.
> > >
> > > I also wonder if offlining can behave meaningfully here, i.e. when
> > > the function profile is non-zero, just copy the first run from the caller.
> > IIUC, the purpose of the afdo_offline pass is to remove cross-module
> > inlining from function_instances so AutoFDO annotation gets a better
> > mapping from source locations to the CFG?
>
> Yes, old code assumed all relevant inlining to happen at early inline
> time via the afdo-inline path. All inlining that did not happen led
> to lost profile data. With LTO this becomes quite problematic since a
> lot of inlining is cross-module, so we need to take inline instances
> out and merge them with existing offline instances in order to get
> realistic profile before the link-time IPA-profile pass.
> >
> > Currently, the patch only assigns timestamps to functions whose
> > outline copy is executed during the train run (toplevel).
> > Would afdo offlining help, for instance, if a function is inlined into
> > all its callers during the train run (so the outline copy has no
> > timestamp info), but during the AutoFDO build it may not get inlined
> > into all its callers? Currently in this case, I guess we lose the
> > timestamp info (and thus time profile reordering for such a function).
> >
> > Currently the transform is gated explicitly on -fauto-profile
> > -fprofile-reorder-functions (so users can opt in, instead of it being
> > enabled by default with -fauto-profile).
> > Should that be OK for gcc-16?
>
> I think we want to do that (also for gcc-16, since this is contained
> in autofdo). If we offline a function with non-zero counts in it,
> we may want to assign the offline copy a first-run timestamp derived
> from the outer function (probably something like
> timestamp + inline depth).
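To make the "timestamp + inline depth" idea concrete, here is a hypothetical C++ sketch. The names `instance` and `derive_timestamps` are illustrative only, not the auto-profile.cc data structures, and the depth offset is just one possible policy: every nested instance that has samples but no timestamp of its own gets the outer function's timestamp plus its inline depth, so an offlined copy would sort right after its caller.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for function_instance, keeping only
// the fields needed for this illustration.
struct instance
{
  std::int64_t total_count = 0;
  std::int64_t timestamp = 0; // 0 means "no timestamp recorded"
  std::vector<std::unique_ptr<instance>> callees;
};

// Assign OUTER_TS + DEPTH to every instance that has samples but no
// timestamp of its own; instances that already have one keep it.
void
derive_timestamps (instance &inst, std::int64_t outer_ts,
                   std::int64_t depth = 0)
{
  assert (depth >= 0);
  if (inst.timestamp == 0 && inst.total_count > 0)
    inst.timestamp = outer_ts + depth;
  for (auto &callee : inst.callees)
    derive_timestamps (*callee, outer_ts, depth + 1);
}
```

Calling `derive_timestamps (top, top.timestamp)` on a toplevel instance leaves its own timestamp untouched and gives a sampled callee `ts + 1`, a sampled callee of that callee `ts + 2`, and so on; instances with zero counts stay at 0.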
> > Does the patch look OK if bootstrap+test passes (with and without
> > AutoFDO)?
> > gcc/ChangeLog:
> > * auto-profile.cc (string_table::filenames): New method.
> > (function_instance::timestamp_): New member.
> > (function_instance::timestamp): New accessor for timestamp_ member.
> > (autofdo::timestamp_info_map): New std::map.
> > (function_instance::function_instance): Add argument timestamp and set
> > it to member timestamp_.
> > (function_instance::read_function_instance): Adjust prototype and read
> > timestamp.
> > (autofdo_source_profile::get_function_instance_by_decl): New argument
> > filename with default value NULL.
> > (autofdo_source_profile::read): Populate timestamp_info_map.
> > (afdo_annotate_cfg): Assign node->tp_first_run based on
> > timestamp_info_map and bail out of annotation if
> > param auto-profile-reorder-only is enabled.
> > * params.opt: New param auto-profile-reorder-only.
> > * ipa-locality-cloning.cc (partition_by_timestamps): New function.
> > (locality_partition_and_clone): Call partition_by_timestamps.
> > * varasm.cc (default_function_section): Bail out if -fauto-profile
> > -fprofile-reorder-functions is enabled and node->tp_first_run > 0.
> >
> > gcc/lto/ChangeLog:
> > * lto-partition.cc (create_partition): Move higher up in the file.
> > (partition_by_timestamps): New function.
> > (lto_balanced_map): Call partition_by_timestamps if -fauto-profile
> > -fprofile-reorder-functions is passed and noreorder is empty.
> OK
> > + /* perf timestamp associated with first execution of function. */
> Please add comment that tp_first_run is eventually computed using this
> value, but we need to watch overflows since it is 32bit.
> > + gcov_type timestamp_;
> > +/* Map from timestamp -> <name, tp_first_run>.
> > +
> > +The purpose of this map is to map 64-bit timestamp values to (1..N),
> > +sorted by ascending order of timestamps and assign that to
> > +node->tp_first_run, since we don't need the full 64-bit range. */
> Seems the comment is formatted wrongly.
> > +static std::map<gcov_type, std::pair<int, int>> timestamp_info_map;
> > +
> > /* Scaling factor for afdo data. Compared to normal profile
> > AFDO profile counts are much lower, depending on sampling
> > frequency. We scale data up to reudce effects of roundoff
> > @@ -2523,6 +2543,7 @@ autofdo_source_profile::offline_unrealized_inlines ()
> > /* function instance profile format:
> >
> > ENTRY_COUNT: 8 bytes
> > + TIMESTAMP: 8 bytes (only for toplevel symbols)
> > NAME_INDEX: 4 bytes
> > NUM_POS_COUNTS: 4 bytes
> > NUM_CALLSITES: 4 byte
> > @@ -2549,14 +2570,17 @@ autofdo_source_profile::offline_unrealized_inlines ()
> >
> > function_instance *
> > function_instance::read_function_instance (function_instance_stack *stack,
> > - gcov_type head_count)
> > + gcov_type head_count, bool
> > + toplevel)
>
> There are also head count that is also streamed for toplevel instances
> that is handled by passing it around as a parameter:
>
> function_instance *s = function_instance::read_function_instance
> (&stack, gcov_read_counter ());
>
> I think "toplevel" flag is better, so please modify streaming of head
> count to work same way.
> > = new function_instance (name, afdo_string_table->get_filename_idx (name),
> > - head_count);
> > + head_count, timestamp);
> We will need a setter API anyway to handle updates while offlining, so
> perhaps instead of adding an extra parameter to the ctor, just set the
> timestamp after construction?
> > + /* timestamp_info_map is std::map with timestamp as key,
> > + so it's already sorted in ascending order wrt timestamps.
> > + This loop maps function with lowest timestamp to 1, and so on.
> > + In afdo_annotate_cfg, node->tp_first_run is then set to corresponding
> > + tp_first_run value. */
> > +
> > + int tp_first_run = 1;
> > + for (auto &p : timestamp_info_map)
> > + p.second.second = tp_first_run++;
> For the time being I guess you can walk inline instances here recursively
> and assign tp_first_runs to all of those that contain non-zero
> counts.
> Then offlining only needs to merge the timestamps at a time it merges
> the profile.
> > diff --git a/gcc/ipa-locality-cloning.cc b/gcc/ipa-locality-cloning.cc
> > index 2684046bd2d..1653ea7b961 100644
> > --- a/gcc/ipa-locality-cloning.cc
> > +++ b/gcc/ipa-locality-cloning.cc
>
> The changes above are OK with the changes mentioned.
Thanks for the suggestions. I have tried to address them in the attached patch,
and will send the partitioning changes in a separate patch.
This patch propagates the timestamp from the toplevel function_instance to
inlined instances, and picks the lesser value while merging. On testing the
patch, I am seeing a ~4.1% improvement on a large internal workload, and
several more functions with non-zero tp_first_run values.
I haven't fully investigated the cause yet, but IIUC this would happen if
ipa-inline during the AutoFDO build didn't do all the inlining that happened
during the train run? In that case the outlined callee now gets tp_first_run
set and is grouped together with its (first) caller, whereas without
propagating tp_first_run, the caller and the outlined callee get placed apart?
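A simplified, standalone C++ model of the propagation and merge described above may help review. `fn_instance` and `merge_timestamp` are hypothetical names; the real function_instance has many more fields. `prop_timestamp_1` follows the same rule as the helper in the patch (only instances with samples and no timestamp yet inherit one), and the merge keeps the earlier timestamp while treating zero as "unset", so merging with an instance that has no timestamp never discards a known one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Simplified model of an AutoFDO function_instance: a sample count, a
// timestamp (0 = unset) and inlined callees keyed by callsite offset.
struct fn_instance
{
  std::int64_t total_count = 0;
  std::int64_t timestamp = 0;
  std::map<unsigned, std::unique_ptr<fn_instance>> callsites;

  // Give TS to this instance and every nested inlined instance that has
  // samples but no timestamp yet.
  void prop_timestamp_1 (std::int64_t ts)
  {
    if (timestamp == 0 && total_count > 0)
      timestamp = ts;
    for (auto &entry : callsites)
      entry.second->prop_timestamp_1 (ts);
  }

  // Start the propagation from this (toplevel) instance's own timestamp.
  void prop_timestamp () { prop_timestamp_1 (timestamp); }

  // When merging OTHER into this instance, keep the earlier of the two
  // timestamps, ignoring an unset (zero) one.
  void merge_timestamp (const fn_instance &other)
  {
    if (other.timestamp != 0
        && (timestamp == 0 || other.timestamp < timestamp))
      timestamp = other.timestamp;
  }
};
```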
Does the patch look OK?
Signed-off-by: Prathamesh Kulkarni <[email protected]>
Thanks,
Prathamesh
>
> The changes below can still be a separate patch (once tp_first_run is
> set, we should get sane partitioning and ordering by existing balanced
> partitioning code).
>
> > @@ -994,6 +994,50 @@ locality_determine_static_order (auto_vec<locality_order *> *order)
> > return true;
> > }
> >
> > +/* Similar to lto-partition.cc:partition_by_timestamps. */
> > +
> > +static locality_partition
> > +partition_by_timestamps (int64_t max_partition_size, int&
> > +npartitions)
> It is not called timestamps anymore, so perhaps
> partition_by_tp_first_run.
> > @@ -1134,6 +1187,19 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
> >
> > original_total_size = total_size;
> >
> > + /* With -fauto-profile -fprofile-reorder-functions, place all symbols which
> > + are profiled together into as few partitions as possible. The rationale
> > + for doing this with AutoFDO is that the number of sampled functions is a
> > + fraction of total number of executed functions (unlike PGO where each
> > + executed function gets instrumented with time profile counter), and
> > + placing them together helps with code locality. */
> > +
> > + partition = NULL;
> > + if (flag_auto_profile
> > + && flag_profile_reorder_functions
> > + && noreorder.length () == 0)
> > + partition = partition_by_timestamps (max_partition_size,
> > + npartitions);
>
> I do not see why this is needed. Balanced parititioning starts by a
> fixed order of functions:
>
> order.qsort (tp_first_run_node_cmp);
>
> which will prioritize functions with tp_first_run and
> flag_profile_reorder_functions enabled (this flag is per-function so it
> needs to be checked on each of them)
>
> int
> tp_first_run_node_cmp (const void *pa, const void *pb)
> {
> const cgraph_node *a = *(const cgraph_node * const *) pa;
> const cgraph_node *b = *(const cgraph_node * const *) pb;
> unsigned int tp_first_run_a = a->tp_first_run;
> unsigned int tp_first_run_b = b->tp_first_run;
>
> if (!opt_for_fn (a->decl, flag_profile_reorder_functions)
> || a->no_reorder)
> tp_first_run_a = 0;
> if (!opt_for_fn (b->decl, flag_profile_reorder_functions)
> || b->no_reorder)
> tp_first_run_b = 0;
>
> if (tp_first_run_a == tp_first_run_b)
> return a->order - b->order;
>
> /* Functions with time profile must be before these without profile. */
> tp_first_run_a = (tp_first_run_a - 1) & INT_MAX;
> tp_first_run_b = (tp_first_run_b - 1) & INT_MAX;
>
> return tp_first_run_a - tp_first_run_b;
> }
>
> Once the order is set, balanced partitioning assigns functions to
> partitions in that order, only trying to minimize the interface between
> the parts by looking for better cut points in a given range.
>
> So it should give a similar or better order than what you get by first
> putting functions with non-zero tp_first_run into partitions using a
> separate partitioner.
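The effect of the `(tp_first_run - 1) & INT_MAX` step in tp_first_run_node_cmp can be seen in a minimal standalone C++ sketch. It is not GCC code: it drops the flag_profile_reorder_functions and no_reorder checks, uses a hypothetical `node` struct, and expresses the ordering as a `std::sort` predicate instead of a qsort callback. Because unsigned 0 - 1 wraps and the mask maps it to INT_MAX, nodes with a time profile come first in ascending tp_first_run order, and unprofiled nodes follow in their original symbol order.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical node carrying just the fields the comparator looks at.
struct node
{
  int order;                 // original symbol order
  unsigned int tp_first_run; // 0 = no time profile
};

// Strict-weak-ordering version of the tp_first_run_node_cmp logic:
// (x - 1) & INT_MAX sends 1, 2, ... to 0, 1, ... and 0 to INT_MAX.
bool
tp_first_run_less (const node &a, const node &b)
{
  unsigned int ka = (a.tp_first_run - 1) & INT_MAX;
  unsigned int kb = (b.tp_first_run - 1) & INT_MAX;
  if (ka != kb)
    return ka < kb;
  return a.order < b.order;
}
```

For example, sorting nodes with (order, tp_first_run) pairs (0, 0), (1, 3), (2, 1), (3, 0), (4, 2) yields the order 2, 4, 1, 0, 3.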
> > +
> > + /* With -fauto-profile -fprofile-reorder-functions, give higher priority
> > + to time profile based reordering, and ensure the reordering isn't split
> > + by hot/cold partitioning. */
> > + if (flag_auto_profile
> > + && flag_profile_reorder_functions
> > + && cgraph_node::get (decl)->tp_first_run > 0)
> > + return NULL;
>
> I am also not sure why this would be needed. If you have cold
> function that is run early (such as startup initialization code) it
> still makes sense to place it away from code that is run often.
>
> Honza
Enable time profile function reordering with AutoFDO.
The patch enables time-profile-based function reordering with AutoFDO, under
-fauto-profile -fprofile-reorder-functions, by mapping timestamps obtained from
perf into node->tp_first_run.
The rationale for doing this is:
(1) GCC already implements time-profile function reordering with PGO; the patch
enables it with AutoFDO.
(2) While time-profile ordering is primarily meant for optimizing startup time,
we've also observed good effects on code locality for large internal workloads.
(3) It is possibly useful for function reordering when accurate profile
annotation is hard with AutoFDO, e.g. if branch samples are missing (due to the
absence of an LBR-like facility).
On the AutoFDO tools side, a corresponding patch extends the gcov output to
emit a 64-bit perf timestamp that records the first execution of each function,
which loosely corresponds to PGO's time_profile counter.
The timestamp is stored adjacent to the head count field in the toplevel
function info.
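As a rough illustration of the record layout listed in the patch's profile-format comment (ENTRY_COUNT and TIMESTAMP only for toplevel instances, then NAME_INDEX, NUM_POS_COUNTS and NUM_CALLSITES), here is a hypothetical C++ sketch of reading one record header from an in-memory array of 32-bit words. `word_reader`, `instance_header` and `read_header` are illustrative names, not GCC APIs, and the sketch assumes that a 64-bit counter is stored as two 32-bit words, low word first (mirroring what gcov_read_counter does).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory reader over the 32-bit words of a profile file.
struct word_reader
{
  explicit word_reader (const std::vector<std::uint32_t> &w) : words (w) {}

  std::uint32_t read_unsigned ()
  {
    if (pos >= words.size ())
      throw std::out_of_range ("truncated profile");
    return words[pos++];
  }

  // Assumed encoding: low word first, then high word.
  std::uint64_t read_counter ()
  {
    std::uint64_t lo = read_unsigned ();
    std::uint64_t hi = read_unsigned ();
    return lo | (hi << 32);
  }

  const std::vector<std::uint32_t> &words;
  std::size_t pos = 0;
};

struct instance_header
{
  std::int64_t head_count = -1; // -1 for inline instances, as in the patch
  std::uint64_t timestamp = 0;  // 0 = no timestamp
  std::uint32_t name_index = 0;
  std::uint32_t num_pos_counts = 0;
  std::uint32_t num_callsites = 0;
};

// ENTRY_COUNT and TIMESTAMP are only present for toplevel instances.
instance_header
read_header (word_reader &r, bool toplevel)
{
  instance_header h;
  if (toplevel)
    {
      h.head_count = static_cast<std::int64_t> (r.read_counter ());
      h.timestamp = r.read_counter ();
    }
  h.name_index = r.read_unsigned ();
  h.num_pos_counts = r.read_unsigned ();
  h.num_callsites = r.read_unsigned ();
  return h;
}
```

Passing `toplevel` down, rather than having the caller pre-read the head count, keeps both toplevel-only fields in one place.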
On the GCC side, this patch makes the following changes:
(1) Changes to the auto-profile pass:
The patch adds a new field, timestamp, to function_instance and populates it
in read_function_instance.
It maintains a new timestamp_info_map from timestamp -> tp_first_run, which
maps the timestamps, sorted in ascending order, to (1..N), so the lowest
timestamp is mapped to 1 and so on. The rationale is that timestamps are
64-bit integers, and we don't need the full 64-bit range for ordering by
tp_first_run.
During annotation, the timestamp associated with the function_instance is
looked up in timestamp_info_map, and the corresponding mapped value is
assigned to node->tp_first_run.
Dhruv's sourcefile tracking patch already handles LTO-privatized symbols.
The patch adds a workaround for mismatched/empty filenames, which should go
away once the issues with DWARF parsing in the AutoFDO tools are resolved.
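The timestamp -> (1..N) compression described in (1) can be sketched in standalone C++. `rank_timestamps` and `tp_first_run_for` are hypothetical helpers, not the patch's code. As with timestamp_info_map, std::map keeps its keys sorted, so one in-order walk assigns dense ranks; this sketch additionally skips zero timestamps, so functions without a timestamp end up with tp_first_run == 0 ("no time profile").

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Map each distinct non-zero 64-bit timestamp to a dense rank 1..N in
// ascending timestamp order.  Zero means "no timestamp" and is skipped.
std::map<std::int64_t, int>
rank_timestamps (const std::vector<std::int64_t> &timestamps)
{
  std::map<std::int64_t, int> ranks;
  for (std::int64_t ts : timestamps)
    if (ts != 0)
      ranks.emplace (ts, 0); // duplicates collapse into one key
  int next = 1;
  for (auto &entry : ranks) // in-order walk: ascending timestamps
    entry.second = next++;
  return ranks;
}

// Look up the compressed value for TS; 0 means "no time profile".
int
tp_first_run_for (const std::map<std::int64_t, int> &ranks, std::int64_t ts)
{
  auto it = ranks.find (ts);
  return it == ranks.end () ? 0 : it->second;
}
```

Because equal timestamps collapse into one key, two functions first executed at exactly the same perf timestamp share a rank; the ranks themselves never exceed the number of functions, so they fit comfortably in a 32-bit tp_first_run.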
(2) Param to disable profile-driven optimizations:
The patch adds param auto-profile-reorder-only, which enables only
time-profile reordering with AutoFDO. This is:
(a) Useful as a debugging aid to isolate a regression to either function
reordering or profile-driven optimizations.
(b) A stopgap measure to avoid regressions from AutoFDO profile-driven
optimizations.
(c) Possibly useful for architectures which do not support branch sampling.
gcc/ChangeLog:
* auto-profile.cc (string_table::filenames): New method.
(function_instance::timestamp_): New member.
(function_instance::timestamp): New accessor for timestamp_ member.
(function_instance::set_timestamp): New function.
(function_instance::prop_timestamp): Likewise.
(function_instance::prop_timestamp_1): Likewise.
(function_instance::function_instance): Initialize timestamp_ to 0.
(function_instance::read_function_instance): Adjust prototype by
replacing head_count with toplevel param with default value true, and
stream in head_count and timestamp values from gcov file.
(autofdo::timestamp_info_map): New std::map.
(autofdo_source_profile::get_function_instance_by_decl): New argument
filename with default value NULL.
(autofdo_source_profile::read): Populate timestamp_info_map and
propagate timestamp to inlined instances from toplevel function.
(afdo_annotate_cfg): Assign node->tp_first_run based on
timestamp_info_map and bail out of annotation if
param_auto_profile_reorder_only is enabled.
* params.opt: New param auto-profile-reorder-only.
Signed-off-by: Prathamesh Kulkarni <[email protected]>
diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 35874f465e5..2a55e4ffbaf 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -281,6 +281,8 @@ public:
/* Return cgraph node corresponding to given name index. */
cgraph_node *get_cgraph_node (int);
+
+ const string_vector& filenames () { return filenames_; }
private:
typedef std::map<const char *, unsigned, string_compare> string_index_map;
typedef std::map<const char *, auto_vec<unsigned>, string_compare>
@@ -348,8 +350,7 @@ public:
HEAD_COUNT. Recursively read callsites to create nested function_instances
too. STACK is used to track the recursive creation process. */
static function_instance *
- read_function_instance (function_instance_stack *stack,
- gcov_type head_count);
+ read_function_instance (function_instance_stack *stack, bool toplevel = true);
/* Recursively deallocate all callsites (nested function_instances). */
~function_instance ();
@@ -373,6 +374,18 @@ public:
return head_count_;
}
+ gcov_type
+ timestamp () const
+ {
+ return timestamp_;
+ }
+
+ void set_timestamp (gcov_type timestamp) { timestamp_ = timestamp; }
+
+ /* Propagate timestamp from top-level function_instance to
+ inlined instances. */
+ void prop_timestamp ();
+
/* Traverse callsites of the current function_instance to find one at the
location of LINENO and callee name represented in DECL.
LOCATION should match LINENO and is used to output diagnostics. */
@@ -536,9 +549,10 @@ private:
function_instance (unsigned symbol_name, unsigned file_name,
gcov_type head_count)
: descriptor_ (file_name, symbol_name), total_count_ (0),
- head_count_ (head_count), removed_icall_target_ (false),
- realized_ (false), in_worklist_ (false), inlined_to_ (NULL),
- location_ (UNKNOWN_LOCATION), call_location_ (UNKNOWN_LOCATION)
+ head_count_ (head_count), timestamp_ (0),
+ removed_icall_target_ (false), realized_ (false), in_worklist_ (false),
+ inlined_to_ (NULL), location_ (UNKNOWN_LOCATION),
+ call_location_ (UNKNOWN_LOCATION)
{
}
@@ -554,6 +568,10 @@ private:
/* Entry BB's sample count. */
gcov_type head_count_;
+ /* perf timestamp associated with first execution of function, which is
+ used to compute node->tp_first_run. */
+ gcov_type timestamp_;
+
/* Map from callsite location to callee function_instance. */
callsite_map callsites;
@@ -581,6 +599,9 @@ private:
/* Turn inline instance to offline. */
static bool offline (function_instance *fn,
vec <function_instance *> &new_functions);
+
+ /* Helper routine for prop_timestamp. */
+ void prop_timestamp_1 (gcov_type timestamp);
};
/* Profile for all functions. */
@@ -601,7 +622,7 @@ public:
~autofdo_source_profile ();
/* For a given DECL, returns the top-level function_instance. */
- function_instance *get_function_instance_by_decl (tree decl) const;
+ function_instance *get_function_instance_by_decl (tree decl, const char * = NULL) const;
/* For a given DESCRIPTOR, return the matching instance if found. */
function_instance *
@@ -681,6 +702,13 @@ static autofdo_source_profile *afdo_source_profile;
/* gcov_summary structure to store the profile_info. */
static gcov_summary *afdo_profile_info;
+/* Map from timestamp -> tp_first_run.
+
+ The purpose of this map is to map 64-bit timestamp values to (1..N) sorted
+ by ascending order of timestamps and assign that to node->tp_first_run,
+ since we don't need the full 64-bit range. */
+static std::map<gcov_type, int> timestamp_info_map;
+
/* Scaling factor for afdo data. Compared to normal profile
AFDO profile counts are much lower, depending on sampling
frequency. We scale data up to reduce effects of roundoff
@@ -1182,6 +1210,24 @@ function_instance::~function_instance ()
delete iter->second;
}
+/* Propagate timestamp TS of function_instance to inlined instances if it's
+ not already set. */
+
+void
+function_instance::prop_timestamp_1 (gcov_type ts)
+{
+ if (!timestamp () && total_count () > 0)
+ set_timestamp (ts);
+ for (auto it = callsites.begin (); it != callsites.end (); ++it)
+ it->second->prop_timestamp_1 (ts);
+}
+
+void
+function_instance::prop_timestamp (void)
+{
+ prop_timestamp_1 (timestamp ());
+}
+
/* Traverse callsites of the current function_instance to find one at the
location of LINENO and callee name represented in DECL. */
@@ -1244,6 +1290,10 @@ function_instance::merge (function_instance *other,
else if (head_count_ != -1)
head_count_ += other->head_count_;
+ /* While merging timestamps, keep the earlier one; zero means unset. */
+ if (other->timestamp ()
+ && (!timestamp () || other->timestamp () < timestamp ()))
+ set_timestamp (other->timestamp ());
bool changed = true;
while (changed)
@@ -2582,6 +2632,7 @@ autofdo_source_profile::offline_unrealized_inlines ()
/* function instance profile format:
ENTRY_COUNT: 8 bytes
+ TIMESTAMP: 8 bytes (only for toplevel symbols)
NAME_INDEX: 4 bytes
NUM_POS_COUNTS: 4 bytes
NUM_CALLSITES: 4 byte
@@ -2608,8 +2659,15 @@ autofdo_source_profile::offline_unrealized_inlines ()
function_instance *
function_instance::read_function_instance (function_instance_stack *stack,
- gcov_type head_count)
+ bool toplevel)
{
+ gcov_type_unsigned timestamp = 0;
+ gcov_type head_count = -1;
+ if (toplevel)
+ {
+ head_count = gcov_read_counter ();
+ timestamp = (gcov_type_unsigned) gcov_read_counter ();
+ }
unsigned name = gcov_read_unsigned ();
unsigned num_pos_counts = gcov_read_unsigned ();
unsigned num_callsites = gcov_read_unsigned ();
@@ -2617,6 +2675,8 @@ function_instance::read_function_instance (function_instance_stack *stack,
= new function_instance (name,
afdo_string_table->get_filename_by_symbol (name),
head_count);
+ if (timestamp > 0)
+ s->set_timestamp (timestamp);
if (!stack->is_empty ())
s->set_inlined_to (stack->last ());
stack->safe_push (s);
@@ -2644,7 +2704,7 @@ function_instance::read_function_instance (function_instance_stack *stack,
{
unsigned offset = gcov_read_unsigned ();
function_instance *callee_function_instance
- = read_function_instance (stack, -1);
+ = read_function_instance (stack, false);
s->callsites[std::make_pair (offset,
callee_function_instance->symbol_name ())]
= callee_function_instance;
@@ -2665,9 +2725,10 @@ autofdo_source_profile::~autofdo_source_profile ()
/* For a given DECL, returns the top-level function_instance. */
function_instance *
-autofdo_source_profile::get_function_instance_by_decl (tree decl) const
+autofdo_source_profile::get_function_instance_by_decl (tree decl, const char *filename) const
{
- const char *filename = get_normalized_path (DECL_SOURCE_FILE (decl));
+ if (!filename)
+ filename = get_normalized_path (DECL_SOURCE_FILE (decl));
int index = afdo_string_table->get_index_by_decl (decl);
if (index == -1)
return NULL;
@@ -2898,8 +2959,7 @@ autofdo_source_profile::read ()
{
function_instance::function_instance_stack stack;
function_instance *s
- = function_instance::read_function_instance (&stack,
- gcov_read_counter ());
+ = function_instance::read_function_instance (&stack);
if (find_function_instance (s->get_descriptor ()) == nullptr)
add_function_instance (s);
@@ -2907,7 +2967,21 @@ autofdo_source_profile::read ()
fatal_error (UNKNOWN_LOCATION,
"auto-profile contains duplicated function instance %s",
afdo_string_table->get_symbol_name (s->symbol_name ()));
+ s->prop_timestamp ();
+ if (s->timestamp ())
+ timestamp_info_map.insert ({s->timestamp (), 0});
}
+
+ /* timestamp_info_map is std::map with timestamp as key,
+ so it's already sorted in ascending order wrt timestamps.
+ This loop maps function with lowest timestamp to 1, and so on.
+ In afdo_annotate_cfg, node->tp_first_run is then set to corresponding
+ tp_first_run value. */
+
+ int tp_first_run = 1;
+ for (auto &p : timestamp_info_map)
+ p.second = tp_first_run++;
+
int hot_frac = param_hot_bb_count_fraction;
/* Scale up the profile, but leave some bits in case some counts gets
bigger than sum_max eventually. */
@@ -4277,6 +4351,17 @@ afdo_annotate_cfg (void)
= afdo_source_profile->get_function_instance_by_decl (
current_function_decl);
+ /* FIXME: This is a workaround for sourcefile tracking, if afdo_string_table
+ ends up with empty filename or incorrect filename for the function and
+ should be removed once issues with sourcefile tracking get fixed. */
+ if (s == NULL)
+ for (unsigned i = 0; i < afdo_string_table->filenames ().length (); i++)
+ {
+ s = afdo_source_profile->get_function_instance_by_decl (current_function_decl, afdo_string_table->filenames ()[i]);
+ if (s)
+ break;
+ }
+
if (s == NULL)
{
if (dump_file)
@@ -4285,7 +4370,8 @@ afdo_annotate_cfg (void)
/* create_gcov only dumps symbols with some samples in them.
This means that we get nonempty zero_bbs only if some
nonzero counts in profile were not matched with statements. */
- if (!flag_profile_partial_training)
+ if (!flag_profile_partial_training
+ && !param_auto_profile_reorder_only)
{
FOR_ALL_BB_FN (bb, cfun)
if (bb->count.quality () == GUESSED_LOCAL)
@@ -4295,6 +4381,20 @@ afdo_annotate_cfg (void)
return;
}
+ if (auto it = timestamp_info_map.find (s->timestamp ());
+ it != timestamp_info_map.end ())
+ {
+ cgraph_node *node = cgraph_node::get (current_function_decl);
+ node->tp_first_run = it->second;
+
+ if (dump_file)
+ fprintf (dump_file, "Setting %s->tp_first_run to %d\n",
+ node->asm_name (), node->tp_first_run);
+ }
+
+ if (param_auto_profile_reorder_only)
+ return;
+
calculate_dominance_info (CDI_POST_DOMINATORS);
calculate_dominance_info (CDI_DOMINATORS);
loop_optimizer_init (0);
diff --git a/gcc/params.opt b/gcc/params.opt
index ad5ee887367..936c5a2a116 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -70,6 +70,10 @@ Enable asan detection of use-after-return bugs.
Common Joined UInteger Var(param_auto_profile_bbs) Init(1) IntegerRange(0, 1)
Param Optimization
Build basic block profile using auto profile.
+-param=auto-profile-reorder-only=
+Common Joined UInteger Var(param_auto_profile_reorder_only) Init(0) IntegerRange(0, 1) Param Optimization
+Enable only function reordering with auto-profile.
+
-param=cycle-accurate-model=
Common Joined UInteger Var(param_cycle_accurate_model) Init(1) IntegerRange(0, 1) Param Optimization
Whether the scheduling description is mostly a cycle-accurate model of the
target processor and is likely to be spill aggressively to fill any pipeline
bubbles.