On Sun, Oct 6, 2019 at 4:38 PM Jan Hubicka <[email protected]> wrote:
>
> > On 9/19/19 2:33 AM, Martin Liška wrote:
> > > Hi.
> > >
> > > Function reordering has been around for quite some time and a naive
> > > implementation was also part of my diploma thesis some time ago.
> > > Currently, the GCC can reorder function based on first execution, which
> > > happens with PGO and LTO of course. Known limitation is that the order
> > > is preserved only partially as various symbols go into different LTRANS
> > > partitions.
> > >
> > > There has been some research in the area and I would point out the
> > > Facebook paper
> > > ([1]) and Sony presentation ([2]). Based on that, I decided to make a new
> > > implementation
> > > in the GCC that does the same (in a proper way). First part of the
> > > enablement are patches
> > > to ld.bfd and ld.gold that come up with a new section .text.sorted, that
> > > is always sorted.
> > > There's a snippet from the modified default linker script:
> > Funny, I was doing this via linker scripts circa ~95, in fact that's why
> > we have -ffunction-sections :-) We started with scripts which
> > post-processed profiling data to create linker scripts for ELF systems.
> > We had something for HPUX/SOM as well, but I can't remember what
> > mechanism we used, it may have been a gross level sorting using the SOM
> > section sort key mechanism - it only had 128 or 256 keys with a notable
> > amount of them reserved.
> >
> > We had also built a linker with a basic level of interposition circa
> > 1993 and explored various approaches to reordering executables. I'd
> > just joined the group at the time and was responsible for wiring up
> > stuff on the OS side, but eventually got the "pleasure" of owning the
> > linker server. A lot of the C3 algorithmic stuff looks similar to what
> > we did.
>
> For reference I am attaching early LTO version from circa 2010 :)
> >
> > Anyway...
> >
> > I don't see anything objectionable in here. It's noted as an RFC. Are
> > you interested in pushing this forward for gcc-10?
>
> I think it is plan to get some form of code layout pass into GCC10. I
> will test Martin's version on Firefox and see if it have any effect
> here. It is generally quite hard to evaluate those changes (and it is
> reason why I did not moved forward with version form 2010 - more
> precisely it kept falling off the shelf for about a decade)
>
> If LLD has support for cgraph annotations and we support LLD, i think we
> should add support for that, too - it will be very useful to compare
> indvidiual implementations.
> I believe there is gold support (off tree?) for that too and something
> similar is also supported by other toolchains like AIX?
>
> One problem with the sections approach is that every section gets
> aligned to largest code alignment inside of the section. Since our
> alignments are (should be) often cache line based we get cca 30 bytes of
> waste for every function that is quite a lot.
>
> This is why we currently have way to order function when outputting them
> and use that with FDO (using Martin's first execution logic). This has
> drwarback of making the functions to flow in that order through late
> optimizations and RTL backend and thus we lose IPA-RA and some
> IP propagation (late pure/const/nothrow discovery).
But you can also fix that by the parallelization GSoC project approach,
decoupling things at RTL expansion IPA-wise and using output order
for the latter (or even do the fragments in GCC itself by refactoring the
output machinery to use function-local "strings", assembling the final
file later). Refactoring output to handle multiple output "files" at the
same time might also help getting rid of that early-LTO-debug copying
stuff (now that it seems that linker-plugin extensions for partially claiming
files will never happen...)
> So this approach has a drawback, too. It is why i was trying to push
> myself to get gcc to use gas fragments :)
>
> Anyway, all these issues can be sovled incementally - lets see how
> Maritn's patch work on Firefox and if we can get it tested elsewhere and
> start from that.
>
> I will take a look into the details.
>
> Honza
> >
> > jeff
> >
>
> Index: tree-pass.h
> ===================================================================
> *** tree-pass.h (revision 164689)
> --- tree-pass.h (working copy)
> *************** extern struct ipa_opt_pass_d pass_ipa_lt
> *** 467,472 ****
> --- 467,473 ----
> extern struct ipa_opt_pass_d pass_ipa_lto_finish_out;
> extern struct ipa_opt_pass_d pass_ipa_profile;
> extern struct ipa_opt_pass_d pass_ipa_cdtor_merge;
> + extern struct ipa_opt_pass_d pass_ipa_func_reorder;
>
> extern struct gimple_opt_pass pass_all_optimizations;
> extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing;
> Index: cgraphunit.c
> ===================================================================
> *** cgraphunit.c (revision 164689)
> --- cgraphunit.c (working copy)
> *************** cgraph_inline_p (struct cgraph_edge *e,
> *** 1517,1522 ****
> --- 1517,1529 ----
> return !e->inline_failed;
> }
>
> + static int
> + node_cmp (const void *pa, const void *pb)
> + {
> + const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
> + const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
> + return b->order - a->order;
> + }
>
>
> /* Expand all functions that must be output.
> *************** cgraph_expand_all_functions (void)
> *** 1546,1551 ****
> --- 1553,1561 ----
> if (order[i]->process)
> order[new_order_pos++] = order[i];
>
> + if (flag_ipa_reorder)
> + qsort (order, new_order_pos, sizeof (struct cgraph_node *), node_cmp);
> +
> for (i = new_order_pos - 1; i >= 0; i--)
> {
> node = order[i];
> Index: tree-ssa-ccp.c
> ===================================================================
> *** tree-ssa-ccp.c (revision 164689)
> --- tree-ssa-ccp.c (working copy)
> *************** ccp_fold_stmt (gimple_stmt_iterator *gsi
> *** 2307,2312 ****
> --- 2307,2324 ----
> changed = true;
> }
> }
> + if (TREE_CODE (gimple_call_fn (stmt)) == OBJ_TYPE_REF)
> + {
> + tree expr = OBJ_TYPE_REF_EXPR (gimple_call_fn (stmt));
> + expr = valueize_op (expr);
> + if (TREE_CODE (expr) == ADDR_EXPR
> + && TREE_CODE (TREE_OPERAND (expr, 0)) == FUNCTION_DECL)
> + {
> + gimple_call_set_fndecl (stmt, TREE_OPERAND (expr, 0));
> + debug_gimple_stmt (stmt);
> + changed = true;
> + }
> + }
>
> return changed;
> }
> Index: lto-cgraph.c
> ===================================================================
> *** lto-cgraph.c (revision 164689)
> --- lto-cgraph.c (working copy)
> *************** lto_output_node (struct lto_simple_outpu
> *** 466,471 ****
> --- 466,473 ----
>
> if (tag == LTO_cgraph_analyzed_node)
> {
> + lto_output_uleb128_stream (ob->main_stream,
> + node->order);
> lto_output_sleb128_stream (ob->main_stream,
>
> node->local.inline_summary.estimated_self_stack_size);
> lto_output_sleb128_stream (ob->main_stream,
> *************** input_overwrite_node (struct lto_file_de
> *** 927,932 ****
> --- 929,935 ----
> struct cgraph_node *node,
> enum LTO_cgraph_tags tag,
> struct bitpack_d *bp,
> + int order,
> unsigned int stack_size,
> unsigned int self_time,
> unsigned int time_inlining_benefit,
> *************** input_overwrite_node (struct lto_file_de
> *** 935,940 ****
> --- 938,944 ----
> enum ld_plugin_symbol_resolution resolution)
> {
> node->aux = (void *) tag;
> + node->order = order;
> node->local.inline_summary.estimated_self_stack_size = stack_size;
> node->local.inline_summary.self_time = self_time;
> node->local.inline_summary.time_inlining_benefit = time_inlining_benefit;
> *************** input_node (struct lto_file_decl_data *f
> *** 1027,1032 ****
> --- 1031,1037 ----
> unsigned long same_body_count = 0;
> int clone_ref;
> enum ld_plugin_symbol_resolution resolution;
> + int order = 0;
>
> clone_ref = lto_input_sleb128 (ib);
>
> *************** input_node (struct lto_file_decl_data *f
> *** 1045,1050 ****
> --- 1050,1056 ----
>
> if (tag == LTO_cgraph_analyzed_node)
> {
> + order = lto_input_uleb128 (ib);
> stack_size = lto_input_sleb128 (ib);
> self_size = lto_input_sleb128 (ib);
> size_inlining_benefit = lto_input_sleb128 (ib);
> *************** input_node (struct lto_file_decl_data *f
> *** 1066,1072 ****
>
> bp = lto_input_bitpack (ib);
> resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib);
> ! input_overwrite_node (file_data, node, tag, &bp, stack_size, self_time,
> time_inlining_benefit, self_size,
> size_inlining_benefit, resolution);
>
> --- 1072,1078 ----
>
> bp = lto_input_bitpack (ib);
> resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib);
> ! input_overwrite_node (file_data, node, tag, &bp, order, stack_size,
> self_time,
> time_inlining_benefit, self_size,
> size_inlining_benefit, resolution);
>
> Index: gimple-fold.c
> ===================================================================
> *** gimple-fold.c (revision 164689)
> --- gimple-fold.c (working copy)
> *************** gimple_fold_obj_type_ref (tree ref, tree
> *** 1465,1470 ****
> --- 1465,1480 ----
> tree known_binfo = known_type ? TYPE_BINFO (known_type) : NULL_TREE;
> tree binfo;
>
> + if (TREE_CODE (obj) == SSA_NAME)
> + {
> + gimple def_stmt = SSA_NAME_DEF_STMT (obj);
> + if (def_stmt
> + && gimple_code (def_stmt) == GIMPLE_ASSIGN
> + && (gimple_assign_single_p (def_stmt)
> + || (gimple_assign_unary_nop_p (def_stmt))))
> + obj = gimple_assign_rhs1 (def_stmt);
> + }
> +
> if (TREE_CODE (obj) == ADDR_EXPR)
> obj = TREE_OPERAND (obj, 0);
>
> *************** fold_gimple_call (gimple_stmt_iterator *
> *** 1513,1520 ****
> copying EH region info to the new node. Easier to just do it
> here where we can just smash the call operand. */
> callee = gimple_call_fn (stmt);
> ! if (TREE_CODE (callee) == OBJ_TYPE_REF
> ! && TREE_CODE (OBJ_TYPE_REF_OBJECT (callee)) == ADDR_EXPR)
> {
> tree t;
>
> --- 1523,1529 ----
> copying EH region info to the new node. Easier to just do it
> here where we can just smash the call operand. */
> callee = gimple_call_fn (stmt);
> ! if (TREE_CODE (callee) == OBJ_TYPE_REF)
> {
> tree t;
>
> Index: common.opt
> ===================================================================
> *** common.opt (revision 164689)
> --- common.opt (working copy)
> *************** fdse
> *** 1107,1112 ****
> --- 1107,1116 ----
> Common Var(flag_dse) Init(1) Optimization
> Use the RTL dead store elimination pass
>
> + freorder-functions
> + Common Report Var(flag_ipa_reorder) Init(1) Optimization
> + Reroder functions for better locality at execution time
> +
> freschedule-modulo-scheduled-loops
> Common Report Var(flag_resched_modulo_sched) Optimization
> Enable/Disable the traditional scheduling in loops that already passed
> modulo scheduling
> Index: ipa-reorder.c
> ===================================================================
> *** ipa-reorder.c (revision 0)
> --- ipa-reorder.c (revision 0)
> ***************
> *** 0 ****
> --- 1,567 ----
> + /* IPA function reordering pass.
> + Copyright (C) 2010 Free Software Foundation, Inc.
> + Contributed by Jan Hubicka <[email protected]>
> +
> + This file is part of GCC.
> +
> + GCC is free software; you can redistribute it and/or modify it under
> + the terms of the GNU General Public License as published by the Free
> + Software Foundation; either version 3, or (at your option) any later
> + version.
> +
> + GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> + WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> + for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with GCC; see the file COPYING3. If not see
> + <http://www.gnu.org/licenses/>. */
> +
> + #include "config.h"
> + #include "system.h"
> + #include "coretypes.h"
> + #include "tm.h"
> + #include "cgraph.h"
> + #include "tree-pass.h"
> + #include "timevar.h"
> + #include "gimple.h"
> + #include "ggc.h"
> + #include "flags.h"
> + #include "pointer-set.h"
> + #include "target.h"
> + #include "tree-iterator.h"
> + #include "fibheap.h"
> +
> + struct priority
> + {
> + gcov_type priority;
> + struct partition *src_partition;
> + struct partition *dest_partition;
> + fibnode_t node;
> + };
> +
> + typedef struct priority *priority_ptr;
> +
> + DEF_VEC_P(priority_ptr);
> + DEF_VEC_ALLOC_P(priority_ptr,heap);
> +
> + struct partition
> + {
> + int index;
> + VEC(cgraph_node_ptr, heap) *nodes;
> + VEC(priority_ptr, heap) *priorities;
> + VEC(priority_ptr, heap) *in_priorities;
> + };
> +
> + typedef struct partition *partition_ptr;
> +
> + DEF_VEC_P(partition_ptr);
> + DEF_VEC_ALLOC_P(partition_ptr,heap);
> +
> + static bool
> + cgraph_node_will_be_output_p (struct cgraph_node *node)
> + {
> + return (node->analyzed && !node->global.inlined_to);
> + }
> +
> + static void
> + account_callees (struct partition *part,
> + VEC (partition_ptr, heap) *partitions,
> + struct cgraph_node *node)
> + {
> + struct cgraph_edge *edge;
> + struct priority *p;
> + int i;
> + struct ipa_ref *ref;
> +
> + for (edge = node->callees; edge; edge = edge->next_callee)
> + if (edge->inline_failed)
> + {
> + if (!edge->callee->aux)
> + {
> + struct partition *dest = VEC_index (partition_ptr, partitions,
> + edge->callee->uid);
> + if (dest && dest != part)
> + {
> + p = XCNEW (struct priority);
> + edge->callee->aux = p;
> + p->src_partition = part;
> + p->dest_partition = dest;
> + VEC_safe_push (priority_ptr, heap, part->priorities, p);
> + VEC_safe_push (priority_ptr, heap, dest->in_priorities, p);
> + }
> + else
> + continue;
> + }
> + else
> + p = (struct priority *)edge->callee->aux;
> + if (edge->count)
> + p->priority += edge->count + 1;
> + else
> + {
> + int mul = 1;
> + switch (edge->caller->frequency)
> + {
> + case NODE_FREQUENCY_UNLIKELY_EXECUTED: mul = 0; break;
> + case NODE_FREQUENCY_EXECUTED_ONCE: mul = 1; break;
> + case NODE_FREQUENCY_NORMAL: mul = 10; break;
> + case NODE_FREQUENCY_HOT: mul = 1000; break;
> + }
> + p->priority += edge->frequency * mul + 1;
> + }
> + }
> + else
> + account_callees (part, partitions, edge->callee);
> +
> + /* Account taking address of a function as possible call with a low
> priority. */
> + for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
> + if (ref->refered_type == IPA_REF_CGRAPH)
> + {
> + struct cgraph_node *node2 = ipa_ref_node (ref);
> + if (!node2->aux)
> + {
> + struct partition *dest = VEC_index (partition_ptr, partitions,
> + node2->uid);
> + if (dest && dest != part)
> + {
> + p = XCNEW (struct priority);
> + node2->aux = p;
> + p->src_partition = part;
> + p->dest_partition = dest;
> + VEC_safe_push (priority_ptr, heap, part->priorities, p);
> + VEC_safe_push (priority_ptr, heap, dest->in_priorities, p);
> + }
> + else
> + continue;
> + }
> + else
> + p = (struct priority *)node2->aux;
> + p->priority++;
> + }
> + }
> +
> + static void
> + clear_callees (struct cgraph_node *node)
> + {
> + struct cgraph_edge *edge;
> + int i;
> + struct ipa_ref *ref;
> +
> + for (edge = node->callees; edge; edge = edge->next_callee)
> + if (edge->inline_failed)
> + edge->callee->aux = NULL;
> + else
> + clear_callees (edge->callee);
> + for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
> + if (ref->refered_type == IPA_REF_CGRAPH)
> + ipa_ref_node (ref)->aux = NULL;
> + }
> +
> + /* Compare two priority vector entries via indexes of destination
> + partitions. */
> +
> + static int
> + priority_cmp (const void *pa, const void *pb)
> + {
> + const struct priority *a = *(const struct priority * const *) pa;
> + const struct priority *b = *(const struct priority * const *) pb;
> + /* Priorities pointing to dead partitions are removed lazily,
> + so just avoid referencing dead memory. */
> + if (!a->priority)
> + return ! b->priority ? 0 : -1;
> + if (!b->priority)
> + return 1;
> + return a->dest_partition->index - b->dest_partition->index;
> + }
> +
> + /* Compare two priority vector entries via indexes of destination
> + partitions. */
> +
> + static int
> + in_priority_cmp (const void *pa, const void *pb)
> + {
> + const struct priority *a = *(const struct priority * const *) pa;
> + const struct priority *b = *(const struct priority * const *) pb;
> + /* Priorities pointing to dead partitions are removed lazily,
> + so just avoid referencing dead memory. */
> + if (!a->priority)
> + return ! b->priority ? 0 : -1;
> + if (!b->priority)
> + return 1;
> + return a->src_partition->index - b->src_partition->index;
> + }
> +
> + static void
> + delete_priority (fibheap_t heap, struct priority *p)
> + {
> + p->priority = 0;
> + p->src_partition = p->dest_partition = NULL;
> + if (p->node)
> + fibheap_delete_node (heap, p->node);
> + p->node = NULL;
> + }
> +
> + static void
> + merge_partitions (VEC (partition_ptr, heap) *partitions,
> + fibheap_t heap,
> + struct partition *part1,
> + struct partition *part2)
> + {
> + VEC (priority_ptr, heap) *priorities = NULL;
> + VEC (priority_ptr, heap) *in_priorities = NULL;
> + unsigned int i1 = 0, i2 = 0;
> + struct cgraph_node *node;
> + int i;
> +
> + /* We keep priority lists ordered by indexes of destination partitions
> + to allow effective merging. */
> + qsort (VEC_address (priority_ptr, part1->priorities),
> + VEC_length (priority_ptr, part1->priorities),
> + sizeof (priority_ptr),
> + priority_cmp);
> + qsort (VEC_address (priority_ptr, part2->priorities),
> + VEC_length (priority_ptr, part2->priorities),
> + sizeof (priority_ptr),
> + priority_cmp);
> +
> + /* Merge priority lists, prune out references of partition to itself.
> + Assume priority lists are ordered by indexes of destination partitions
> + and keep them so. */
> + while (1)
> + {
> + struct priority *p1, *p2;
> +
> + while (i1 < VEC_length (priority_ptr, part1->priorities)
> + && !VEC_index (priority_ptr, part1->priorities, i1)->priority)
> + i1++;
> + while (i2 < VEC_length (priority_ptr, part2->priorities)
> + && !VEC_index (priority_ptr, part2->priorities, i2)->priority)
> + i2++;
> +
> + if (i1 == VEC_length (priority_ptr, part1->priorities)
> + && i2 == VEC_length (priority_ptr, part2->priorities))
> + break;
> +
> + if (i1 < VEC_length (priority_ptr, part1->priorities)
> + && (i2 == VEC_length (priority_ptr, part2->priorities)
> + || (VEC_index (priority_ptr, part1->priorities,
> + i1)->dest_partition->index
> + < VEC_index (priority_ptr, part2->priorities,
> + i2)->dest_partition->index)))
> + {
> + p1 = VEC_index (priority_ptr, part1->priorities, i1);
> + if (p1->dest_partition != part2)
> + VEC_safe_push (priority_ptr, heap, priorities, p1);
> + else
> + delete_priority (heap, p1);
> + i1++;
> + }
> + else if (i2 < VEC_length (priority_ptr, part2->priorities)
> + && (i1 == VEC_length (priority_ptr, part1->priorities)
> + || (VEC_index (priority_ptr, part1->priorities,
> + i1)->dest_partition->index
> + > VEC_index (priority_ptr, part2->priorities,
> + i2)->dest_partition->index)))
> + {
> + p2 = VEC_index (priority_ptr, part2->priorities, i2);
> + if (p2->dest_partition != part1)
> + {
> + VEC_safe_push (priority_ptr, heap, priorities, p2);
> + p2->src_partition = part1;
> + }
> + else
> + delete_priority (heap, p2);
> + i2++;
> + }
> + else
> + {
> + p1 = VEC_index (priority_ptr, part1->priorities, i1);
> + p2 = VEC_index (priority_ptr, part2->priorities, i2);
> + p1->priority += p2->priority;
> + fibheap_replace_key (heap, p1->node, INT_MAX - p1->priority);
> + delete_priority (heap, p2);
> + VEC_safe_push (priority_ptr, heap, priorities, p1);
> + i1++;
> + i2++;
> + }
> + }
> + VEC_free (priority_ptr, heap, part1->priorities);
> + part1->priorities = priorities;
> +
> + qsort (VEC_address (priority_ptr, part1->in_priorities),
> + VEC_length (priority_ptr, part1->in_priorities),
> + sizeof (priority_ptr),
> + in_priority_cmp);
> + qsort (VEC_address (priority_ptr, part2->in_priorities),
> + VEC_length (priority_ptr, part2->in_priorities),
> + sizeof (priority_ptr),
> + in_priority_cmp);
> + i1 = 0;
> + i2 = 0;
> + while (1)
> + {
> + struct priority *p1, *p2;
> + while (i1 < VEC_length (priority_ptr, part1->in_priorities)
> + && !VEC_index (priority_ptr, part1->in_priorities, i1)->priority)
> + i1++;
> + while (i2 < VEC_length (priority_ptr, part2->in_priorities)
> + && !VEC_index (priority_ptr, part2->in_priorities, i2)->priority)
> + i2++;
> +
> + if (i1 == VEC_length (priority_ptr, part1->in_priorities)
> + && i2 == VEC_length (priority_ptr, part2->in_priorities))
> + break;
> +
> + if (i1 < VEC_length (priority_ptr, part1->in_priorities)
> + && (i2 == VEC_length (priority_ptr, part2->in_priorities)
> + || (VEC_index (priority_ptr, part1->in_priorities,
> + i1)->src_partition->index
> + < VEC_index (priority_ptr, part2->in_priorities,
> + i2)->src_partition->index)))
> + {
> + p1 = VEC_index (priority_ptr, part1->in_priorities, i1);
> + if (p1->src_partition != part2)
> + VEC_safe_push (priority_ptr, heap, in_priorities, p1);
> + else
> + delete_priority (heap, p1);
> + i1++;
> + }
> + else if (i2 < VEC_length (priority_ptr, part2->in_priorities)
> + && (i1 == VEC_length (priority_ptr, part1->in_priorities)
> + || (VEC_index (priority_ptr, part1->in_priorities,
> + i1)->src_partition->index
> + > VEC_index (priority_ptr, part2->in_priorities,
> + i2)->src_partition->index)))
> + {
> + p2 = VEC_index (priority_ptr, part2->in_priorities, i2);
> + if (p2->src_partition != part1)
> + {
> + VEC_safe_push (priority_ptr, heap, in_priorities, p2);
> + p2->dest_partition = part1;
> + }
> + else
> + delete_priority (heap, p2);
> + i2++;
> + }
> + else
> + {
> + p1 = VEC_index (priority_ptr, part1->in_priorities, i1);
> + p2 = VEC_index (priority_ptr, part2->in_priorities, i2);
> + p1->priority += p2->priority;
> + fibheap_replace_key (heap, p1->node, INT_MAX - p1->priority);
> + delete_priority (heap, p2);
> + VEC_safe_push (priority_ptr, heap, in_priorities, p1);
> + i1++;
> + i2++;
> + }
> + }
> + VEC_free (priority_ptr, heap, part1->in_priorities);
> + part1->in_priorities = in_priorities;
> +
> + for (i = 0; VEC_iterate (cgraph_node_ptr, part2->nodes, i, node); i++)
> + VEC_safe_push (cgraph_node_ptr, heap, part1->nodes, node);
> +
> + VEC_replace (partition_ptr, partitions, part2->index, NULL);
> + VEC_free (priority_ptr, heap, part2->priorities);
> + VEC_free (priority_ptr, heap, part2->in_priorities);
> + VEC_free (cgraph_node_ptr, heap, part2->nodes);
> + free (part2);
> + }
> +
> + static void
> + dump_partition (struct partition *part)
> + {
> + int i;
> + struct cgraph_node *node;
> + struct priority *p;
> + fprintf (dump_file, " Partition %i:", part->index);
> + for (i = 0; VEC_iterate (cgraph_node_ptr, part->nodes, i, node); i++)
> + fprintf (dump_file, " %s/%i", cgraph_node_name (node), node->uid);
> + fprintf (dump_file, "\n priorities:");
> + for (i = 0; VEC_iterate (priority_ptr, part->priorities, i, p); i++)
> + if (p->priority)
> + {
> + gcc_assert (p->src_partition == part);
> + gcc_assert (p->dest_partition != part);
> + fprintf (dump_file, " %i:%i", p->dest_partition->index,
> + (int)p->priority);
> + }
> + fprintf (dump_file, "\n");
> + }
> +
> + static unsigned int
> + ipa_func_reorder (void)
> + {
> + struct cgraph_node *node;
> + VEC (partition_ptr, heap) *partitions = VEC_alloc (partition_ptr, heap,
> cgraph_max_uid);
> + int i;
> + struct partition *part, *first_part = NULL;
> + int freq;
> + fibheap_t heap;
> +
> + if (!cgraph_max_uid)
> + return 0;
> +
> + heap = fibheap_new ();
> + VEC_safe_grow_cleared (partition_ptr, heap, partitions, cgraph_max_uid);
> +
> + for (node = cgraph_nodes; node; node = node->next)
> + if (cgraph_node_will_be_output_p (node))
> + {
> + part = XCNEW (struct partition);
> + part->index = node->uid;
> + VEC_safe_push (cgraph_node_ptr, heap, part->nodes, node);
> + VEC_replace (partition_ptr, partitions, node->uid, part);
> + }
> +
> + if (dump_file)
> + fprintf (dump_file, "\n\nCreating partitions\n");
> + for (node = cgraph_nodes; node; node = node->next)
> + if (cgraph_node_will_be_output_p (node))
> + {
> + part = VEC_index (partition_ptr, partitions, node->uid);
> + account_callees (part, partitions, node);
> + clear_callees (node);
> + if (dump_file)
> + dump_partition (part);
> + }
> + for (node = cgraph_nodes; node; node = node->next)
> + if (cgraph_node_will_be_output_p (node))
> + {
> + struct priority *p;
> + part = VEC_index (partition_ptr, partitions, node->uid);
> +
> + for (i = 0; VEC_iterate (priority_ptr, part->priorities, i, p); i++)
> + p->node = fibheap_insert (heap, INT_MAX - p->priority, p);
> + if (dump_file)
> + dump_partition (part);
> + }
> +
> + if (dump_file)
> + fprintf (dump_file, "\n\nMerging by priorities\n");
> + while (!fibheap_empty (heap))
> + {
> + struct priority *p = (struct priority *)fibheap_extract_min (heap);
> + struct partition *part = p->src_partition;
> + p->node = NULL;
> + if (dump_file)
> + {
> + fprintf (dump_file,
> + "Concatenating partitions %i and %i, priority %i\n",
> + p->src_partition->index,
> + p->dest_partition->index,
> + (int)p->priority);
> + if (dump_file)
> + dump_partition (p->src_partition);
> + if (dump_file)
> + dump_partition (p->dest_partition);
> + }
> + merge_partitions (partitions, heap, p->src_partition,
> p->dest_partition);
> + if (dump_file)
> + dump_partition (part);
> + }
> +
> + /* We ran out of calls to merge. Try to arrange remaining partitions
> + approximately in execution order: static constructors first, followed
> + by partition containing function main ()
> + followed by hot sections of the program. */
> + if (dump_file)
> + fprintf (dump_file, "\n\nLooking for static constructors\n");
> + for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
> + if (part && part != first_part
> + && DECL_STATIC_CONSTRUCTOR (VEC_index (cgraph_node_ptr,
> + part->nodes, 0)->decl))
> + {
> + if (dump_file)
> + dump_partition (part);
> + if (!first_part)
> + first_part = part;
> + else
> + merge_partitions (partitions, heap, first_part, part);
> + }
> + if (dump_file)
> + fprintf (dump_file, "\n\nLooking for main\n");
> + for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
> + if (part && part != first_part
> + && MAIN_NAME_P (DECL_NAME (VEC_index (cgraph_node_ptr,
> + part->nodes, 0)->decl)))
> + {
> + if (dump_file)
> + dump_partition (part);
> + if (!first_part)
> + first_part = part;
> + else
> + merge_partitions (partitions, heap, first_part, part);
> + }
> + if (dump_file)
> + fprintf (dump_file, "\n\nMerging by frequency\n");
> + for (freq = NODE_FREQUENCY_HOT; freq >= NODE_FREQUENCY_UNLIKELY_EXECUTED;
> + freq--)
> + {
> + for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
> + if (part && part != first_part
> + && VEC_index (cgraph_node_ptr, part->nodes, 0)->frequency == freq)
> + {
> + if (dump_file)
> + dump_partition (part);
> + if (!first_part)
> + first_part = part;
> + else
> + merge_partitions (partitions, heap, first_part, part);
> + }
> + }
> + for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
> + gcc_assert (!part || part == first_part);
> +
> + fibheap_delete (heap);
> + if (!first_part)
> + return 0;
> + if (dump_file)
> + {
> + fprintf (dump_file, "\n\nFinal order\n");
> + dump_partition (first_part);
> + }
> +
> + /* Store the resulting order and kill the single remaining partition. */
> + for (i = 0; VEC_iterate (cgraph_node_ptr, first_part->nodes, i, node);
> i++)
> + node->order = i;
> + VEC_free (priority_ptr, heap, first_part->priorities);
> + VEC_free (cgraph_node_ptr, heap, first_part->nodes);
> + free (first_part);
> + return 0;
> + }
> +
> + static bool
> + gate_ipa_func_reorder (void)
> + {
> + return optimize && flag_ipa_reorder && flag_toplevel_reorder;
> + }
> +
> + struct ipa_opt_pass_d pass_ipa_func_reorder =
> + {
> + {
> + IPA_PASS,
> + "reorder", /* name */
> + gate_ipa_func_reorder, /* gate */
> + ipa_func_reorder, /* execute */
> + NULL, /* sub */
> + NULL, /* next */
> + 0, /* static_pass_number */
> + TV_CGRAPHOPT, /* tv_id */
> + 0, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + 0 /* todo_flags_finish */
> + },
> + NULL, /* generate_summary */
> + NULL, /* write_summary */
> + NULL, /* read_summary */
> + NULL, /* write_optimization_summary
> */
> + NULL, /* read_optimization_summary
> */
> + NULL, /* stmt_fixup */
> + 0, /* TODOs */
> + NULL, /* function_transform */
> + NULL /* variable_transform */
> + };
> Index: Makefile.in
> ===================================================================
> *** Makefile.in (revision 164689)
> --- Makefile.in (working copy)
> *************** OBJS-archive = \
> *** 1461,1466 ****
> --- 1461,1467 ----
> ipa-type-escape.o \
> ipa-utils.o \
> ipa.o \
> + ipa-reorder.o \
> matrix-reorg.o \
> prefix.o \
> tree-inline.o \
> *************** varpool.o : varpool.c $(CONFIG_H) $(SYST
> *** 3012,3017 ****
> --- 3013,3020 ----
> $(TREE_FLOW_H) gt-varpool.h
> ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) \
> $(TREE_PASS_H) $(TIMEVAR_H) $(GIMPLE_H) $(GGC_H) pointer-set.h
> + ipa-reorder.o : ipa-reorder.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
> + $(CGRAPH_H) $(TREE_PASS_H) $(TIMEVAR_H) $(GIMPLE_H) $(GGC_H)
> pointer-set.h
> ipa-prop.o : ipa-prop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
> langhooks.h $(GGC_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H)
> $(DIAGNOSTIC_H) \
> $(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) \
> Index: passes.c
> ===================================================================
> *** passes.c (revision 164689)
> --- passes.c (working copy)
> *************** init_optimization_passes (void)
> *** 818,823 ****
> --- 818,824 ----
> NEXT_PASS (pass_ipa_type_escape);
> NEXT_PASS (pass_ipa_pta);
> NEXT_PASS (pass_ipa_struct_reorg);
> + NEXT_PASS (pass_ipa_func_reorder);
> *p = NULL;
>
> p = &all_lto_gen_passes;