> On 9/19/19 2:33 AM, Martin Liška wrote:
> > Hi.
> > 
> > Function reordering has been around for quite some time and a naive
> > implementation was also part of my diploma thesis some time ago.
> > Currently, the GCC can reorder function based on first execution, which
> > happens with PGO and LTO of course. Known limitation is that the order
> > is preserved only partially as various symbols go into different LTRANS 
> > partitions.
> > 
> > There has been some research in the area and I would point out the Facebook 
> > paper
> > ([1]) and Sony presentation ([2]). Based on that, I decided to make a new 
> > implementation
> > in the GCC that does the same (in a proper way). First part of the 
> > enablement are patches
> > to ld.bfd and ld.gold that come up with a new section .text.sorted, that is 
> > always sorted.
> > There's a snippet from the modified default linker script:
> Funny, I was doing this via linker scripts circa ~95, in fact that's why
> we have -ffunction-sections :-)   We started with scripts which
> post-processed profiling data to create linker scripts for ELF systems.
>  We had something for HPUX/SOM as well, but I can't remember what
> mechanism we used, it may have been a gross level sorting using the SOM
> section sort key mechanism - it only had 128 or 256 keys with a notable
> amount of them reserved.
> 
> We had also built a linker with a basic level of interposition circa
> 1993 and explored various approaches to reordering executables.  I'd
> just joined the group at the time and was responsible for wiring up
> stuff on the OS side, but eventually got the "pleasure" of owning the
> linker server.  A lot of the C3 algorithmic stuff looks similar to what
> we did.

For reference I am attaching early LTO version from circa 2010 :)
> 
> Anyway...
> 
> I don't see anything objectionable in here.  It's noted as an RFC.  Are
> you interested in pushing this forward for gcc-10?

I think it is plan to get some form of code layout pass into GCC10.  I
will test Martin's version on Firefox and see if it have any effect
here. It is generally quite hard to evaluate those changes (and it is
reason why I did not moved forward with version form 2010 - more
precisely it kept falling off the shelf for about a decade)

If LLD has support for cgraph annotations and we support LLD, i think we
should add support for that, too - it will be very useful to compare
indvidiual implementations.
I believe there is gold support (off tree?) for that too and something
similar is also supported by other toolchains like AIX?

One problem with the sections approach is that every section gets
aligned to largest code alignment inside of the section. Since our
alignments are (should be) often cache line based we get cca 30 bytes of
waste for every function that is quite a lot.

This is why we currently have way to order function when outputting them
and use that with FDO (using Martin's first execution logic). This has
drwarback of making the functions to flow in that order through late
optimizations and RTL backend and thus we lose IPA-RA and some 
IP propagation (late pure/const/nothrow discovery).
So this approach has a drawback, too. It is why i was trying to push
myself to get gcc to use gas fragments :)

Anyway, all these issues can be sovled incementally - lets see how
Maritn's patch work on Firefox and if we can get it tested elsewhere and
start from that.

I will take a look into the details.

Honza
> 
> jeff
> 

Index: tree-pass.h
===================================================================
*** tree-pass.h (revision 164689)
--- tree-pass.h (working copy)
*************** extern struct ipa_opt_pass_d pass_ipa_lt
*** 467,472 ****
--- 467,473 ----
  extern struct ipa_opt_pass_d pass_ipa_lto_finish_out;
  extern struct ipa_opt_pass_d pass_ipa_profile;
  extern struct ipa_opt_pass_d pass_ipa_cdtor_merge;
+ extern struct ipa_opt_pass_d pass_ipa_func_reorder;
  
  extern struct gimple_opt_pass pass_all_optimizations;
  extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing;
Index: cgraphunit.c
===================================================================
*** cgraphunit.c        (revision 164689)
--- cgraphunit.c        (working copy)
*************** cgraph_inline_p (struct cgraph_edge *e,
*** 1517,1522 ****
--- 1517,1529 ----
    return !e->inline_failed;
  }
  
+ static int
+ node_cmp (const void *pa, const void *pb)
+ {
+   const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
+   const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
+   return b->order - a->order;
+ }
  
  
  /* Expand all functions that must be output.
*************** cgraph_expand_all_functions (void)
*** 1546,1551 ****
--- 1553,1561 ----
      if (order[i]->process)
        order[new_order_pos++] = order[i];
  
+   if (flag_ipa_reorder)
+     qsort (order, new_order_pos, sizeof (struct cgraph_node *), node_cmp);
+ 
    for (i = new_order_pos - 1; i >= 0; i--)
      {
        node = order[i];
Index: tree-ssa-ccp.c
===================================================================
*** tree-ssa-ccp.c      (revision 164689)
--- tree-ssa-ccp.c      (working copy)
*************** ccp_fold_stmt (gimple_stmt_iterator *gsi
*** 2307,2312 ****
--- 2307,2324 ----
                changed = true;
              }
          }
+       if (TREE_CODE (gimple_call_fn (stmt)) == OBJ_TYPE_REF)
+         {
+           tree expr = OBJ_TYPE_REF_EXPR (gimple_call_fn (stmt));
+           expr = valueize_op (expr);
+           if (TREE_CODE (expr) == ADDR_EXPR
+               && TREE_CODE (TREE_OPERAND (expr, 0)) == FUNCTION_DECL)
+            {
+              gimple_call_set_fndecl (stmt, TREE_OPERAND (expr, 0));
+              debug_gimple_stmt (stmt);
+              changed = true;
+            }
+         }
  
        return changed;
        }
Index: lto-cgraph.c
===================================================================
*** lto-cgraph.c        (revision 164689)
--- lto-cgraph.c        (working copy)
*************** lto_output_node (struct lto_simple_outpu
*** 466,471 ****
--- 466,473 ----
  
    if (tag == LTO_cgraph_analyzed_node)
      {
+       lto_output_uleb128_stream (ob->main_stream,
+                                node->order);
        lto_output_sleb128_stream (ob->main_stream,
                                 
node->local.inline_summary.estimated_self_stack_size);
        lto_output_sleb128_stream (ob->main_stream,
*************** input_overwrite_node (struct lto_file_de
*** 927,932 ****
--- 929,935 ----
                      struct cgraph_node *node,
                      enum LTO_cgraph_tags tag,
                      struct bitpack_d *bp,
+                     int order,
                      unsigned int stack_size,
                      unsigned int self_time,
                      unsigned int time_inlining_benefit,
*************** input_overwrite_node (struct lto_file_de
*** 935,940 ****
--- 938,944 ----
                      enum ld_plugin_symbol_resolution resolution)
  {
    node->aux = (void *) tag;
+   node->order = order;
    node->local.inline_summary.estimated_self_stack_size = stack_size;
    node->local.inline_summary.self_time = self_time;
    node->local.inline_summary.time_inlining_benefit = time_inlining_benefit;
*************** input_node (struct lto_file_decl_data *f
*** 1027,1032 ****
--- 1031,1037 ----
    unsigned long same_body_count = 0;
    int clone_ref;
    enum ld_plugin_symbol_resolution resolution;
+   int order = 0;
  
    clone_ref = lto_input_sleb128 (ib);
  
*************** input_node (struct lto_file_decl_data *f
*** 1045,1050 ****
--- 1050,1056 ----
  
    if (tag == LTO_cgraph_analyzed_node)
      {
+       order = lto_input_uleb128 (ib);
        stack_size = lto_input_sleb128 (ib);
        self_size = lto_input_sleb128 (ib);
        size_inlining_benefit = lto_input_sleb128 (ib);
*************** input_node (struct lto_file_decl_data *f
*** 1066,1072 ****
  
    bp = lto_input_bitpack (ib);
    resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib);
!   input_overwrite_node (file_data, node, tag, &bp, stack_size, self_time,
                        time_inlining_benefit, self_size,
                        size_inlining_benefit, resolution);
  
--- 1072,1078 ----
  
    bp = lto_input_bitpack (ib);
    resolution = (enum ld_plugin_symbol_resolution)lto_input_uleb128 (ib);
!   input_overwrite_node (file_data, node, tag, &bp, order, stack_size, 
self_time,
                        time_inlining_benefit, self_size,
                        size_inlining_benefit, resolution);
  
Index: gimple-fold.c
===================================================================
*** gimple-fold.c       (revision 164689)
--- gimple-fold.c       (working copy)
*************** gimple_fold_obj_type_ref (tree ref, tree
*** 1465,1470 ****
--- 1465,1480 ----
    tree known_binfo = known_type ? TYPE_BINFO (known_type) : NULL_TREE;
    tree binfo;
  
+   if (TREE_CODE (obj) == SSA_NAME)
+     {
+       gimple def_stmt = SSA_NAME_DEF_STMT (obj);
+       if (def_stmt
+         && gimple_code (def_stmt) == GIMPLE_ASSIGN
+         && (gimple_assign_single_p (def_stmt)
+             || (gimple_assign_unary_nop_p (def_stmt))))
+       obj = gimple_assign_rhs1 (def_stmt);
+     }
+ 
    if (TREE_CODE (obj) == ADDR_EXPR)
      obj = TREE_OPERAND (obj, 0);
  
*************** fold_gimple_call (gimple_stmt_iterator *
*** 1513,1520 ****
           copying EH region info to the new node.  Easier to just do it
           here where we can just smash the call operand.  */
        callee = gimple_call_fn (stmt);
!       if (TREE_CODE (callee) == OBJ_TYPE_REF
!           && TREE_CODE (OBJ_TYPE_REF_OBJECT (callee)) == ADDR_EXPR)
          {
            tree t;
  
--- 1523,1529 ----
           copying EH region info to the new node.  Easier to just do it
           here where we can just smash the call operand.  */
        callee = gimple_call_fn (stmt);
!       if (TREE_CODE (callee) == OBJ_TYPE_REF)
          {
            tree t;
  
Index: common.opt
===================================================================
*** common.opt  (revision 164689)
--- common.opt  (working copy)
*************** fdse
*** 1107,1112 ****
--- 1107,1116 ----
  Common Var(flag_dse) Init(1) Optimization
  Use the RTL dead store elimination pass
  
+ freorder-functions
+ Common Report Var(flag_ipa_reorder) Init(1) Optimization
+ Reroder functions for better locality at execution time
+ 
  freschedule-modulo-scheduled-loops
  Common Report Var(flag_resched_modulo_sched) Optimization
  Enable/Disable the traditional scheduling in loops that already passed modulo 
scheduling
Index: ipa-reorder.c
===================================================================
*** ipa-reorder.c       (revision 0)
--- ipa-reorder.c       (revision 0)
***************
*** 0 ****
--- 1,567 ----
+ /* IPA function reordering pass.
+    Copyright (C) 2010 Free Software Foundation, Inc.
+    Contributed by Jan Hubicka  <j...@suse.cz>
+ 
+ This file is part of GCC.
+ 
+ GCC is free software; you can redistribute it and/or modify it under
+ the terms of the GNU General Public License as published by the Free
+ Software Foundation; either version 3, or (at your option) any later
+ version.
+ 
+ GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+ 
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING3.  If not see
+ <http://www.gnu.org/licenses/>.  */
+ 
+ #include "config.h"
+ #include "system.h"
+ #include "coretypes.h"
+ #include "tm.h"
+ #include "cgraph.h"
+ #include "tree-pass.h"
+ #include "timevar.h"
+ #include "gimple.h"
+ #include "ggc.h"
+ #include "flags.h"
+ #include "pointer-set.h"
+ #include "target.h"
+ #include "tree-iterator.h"
+ #include "fibheap.h"
+ 
+ struct priority
+ {
+   gcov_type priority;
+   struct partition *src_partition;
+   struct partition *dest_partition;
+   fibnode_t node;
+ };
+ 
+ typedef struct priority *priority_ptr;
+ 
+ DEF_VEC_P(priority_ptr);
+ DEF_VEC_ALLOC_P(priority_ptr,heap);
+ 
+ struct partition
+ {
+   int index;
+   VEC(cgraph_node_ptr, heap) *nodes;
+   VEC(priority_ptr, heap) *priorities;
+   VEC(priority_ptr, heap) *in_priorities;
+ };
+ 
+ typedef struct partition *partition_ptr;
+ 
+ DEF_VEC_P(partition_ptr);
+ DEF_VEC_ALLOC_P(partition_ptr,heap);
+ 
+ static bool
+ cgraph_node_will_be_output_p (struct cgraph_node *node)
+ {
+   return (node->analyzed && !node->global.inlined_to);
+ }
+ 
+ static void
+ account_callees (struct partition *part,
+                VEC (partition_ptr, heap) *partitions,
+                struct cgraph_node *node)
+ {
+   struct cgraph_edge *edge;
+   struct priority *p;
+   int i;
+   struct ipa_ref *ref;
+ 
+   for (edge = node->callees; edge; edge = edge->next_callee)
+     if (edge->inline_failed)
+       {
+       if (!edge->callee->aux)
+         {
+           struct partition *dest = VEC_index (partition_ptr, partitions,
+                                               edge->callee->uid);
+           if (dest && dest != part)
+             {
+               p = XCNEW (struct priority);
+               edge->callee->aux = p;
+               p->src_partition = part;
+               p->dest_partition = dest;
+               VEC_safe_push (priority_ptr, heap, part->priorities, p);
+               VEC_safe_push (priority_ptr, heap, dest->in_priorities, p);
+             }
+           else
+             continue;
+         }
+       else
+         p = (struct priority *)edge->callee->aux;
+       if (edge->count)
+         p->priority += edge->count + 1;
+       else
+         {
+           int mul = 1;
+           switch (edge->caller->frequency)
+             {
+             case NODE_FREQUENCY_UNLIKELY_EXECUTED: mul = 0; break;
+             case NODE_FREQUENCY_EXECUTED_ONCE: mul = 1; break;
+             case NODE_FREQUENCY_NORMAL: mul = 10; break;
+             case NODE_FREQUENCY_HOT: mul = 1000; break;
+             }
+           p->priority += edge->frequency * mul + 1;
+         }
+       }
+     else
+       account_callees (part, partitions, edge->callee);
+ 
+   /* Account taking address of a function as possible call with a low 
priority.  */
+   for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
+     if (ref->refered_type == IPA_REF_CGRAPH)
+       {
+       struct cgraph_node *node2 = ipa_ref_node (ref);
+       if (!node2->aux)
+         {
+           struct partition *dest = VEC_index (partition_ptr, partitions,
+                                               node2->uid);
+           if (dest && dest != part)
+             {
+               p = XCNEW (struct priority);
+               node2->aux = p;
+               p->src_partition = part;
+               p->dest_partition = dest;
+               VEC_safe_push (priority_ptr, heap, part->priorities, p);
+               VEC_safe_push (priority_ptr, heap, dest->in_priorities, p);
+             }
+           else
+             continue;
+         }
+       else
+         p = (struct priority *)node2->aux;
+       p->priority++;
+       }
+ }
+ 
+ static void
+ clear_callees (struct cgraph_node *node)
+ {
+   struct cgraph_edge *edge;
+   int i;
+   struct ipa_ref *ref;
+ 
+   for (edge = node->callees; edge; edge = edge->next_callee)
+     if (edge->inline_failed)
+       edge->callee->aux = NULL;
+     else
+       clear_callees (edge->callee);
+   for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
+     if (ref->refered_type == IPA_REF_CGRAPH)
+       ipa_ref_node (ref)->aux = NULL;
+ }
+ 
+ /* Compare two priority vector entries via indexes of destination
+    partitions.  */
+ 
+ static int
+ priority_cmp (const void *pa, const void *pb)
+ {
+   const struct priority *a = *(const struct priority * const *) pa;
+   const struct priority *b = *(const struct priority * const *) pb;
+   /* Priorities pointing to dead partitions are removed lazily,
+      so just avoid referencing dead memory.  */
+   if (!a->priority)
+     return ! b->priority ? 0 : -1;
+   if (!b->priority)
+     return 1;
+   return a->dest_partition->index - b->dest_partition->index;
+ }
+ 
+ /* Compare two priority vector entries via indexes of destination
+    partitions.  */
+ 
+ static int
+ in_priority_cmp (const void *pa, const void *pb)
+ {
+   const struct priority *a = *(const struct priority * const *) pa;
+   const struct priority *b = *(const struct priority * const *) pb;
+   /* Priorities pointing to dead partitions are removed lazily,
+      so just avoid referencing dead memory.  */
+   if (!a->priority)
+     return ! b->priority ? 0 : -1;
+   if (!b->priority)
+     return 1;
+   return a->src_partition->index - b->src_partition->index;
+ }
+ 
+ static void
+ delete_priority (fibheap_t heap, struct priority *p)
+ {
+   p->priority = 0;
+   p->src_partition = p->dest_partition = NULL;
+   if (p->node)
+     fibheap_delete_node (heap, p->node);
+   p->node = NULL;
+ }
+ 
+ static void
+ merge_partitions (VEC (partition_ptr, heap) *partitions,
+                 fibheap_t heap,
+                 struct partition *part1,
+                 struct partition *part2)
+ {
+   VEC (priority_ptr, heap) *priorities = NULL;
+   VEC (priority_ptr, heap) *in_priorities = NULL;
+   unsigned int i1 = 0, i2 = 0;
+   struct cgraph_node *node;
+   int i;
+ 
+   /* We keep priority lists ordered by indexes of destination partitions
+      to allow effective merging.  */
+   qsort (VEC_address (priority_ptr, part1->priorities),
+        VEC_length (priority_ptr, part1->priorities),
+        sizeof (priority_ptr),
+        priority_cmp);
+   qsort (VEC_address (priority_ptr, part2->priorities),
+        VEC_length (priority_ptr, part2->priorities),
+        sizeof (priority_ptr),
+        priority_cmp);
+   
+   /* Merge priority lists, prune out references of partition to itself.
+      Assume priority lists are ordered by indexes of destination partitions
+      and keep them so.  */
+   while (1)
+     {
+       struct priority *p1, *p2;
+ 
+       while (i1 < VEC_length (priority_ptr, part1->priorities)
+            && !VEC_index (priority_ptr, part1->priorities, i1)->priority)
+       i1++;
+       while (i2 < VEC_length (priority_ptr, part2->priorities)
+            && !VEC_index (priority_ptr, part2->priorities, i2)->priority)
+       i2++;
+ 
+       if (i1 == VEC_length (priority_ptr, part1->priorities)
+         && i2 == VEC_length (priority_ptr, part2->priorities))
+       break;
+ 
+       if (i1 < VEC_length (priority_ptr, part1->priorities)
+         && (i2 == VEC_length (priority_ptr, part2->priorities)
+             || (VEC_index (priority_ptr, part1->priorities,
+                            i1)->dest_partition->index
+                 < VEC_index (priority_ptr, part2->priorities,
+                              i2)->dest_partition->index)))
+       {
+         p1 = VEC_index (priority_ptr, part1->priorities, i1);
+         if (p1->dest_partition != part2)
+           VEC_safe_push (priority_ptr, heap, priorities, p1);
+         else
+           delete_priority (heap, p1);
+         i1++;
+       }
+       else if (i2 < VEC_length (priority_ptr, part2->priorities)
+              && (i1 == VEC_length (priority_ptr, part1->priorities)
+                  || (VEC_index (priority_ptr, part1->priorities,
+                                 i1)->dest_partition->index
+                      > VEC_index (priority_ptr, part2->priorities,
+                                   i2)->dest_partition->index)))
+       {
+         p2 = VEC_index (priority_ptr, part2->priorities, i2);
+         if (p2->dest_partition != part1)
+           {
+             VEC_safe_push (priority_ptr, heap, priorities, p2);
+             p2->src_partition = part1;
+           }
+         else
+           delete_priority (heap, p2);
+         i2++;
+       }
+       else
+       {
+         p1 = VEC_index (priority_ptr, part1->priorities, i1);
+         p2 = VEC_index (priority_ptr, part2->priorities, i2);
+         p1->priority += p2->priority;
+         fibheap_replace_key (heap, p1->node, INT_MAX - p1->priority);
+         delete_priority (heap, p2);
+         VEC_safe_push (priority_ptr, heap, priorities, p1);
+         i1++;
+         i2++;
+       }
+     }
+   VEC_free (priority_ptr, heap, part1->priorities);
+   part1->priorities = priorities;
+ 
+   qsort (VEC_address (priority_ptr, part1->in_priorities),
+        VEC_length (priority_ptr, part1->in_priorities),
+        sizeof (priority_ptr),
+        in_priority_cmp);
+   qsort (VEC_address (priority_ptr, part2->in_priorities),
+        VEC_length (priority_ptr, part2->in_priorities),
+        sizeof (priority_ptr),
+        in_priority_cmp);
+   i1 = 0;
+   i2 = 0;
+   while (1)
+     {
+       struct priority *p1, *p2;
+       while (i1 < VEC_length (priority_ptr, part1->in_priorities)
+            && !VEC_index (priority_ptr, part1->in_priorities, i1)->priority)
+       i1++;
+       while (i2 < VEC_length (priority_ptr, part2->in_priorities)
+            && !VEC_index (priority_ptr, part2->in_priorities, i2)->priority)
+       i2++;
+ 
+       if (i1 == VEC_length (priority_ptr, part1->in_priorities)
+         && i2 == VEC_length (priority_ptr, part2->in_priorities))
+       break;
+ 
+       if (i1 < VEC_length (priority_ptr, part1->in_priorities)
+         && (i2 == VEC_length (priority_ptr, part2->in_priorities)
+             || (VEC_index (priority_ptr, part1->in_priorities,
+                            i1)->src_partition->index
+                 < VEC_index (priority_ptr, part2->in_priorities,
+                              i2)->src_partition->index)))
+       {
+         p1 = VEC_index (priority_ptr, part1->in_priorities, i1);
+         if (p1->src_partition != part2)
+           VEC_safe_push (priority_ptr, heap, in_priorities, p1);
+         else
+           delete_priority (heap, p1);
+         i1++;
+       }
+       else if (i2 < VEC_length (priority_ptr, part2->in_priorities)
+              && (i1 == VEC_length (priority_ptr, part1->in_priorities)
+                  || (VEC_index (priority_ptr, part1->in_priorities,
+                                 i1)->src_partition->index
+                      > VEC_index (priority_ptr, part2->in_priorities,
+                                   i2)->src_partition->index)))
+       {
+         p2 = VEC_index (priority_ptr, part2->in_priorities, i2);
+         if (p2->src_partition != part1)
+           {
+             VEC_safe_push (priority_ptr, heap, in_priorities, p2);
+             p2->dest_partition = part1;
+           }
+         else
+           delete_priority (heap, p2);
+         i2++;
+       }
+       else
+       {
+         p1 = VEC_index (priority_ptr, part1->in_priorities, i1);
+         p2 = VEC_index (priority_ptr, part2->in_priorities, i2);
+         p1->priority += p2->priority;
+         fibheap_replace_key (heap, p1->node, INT_MAX - p1->priority);
+         delete_priority (heap, p2);
+         VEC_safe_push (priority_ptr, heap, in_priorities, p1);
+         i1++;
+         i2++;
+       }
+     }
+   VEC_free (priority_ptr, heap, part1->in_priorities);
+   part1->in_priorities = in_priorities;
+ 
+   for (i = 0; VEC_iterate (cgraph_node_ptr, part2->nodes, i, node); i++)
+     VEC_safe_push (cgraph_node_ptr, heap, part1->nodes, node);
+ 
+   VEC_replace (partition_ptr, partitions, part2->index, NULL);
+   VEC_free (priority_ptr, heap, part2->priorities);
+   VEC_free (priority_ptr, heap, part2->in_priorities);
+   VEC_free (cgraph_node_ptr, heap, part2->nodes);
+   free (part2);
+ }
+ 
+ static void
+ dump_partition (struct partition *part)
+ {
+   int i;
+   struct cgraph_node *node;
+   struct priority *p;
+   fprintf (dump_file, "  Partition %i:", part->index);
+   for (i = 0; VEC_iterate (cgraph_node_ptr, part->nodes, i, node); i++)
+     fprintf (dump_file, "  %s/%i", cgraph_node_name (node), node->uid);
+   fprintf (dump_file, "\n    priorities:");
+   for (i = 0; VEC_iterate (priority_ptr, part->priorities, i, p); i++)
+     if (p->priority)
+       {
+       gcc_assert (p->src_partition == part);
+       gcc_assert (p->dest_partition != part);
+       fprintf (dump_file, "  %i:%i", p->dest_partition->index,
+                (int)p->priority);
+       }
+   fprintf (dump_file, "\n");
+ }
+ 
+ static unsigned int
+ ipa_func_reorder (void)
+ {
+   struct cgraph_node *node;
+   VEC (partition_ptr, heap) *partitions = VEC_alloc (partition_ptr, heap, 
cgraph_max_uid);
+   int i;
+   struct partition *part, *first_part = NULL;
+   int freq;
+   fibheap_t heap;
+ 
+   if (!cgraph_max_uid)
+     return 0;
+ 
+   heap = fibheap_new ();
+   VEC_safe_grow_cleared (partition_ptr, heap, partitions, cgraph_max_uid);
+ 
+   for (node = cgraph_nodes; node; node = node->next)
+     if (cgraph_node_will_be_output_p (node))
+       {
+       part = XCNEW (struct partition);
+       part->index = node->uid;
+       VEC_safe_push (cgraph_node_ptr, heap, part->nodes, node);
+       VEC_replace (partition_ptr, partitions, node->uid, part);
+       }
+ 
+   if (dump_file)
+     fprintf (dump_file, "\n\nCreating partitions\n");
+   for (node = cgraph_nodes; node; node = node->next)
+     if (cgraph_node_will_be_output_p (node))
+       {
+       part = VEC_index (partition_ptr, partitions, node->uid);
+       account_callees (part, partitions, node);
+       clear_callees (node);
+       if (dump_file)
+         dump_partition (part);
+       }
+   for (node = cgraph_nodes; node; node = node->next)
+     if (cgraph_node_will_be_output_p (node))
+       {
+       struct priority *p;
+       part = VEC_index (partition_ptr, partitions, node->uid);
+ 
+       for (i = 0; VEC_iterate (priority_ptr, part->priorities, i, p); i++)
+         p->node = fibheap_insert (heap, INT_MAX - p->priority, p);
+       if (dump_file)
+         dump_partition (part);
+       }
+ 
+   if (dump_file)
+     fprintf (dump_file, "\n\nMerging by priorities\n");
+   while (!fibheap_empty (heap))
+     {
+       struct priority *p = (struct priority *)fibheap_extract_min (heap);
+       struct partition *part = p->src_partition;
+       p->node = NULL;
+       if (dump_file)
+       {
+         fprintf (dump_file,
+                  "Concatenating partitions %i and %i, priority %i\n",
+                  p->src_partition->index,
+                  p->dest_partition->index,
+                  (int)p->priority);
+         if (dump_file)
+           dump_partition (p->src_partition);
+         if (dump_file)
+           dump_partition (p->dest_partition);
+       }
+       merge_partitions (partitions, heap, p->src_partition, 
p->dest_partition);
+       if (dump_file)
+       dump_partition (part);
+     }
+ 
+   /* We ran out of calls to merge.  Try to arrange remaining partitions
+      approximately in execution order: static constructors first, followed
+      by partition containing function main ()
+      followed by hot sections of the program.  */
+   if (dump_file)
+     fprintf (dump_file, "\n\nLooking for static constructors\n");
+   for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
+     if (part && part != first_part
+       && DECL_STATIC_CONSTRUCTOR (VEC_index (cgraph_node_ptr,
+                                              part->nodes, 0)->decl))
+       {
+       if (dump_file)
+         dump_partition (part);
+       if (!first_part)
+        first_part = part;
+       else
+        merge_partitions (partitions, heap, first_part, part);
+       }
+   if (dump_file)
+     fprintf (dump_file, "\n\nLooking for main\n");
+   for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
+     if (part && part != first_part
+       && MAIN_NAME_P (DECL_NAME (VEC_index (cgraph_node_ptr,
+                                             part->nodes, 0)->decl)))
+       {
+       if (dump_file)
+         dump_partition (part);
+       if (!first_part)
+        first_part = part;
+       else
+        merge_partitions (partitions, heap, first_part, part);
+       }
+   if (dump_file)
+     fprintf (dump_file, "\n\nMerging by frequency\n");
+   for (freq = NODE_FREQUENCY_HOT; freq >= NODE_FREQUENCY_UNLIKELY_EXECUTED;
+        freq--)
+     {
+       for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
+       if (part && part != first_part
+           && VEC_index (cgraph_node_ptr, part->nodes, 0)->frequency == freq)
+         {
+           if (dump_file)
+             dump_partition (part);
+           if (!first_part)
+            first_part = part;
+           else
+            merge_partitions (partitions, heap, first_part, part);
+         }
+     }
+   for (i = 0; VEC_iterate (partition_ptr, partitions, i, part); i++)
+     gcc_assert (!part || part == first_part);
+ 
+   fibheap_delete (heap);
+   if (!first_part)
+     return 0;
+   if (dump_file)
+     {
+       fprintf (dump_file, "\n\nFinal order\n");
+       dump_partition (first_part);
+     }
+ 
+   /* Store the resulting order and kill the single remaining partition.  */
+   for (i = 0; VEC_iterate (cgraph_node_ptr, first_part->nodes, i, node); i++)
+     node->order = i;
+   VEC_free (priority_ptr, heap, first_part->priorities);
+   VEC_free (cgraph_node_ptr, heap, first_part->nodes);
+   free (first_part);
+   return 0;
+ }
+ 
+ static bool
+ gate_ipa_func_reorder (void)
+ {
+   return optimize && flag_ipa_reorder && flag_toplevel_reorder;
+ }
+ 
+ struct ipa_opt_pass_d pass_ipa_func_reorder =
+ {
+  {
+   IPA_PASS,
+   "reorder",                          /* name */
+   gate_ipa_func_reorder,              /* gate */
+   ipa_func_reorder,                   /* execute */
+   NULL,                                       /* sub */
+   NULL,                                       /* next */
+   0,                                  /* static_pass_number */
+   TV_CGRAPHOPT,                               /* tv_id */
+   0,                                  /* properties_required */
+   0,                                  /* properties_provided */
+   0,                                  /* properties_destroyed */
+   0,                                  /* todo_flags_start */
+   0                                     /* todo_flags_finish */
+  },
+  NULL,                                        /* generate_summary */
+  NULL,                                        /* write_summary */
+  NULL,                                        /* read_summary */
+  NULL,                                        /* write_optimization_summary */
+  NULL,                                        /* read_optimization_summary */
+  NULL,                                        /* stmt_fixup */
+  0,                                   /* TODOs */
+  NULL,                                        /* function_transform */
+  NULL                                 /* variable_transform */
+ };
Index: Makefile.in
===================================================================
*** Makefile.in (revision 164689)
--- Makefile.in (working copy)
*************** OBJS-archive = \
*** 1461,1466 ****
--- 1461,1467 ----
        ipa-type-escape.o \
        ipa-utils.o \
        ipa.o \
+       ipa-reorder.o \
        matrix-reorg.o \
        prefix.o \
        tree-inline.o \
*************** varpool.o : varpool.c $(CONFIG_H) $(SYST
*** 3012,3017 ****
--- 3013,3020 ----
     $(TREE_FLOW_H) gt-varpool.h
  ipa.o : ipa.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) \
     $(TREE_PASS_H) $(TIMEVAR_H) $(GIMPLE_H) $(GGC_H) pointer-set.h
+ ipa-reorder.o : ipa-reorder.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+    $(CGRAPH_H) $(TREE_PASS_H) $(TIMEVAR_H) $(GIMPLE_H) $(GGC_H) pointer-set.h
  ipa-prop.o : ipa-prop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
     langhooks.h $(GGC_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H) $(DIAGNOSTIC_H) 
\
     $(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) \
Index: passes.c
===================================================================
*** passes.c    (revision 164689)
--- passes.c    (working copy)
*************** init_optimization_passes (void)
*** 818,823 ****
--- 818,824 ----
    NEXT_PASS (pass_ipa_type_escape);
    NEXT_PASS (pass_ipa_pta);
    NEXT_PASS (pass_ipa_struct_reorg);
+   NEXT_PASS (pass_ipa_func_reorder);
    *p = NULL;
  
    p = &all_lto_gen_passes;

Reply via email to