+davidxl
On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam <tmsri...@google.com> wrote:
> Patch Description:
> =================
>
> I am working on a project to do global function layout in the linker where
> the linker reads the callgraph edge profile information, generated by FDO,
> and uses that to find an ordering of functions that will place functions
> calling each other frequently closer, like the Pettis-Hansen code ordering
> algorithm described in the paper "Profile Guided Code Positioning" in PLDI
> 1990.
>
> This patch adds a flag that allows the callgraph edge profile information to
> be stored in .note sections called ".note.callgraph.text". The new compiler
> flag -fcallgraph-profiles-sections generates these sections and must be used
> along with -fprofile-use. I have added a PARAM to only output callgraph edges
> whose count is greater than a specified threshold. Once this is available,
> the linker can read these sections and generate a global callgraph which can
> be used to determine a global function ordering.
>
> I am adding plugin support in the gold linker to allow linker plugins to
> read the contents of sections, and also adding plugin hooks to specify
> a desired ordering of functions to the linker. The linker patch is available
> here: http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is
> available, linker plugins can be used to determine the function layout, like
> the Pettis-Hansen algorithm, of the final binary.
>
> Example: The new .note.callgraph.text sections look like this for a function
> foo that calls bar 100 times and zap 50 times:
> ****************************
> .section .note.callgraph.text._Z3foov,"",@progbits
> .string "Function _Z3foov"
> .string "_Z3barv"
> .string "100"
> .string "_Z3zapv"
> .string "50"
> ***************************
>
> For now, this is for google/main. I will re-submit for review to trunk along
> with data layout.
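For anyone writing a consumer of these sections, here is a minimal sketch in plain C of how a reader might decode one section body. It assumes the body is exactly the run of NUL-terminated strings that the .string directives in the example produce: "Function <name>" first, then alternating callee name and decimal count. The names parse_callgraph_note and struct cg_edge are hypothetical; they are not part of the patch or of the gold plugin interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One outgoing call graph edge read from a note section (hypothetical).  */
struct cg_edge
{
  const char *callee;   /* Points into the section buffer.  */
  long long count;      /* Profiled call count.  */
};

/* Parse a .note.callgraph.text body of LEN bytes at BUF.  Store the
   caller's name in *CALLER and up to MAX_EDGES edges in EDGES.
   Return the number of edges stored, or -1 if the body is malformed.  */
static int
parse_callgraph_note (const char *buf, size_t len, const char **caller,
                      struct cg_edge *edges, int max_edges)
{
  static const char prefix[] = "Function ";
  const char *p = buf;
  const char *end = buf + len;
  int n = 0;

  /* Every string must be terminated, so the last byte has to be NUL.  */
  if (len == 0 || buf[len - 1] != '\0')
    return -1;
  if (strncmp (p, prefix, sizeof prefix - 1) != 0)
    return -1;
  *caller = p + sizeof prefix - 1;
  p += strlen (p) + 1;

  while (p < end && n < max_edges)
    {
      const char *name = p;
      char *stop;

      p += strlen (p) + 1;
      if (p >= end)
        return -1;              /* Callee name without a count.  */
      edges[n].callee = name;
      edges[n].count = strtoll (p, &stop, 10);
      if (stop == p || *stop != '\0')
        return -1;              /* Count is not a decimal number.  */
      p += strlen (p) + 1;
      n++;
    }
  return n;
}
```

Keeping the callee names as pointers into the section buffer avoids copies, at the cost that the buffer has to outlive the edge array.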
>
> Google ref 41940
>
> 2011-06-07  Sriraman Tallam  <tmsri...@google.com>
>
>       * doc/invoke.texi: document option -fcallgraph-profiles-sections.
>       * final.c (dump_cgraph_profiles): New function.
>       (rest_of_handle_final): Create new section '.note.callgraph.text'
>       with compiler flag -fcallgraph-profiles-sections
>       * common.opt: New option -fcallgraph-profiles-sections.
>       * params.def (DEFPARAM): New param
>       PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.
>
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi (revision 174789)
> +++ doc/invoke.texi (working copy)
> @@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
>  -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
>  -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
>  -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
> --fcheck-data-deps -fclone-hot-version-paths @gol
> +-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
>  -fcombine-stack-adjustments -fconserve-stack @gol
>  -fcompare-elim -fcprop-registers -fcrossjumping @gol
>  -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
> @@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
>  @opindex fripa-verbose
>  Enable printing of verbose information about dynamic inter-procedural optimizations.
>  This is used in conjunction with the @option{-fripa}.
> +
> +@item -fcallgraph-profiles-sections
> +@opindex fcallgraph-profiles-sections
> +Emit call graph edge profile counts in .note.callgraph.text sections. This is
> +used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
> +section is created for each function. This section lists every callee and the
> +number of times it is called. The params variable
> +"note-cgraph-section-edge-threshold" can be used to only list edges above a
> +certain threshold.
>  @end table
>
>  The following options control compiler behavior regarding floating
> Index: final.c
> ===================================================================
> --- final.c (revision 174789)
> +++ final.c (working copy)
> @@ -4321,13 +4321,37 @@ debug_free_queue (void)
>        symbol_queue_size = 0;
>      }
>  }
> -
> +
> +/* List the call graph profiled edges whose value is greater than
> +   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
> +   ".note.callgraph.text" section.  */
> +static void
> +dump_cgraph_profiles (void)
> +{
> +  struct cgraph_node *node = cgraph_node (current_function_decl);
> +  struct cgraph_edge *e;
> +  struct cgraph_node *callee;
> +
> +  for (e = node->callees; e != NULL; e = e->next_callee)
> +    {
> +      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
> +        continue;
> +      callee = e->callee;
> +      fprintf (asm_out_file, "\t.string \"%s\"\n",
> +               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
> +      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
> +               e->count);
> +    }
> +}
> +
>  /* Turn the RTL into assembly.  */
>  static unsigned int
>  rest_of_handle_final (void)
>  {
>    rtx x;
>    const char *fnname;
> +  char *profile_fnname;
> +  unsigned int flags;
>
>    /* Get the function's name, as described by its RTL.  This may be
>       different from the DECL_NAME name used in the source file.  */
> @@ -4387,6 +4411,21 @@ rest_of_handle_final (void)
>      targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
>                                  decl_fini_priority_lookup
>                                    (current_function_decl));
> +
> +  /* With -fcallgraph-profiles-sections, add ".note.callgraph.text" section
> +     for storing profiling information.
> +     */
> +  if (flag_callgraph_profiles_sections
> +      && flag_profile_use
> +      && cgraph_node (current_function_decl) != NULL)
> +    {
> +      flags = SECTION_DEBUG;
> +      asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname);
> +      switch_to_section (get_section (profile_fnname, flags, NULL));
> +      fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
> +      dump_cgraph_profiles ();
> +      free (profile_fnname);
> +    }
> +
>    return 0;
>  }
>
> Index: common.opt
> ===================================================================
> --- common.opt (revision 174789)
> +++ common.opt (working copy)
> @@ -907,6 +907,10 @@ fcaller-saves
>  Common Report Var(flag_caller_saves) Optimization
>  Save registers around function calls
>
> +fcallgraph-profiles-sections
> +Common Report Var(flag_callgraph_profiles_sections) Init(0)
> +Generate .note.callgraph.text sections listing callees and edge counts.
> +
>  fcheck-data-deps
>  Common Report Var(flag_check_data_deps)
>  Compare the results of several data dependence analyzers.
> Index: params.def
> ===================================================================
> --- params.def (revision 174789)
> +++ params.def (working copy)
> @@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
>           "maximum length of the call graph path to be cloned "
>           "while doing multiversioning",
>           2, 0, 5)
> +
> +/* Only output those call graph edges in .note.callgraph.text sections
> +   whose count is greater than this value.  */
> +DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
> +          "note-cgraph-section-edge-threshold",
> +          "minimum call graph edge count for inclusion in "
> +          ".note.callgraph.text section",
> +          0, 0, 0)
> +
>  /*
>  Local variables:
>  mode:c
>
> --
> This patch is available for review at http://codereview.appspot.com/4591045
>
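One detail worth spelling out is the threshold comparison. dump_cgraph_profiles skips an edge when its count is <= the param, so an edge whose count equals the threshold is dropped, and with the default of 0 edges that were never executed are omitted. The sketch below mirrors that filter outside of GCC so the behaviour is easy to check. struct edge and emit_callgraph_note are hypothetical stand-ins for the cgraph structures, and the .section line simply copies the example from the patch description rather than what get_section would emit for a given target.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a cgraph edge: the callee's assembler name
   and the profiled call count.  */
struct edge
{
  const char *callee;
  long long count;
};

/* Write the note section for FNNAME to OUT, skipping every edge whose
   count is not strictly greater than THRESHOLD (the same test as in
   dump_cgraph_profiles).  Return the number of edges written.  */
static int
emit_callgraph_note (FILE *out, const char *fnname,
                     const struct edge *edges, int n_edges,
                     long long threshold)
{
  int i;
  int written = 0;

  fprintf (out, "\t.section .note.callgraph.text.%s,\"\",@progbits\n",
           fnname);
  fprintf (out, "\t.string \"Function %s\"\n", fnname);
  for (i = 0; i < n_edges; i++)
    {
      if (edges[i].count <= threshold)
        continue;
      fprintf (out, "\t.string \"%s\"\n", edges[i].callee);
      fprintf (out, "\t.string \"%lld\"\n", edges[i].count);
      written++;
    }
  return written;
}
```

With the edges from the example (bar 100, zap 50) plus a never-executed call, a threshold of 0 keeps the first two, and a threshold of 50 keeps only bar.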