On Wed, Jun 8, 2011 at 9:16 AM, Xinliang David Li <davi...@google.com> wrote:
> ok for google/main.

Thanks, the patch is now committed.

>
> David
>
> On Wed, Jun 8, 2011 at 9:13 AM, Sriraman Tallam <tmsri...@google.com> wrote:
>> +davidxl
>>
>> On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>>> Patch Description:
>>> =================
>>>
>>> I am working on a project to do global function layout in the linker: the
>>> linker reads the callgraph edge profile information generated by FDO and
>>> uses it to find an ordering of functions that places functions that call
>>> each other frequently close together, like the Pettis-Hansen code ordering
>>> algorithm described in the paper "Profile-guided Code Positioning" in PLDI
>>> 1990.
>>>
>>> This patch adds a flag that allows the callgraph edge profile information
>>> to be stored in .note sections whose names start with ".note.callgraph.text".
>>> The new compiler flag -fcallgraph-profiles-sections generates these sections
>>> and must be used along with -fprofile-use. I have also added a PARAM,
>>> note-cgraph-section-edge-threshold, so that only callgraph edges whose
>>> count exceeds a specified threshold are output. Once this is available,
>>> the linker can read these sections and build a global callgraph, which
>>> can be used to determine a global function ordering.
>>>
>>> I am adding plugin support to the gold linker so that linker plugins can
>>> read the contents of sections, and I am also adding plugin hooks that let
>>> a plugin pass a desired function ordering to the linker. The linker patch
>>> is available here: http://sourceware.org/ml/binutils/2011-03/msg00043.html.
>>> Once this is available, linker plugins can determine the function layout
>>> of the final binary, for example with the Pettis-Hansen algorithm.
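As a rough illustration of the kind of ordering a linker plugin could compute from these edges, the following is a minimal, simplified sketch of Pettis-Hansen style greedy chain merging in C. It assumes the edge counts have already been collected into an array; `struct edge`, `order_functions`, and `MAX_FUNCS` are hypothetical names, not part of this patch or of gold. Unlike the full algorithm, it always appends the second chain to the end of the first rather than choosing the chain orientation that keeps the heaviest caller/callee pair adjacent.

```c
#include <stdlib.h>

#define MAX_FUNCS 64 /* hypothetical fixed capacity for this sketch */

/* A weighted call graph edge between two function indices (hypothetical
   representation).  Direction is ignored: only how often the pair
   interacts matters for placement.  */
struct edge
{
  int a, b;
  long count;
};

/* qsort comparator: heavier edges sort first.  */
static int
cmp_edge_desc (const void *x, const void *y)
{
  long cx = ((const struct edge *) x)->count;
  long cy = ((const struct edge *) y)->count;
  return (cx < cy) - (cx > cy);
}

/* Simplified Pettis-Hansen style ordering.  Each function starts in its
   own chain; edges are visited from heaviest to lightest, and the chains
   of the two endpoints are concatenated when they differ.  The surviving
   chains are written to OUT in ascending order of chain id.  Returns the
   number of entries written, or -1 if NFUNCS exceeds MAX_FUNCS.
   Note: EDGES is sorted in place.  */
static int
order_functions (int nfuncs, struct edge *edges, int nedges, int *out)
{
  int chain_of[MAX_FUNCS];         /* chain id of each function */
  int next[MAX_FUNCS];             /* successor within its chain, -1 = end */
  int head[MAX_FUNCS], tail[MAX_FUNCS]; /* per chain id; head -1 = merged away */
  int i, f, n = 0;

  if (nfuncs > MAX_FUNCS)
    return -1;

  for (i = 0; i < nfuncs; i++)
    {
      chain_of[i] = i;
      next[i] = -1;
      head[i] = tail[i] = i;
    }

  qsort (edges, nedges, sizeof *edges, cmp_edge_desc);

  for (i = 0; i < nedges; i++)
    {
      int ca = chain_of[edges[i].a];
      int cb = chain_of[edges[i].b];
      if (ca == cb)
        continue; /* already in the same chain (or a self-call) */
      /* Append chain CB to the end of chain CA and relabel its members.  */
      next[tail[ca]] = head[cb];
      tail[ca] = tail[cb];
      for (f = head[cb]; f != -1; f = next[f])
        chain_of[f] = ca;
      head[cb] = -1;
    }

  for (i = 0; i < nfuncs; i++)
    for (f = head[i]; f != -1; f = next[f])
      out[n++] = f;
  return n;
}
```

For example, with hypothetical indices 0 = cold, 1 = zap, 2 = main, 3 = bar, 4 = foo and the edges main-foo (10), foo-bar (100), foo-zap (50), main-cold (1), `order_functions` produces the order main, foo, bar, zap, cold, so the hottest pair, foo and bar, ends up adjacent.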
>>>
>>> Example: The new .note.callgraph.text sections look like this for a
>>> function foo that calls bar 100 times and zap 50 times:
>>> ****************************
>>> .section .note.callgraph.text._Z3foov,"",@progbits
>>> .string "Function _Z3foov"
>>> .string "_Z3barv"
>>> .string "100"
>>> .string "_Z3zapv"
>>> .string "50"
>>> ***************************
>>>
>>> For now, this is for google/main. I will re-submit it for review to trunk
>>> along with data layout.
>>>
>>> Google ref 41940
>>>
>>> 2011-06-07  Sriraman Tallam  <tmsri...@google.com>
>>>
>>>         * doc/invoke.texi: Document option -fcallgraph-profiles-sections.
>>>         * final.c (dump_cgraph_profiles): New function.
>>>         (rest_of_handle_final): Create new section '.note.callgraph.text'
>>>         with compiler flag -fcallgraph-profiles-sections.
>>>         * common.opt: New option -fcallgraph-profiles-sections.
>>>         * params.def (DEFPARAM): New param
>>>         PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.
>>>
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi (revision 174789)
>>> +++ doc/invoke.texi (working copy)
>>> @@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
>>>  -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
>>>  -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
>>>  -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
>>> --fcheck-data-deps -fclone-hot-version-paths @gol
>>> +-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
>>>  -fcombine-stack-adjustments -fconserve-stack @gol
>>>  -fcompare-elim -fcprop-registers -fcrossjumping @gol
>>>  -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
>>> @@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
>>>  @opindex fripa-verbose
>>>  Enable printing of verbose information about dynamic inter-procedural optimizations.
>>>  This is used in conjunction with the @option{-fripa}.
>>> +
>>> +@item -fcallgraph-profiles-sections
>>> +@opindex fcallgraph-profiles-sections
>>> +Emit call graph edge profile counts in .note.callgraph.text sections. This is
>>> +used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
>>> +section is created for each function. This section lists every callee and the
>>> +number of times it is called. The params variable
>>> +"note-cgraph-section-edge-threshold" can be used to only list edges above a
>>> +certain threshold.
>>>  @end table
>>>
>>>  The following options control compiler behavior regarding floating
>>> Index: final.c
>>> ===================================================================
>>> --- final.c (revision 174789)
>>> +++ final.c (working copy)
>>> @@ -4321,13 +4321,37 @@ debug_free_queue (void)
>>>        symbol_queue_size = 0;
>>>      }
>>>  }
>>> -
>>> +
>>> +/* List the call graph profiled edges whose value is greater than
>>> +   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
>>> +   ".note.callgraph.text" section.  */
>>> +static void
>>> +dump_cgraph_profiles (void)
>>> +{
>>> +  struct cgraph_node *node = cgraph_node (current_function_decl);
>>> +  struct cgraph_edge *e;
>>> +  struct cgraph_node *callee;
>>> +
>>> +  for (e = node->callees; e != NULL; e = e->next_callee)
>>> +    {
>>> +      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
>>> +        continue;
>>> +      callee = e->callee;
>>> +      fprintf (asm_out_file, "\t.string \"%s\"\n",
>>> +               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
>>> +      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
>>> +               e->count);
>>> +    }
>>> +}
>>> +
>>>  /* Turn the RTL into assembly.  */
>>>  static unsigned int
>>>  rest_of_handle_final (void)
>>>  {
>>>    rtx x;
>>>    const char *fnname;
>>> +  char *profile_fnname;
>>> +  unsigned int flags;
>>>
>>>    /* Get the function's name, as described by its RTL.  This may be
>>>       different from the DECL_NAME name used in the source file.  */
>>> @@ -4387,6 +4411,21 @@ rest_of_handle_final (void)
>>>      targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
>>>                                  decl_fini_priority_lookup
>>>                                    (current_function_decl));
>>> +
>>> +  /* With -fcallgraph-profiles-sections, add a ".note.callgraph.text"
>>> +     section for storing profiling information.  */
>>> +  if (flag_callgraph_profiles_sections
>>> +      && flag_profile_use
>>> +      && cgraph_node (current_function_decl) != NULL)
>>> +    {
>>> +      flags = SECTION_DEBUG;
>>> +      asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname);
>>> +      switch_to_section (get_section (profile_fnname, flags, NULL));
>>> +      fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
>>> +      dump_cgraph_profiles ();
>>> +      free (profile_fnname);
>>> +    }
>>> +
>>>    return 0;
>>>  }
>>>
>>> Index: common.opt
>>> ===================================================================
>>> --- common.opt (revision 174789)
>>> +++ common.opt (working copy)
>>> @@ -907,6 +907,10 @@ fcaller-saves
>>>  Common Report Var(flag_caller_saves) Optimization
>>>  Save registers around function calls
>>>
>>> +fcallgraph-profiles-sections
>>> +Common Report Var(flag_callgraph_profiles_sections) Init(0)
>>> +Generate .note.callgraph.text sections listing callees and edge counts.
>>> +
>>>  fcheck-data-deps
>>>  Common Report Var(flag_check_data_deps)
>>>  Compare the results of several data dependence analyzers.
>>> Index: params.def
>>> ===================================================================
>>> --- params.def (revision 174789)
>>> +++ params.def (working copy)
>>> @@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
>>>            "maximum length of the call graph path to be cloned "
>>>            "while doing multiversioning",
>>>            2, 0, 5)
>>> +
>>> +/* Only output those call graph edges in .note.callgraph.text sections
>>> +   whose count is greater than this value.  */
>>> +DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
>>> +          "note-cgraph-section-edge-threshold",
>>> +          "minimum call graph edge count for inclusion in "
>>> +          ".note.callgraph.text section",
>>> +          0, 0, 0)
>>> +
>>>  /*
>>>  Local variables:
>>>  mode:c
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/4591045
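On the linker side, a plugin would have to decode these sections. The following is a minimal sketch in C, assuming the plugin already has the raw bytes of one .note.callgraph.text.<function> section in memory (how it obtains them depends on the gold plugin interface in the linked binutils patch and is not shown). It relies only on the layout visible in the example and in rest_of_handle_final and dump_cgraph_profiles: a "Function <name>" string followed by alternating callee-name and decimal-count strings, each NUL-terminated by the assembler's .string directive. `struct cg_note`, `parse_callgraph_note`, and `MAX_CALLEES` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_CALLEES 32 /* hypothetical fixed capacity for this sketch */

/* Decoded contents of one .note.callgraph.text.<function> section
   (hypothetical representation).  */
struct cg_note
{
  const char *caller;              /* points into the section buffer */
  const char *callee[MAX_CALLEES]; /* callee assembler names */
  long long count[MAX_CALLEES];    /* edge counts, parsed from decimal */
  int ncallees;
};

/* Parse BUF (SIZE bytes) as the sequence of NUL-terminated strings
   emitted by .string directives: "Function <caller>", then alternating
   callee name and decimal count.  Returns 0 on success, -1 if the data
   is malformed.  Strings are not copied, so BUF must outlive NOTE.  */
static int
parse_callgraph_note (const char *buf, size_t size, struct cg_note *note)
{
  static const char prefix[] = "Function ";
  const char *p = buf, *end = buf + size;
  int have_callee = 0;

  note->ncallees = 0;
  /* Every string, including the last one, must be NUL-terminated, so
     strlen and strncmp below never read past END.  */
  if (size == 0 || buf[size - 1] != '\0')
    return -1;
  if (strncmp (p, prefix, sizeof prefix - 1) != 0)
    return -1;
  note->caller = p + sizeof prefix - 1;
  p += strlen (p) + 1;

  while (p < end)
    {
      if (!have_callee)
        {
          /* Callee name: remember it and expect a count next.  */
          if (note->ncallees == MAX_CALLEES)
            return -1;
          note->callee[note->ncallees] = p;
          have_callee = 1;
        }
      else
        {
          /* Edge count: the whole string must be a decimal number.  */
          char *stop;
          note->count[note->ncallees] = strtoll (p, &stop, 10);
          if (stop == p || *stop != '\0')
            return -1;
          note->ncallees++;
          have_callee = 0;
        }
      p += strlen (p) + 1;
    }
  /* A callee name without a following count is malformed.  */
  return have_callee ? -1 : 0;
}
```

Because filtering by note-cgraph-section-edge-threshold happens in the compiler, this parser does no filtering of its own: edges at or below the threshold are simply absent from the section.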