Finally had some time to pick this up again. See responses and questions below. But it looks like you are commenting on a stale version of the patch. See the update I sent in March:
https://gcc.gnu.org/ml/gcc-patches/2014-03/msg01176.html

On Sun, May 25, 2014 at 10:43 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> This patch attempts to address the lost profile issue for COMDATs in
>> more circumstances, exposed by function splitting.
>>
>> My earlier patch handled the case where the comdat had 0 counts, since
>> the linker kept the copy in a different module. In that case we
>> prevent the guessed frequencies on 0-count functions from being
>> dropped by counts_to_freqs, and then later mark any reached via
>> non-zero callgraph edges as guessed. Finally, when one such 0-count
>> comdat is inlined, the call count is propagated to the callee blocks
>> using the guessed probabilities.
>>
>> However, in this case there was a comdat with a very small non-zero
>> count that was being inlined into a much hotter callsite. This can
>> happen when there was a copy that was ipa-inlined in the profile-gen
>> compile, so the copy in that module gets some non-zero counts from the
>> ipa-inlined instance, but the out-of-line copy was eliminated by the
>> linker (selected from a different module). In this case the inliner
>> was scaling the bb counts up quite a lot when inlining. The problem is
>> that you most likely can't trust that the 0-count bbs in such a case
>> are really not executed by the callsite it is being inlined into,
>> since the counts are very small and correspond to a different
>> callsite. In some internal C++ applications I am seeing that we
>> execute out of the split cold portion of COMDATs for this reason.
>
> With all the problems we have with COMDATs, it still seems to me that perhaps
> the best approach would be to merge them at runtime, as we discussed some time
> ago, and incrementally handle the problem of how to get callstack-sensitive
> profiles.
I recently committed some functionality to a google branch to do some COMDAT
profile merging at runtime, but that was to handle all-zero-count copies and
operates on top of the LIPO runtime infrastructure. We don't do any fixup in
non-zero-count COMDATs, to avoid losing context-sensitive profiles.

>>
>> This problem is more complicated to address than the 0-count instance,
>> because we need the cgraph to determine which functions to drop the
>> profile on, and at that point the estimated frequencies have already
>> been overwritten by counts_to_freqs. To handle this broader case, I
>> have changed the drop_profile routine to simply re-estimate the
>> probabilities/frequencies (and translate these into counts scaled from
>> the incoming call edge counts). This unfortunately necessitates
>> rebuilding the cgraph, to propagate the new synthesized counts and
>> avoid checking failures downstream. But it will only be rebuilt if we
>> dropped any profiles. With this solution, some of the older approach
>> can be removed (e.g. propagating counts using the guessed
>> probabilities during inlining).
>>
>> Patch is below. Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> Also tested on a profile-use build of SPEC cpu2006. Ok for trunk when
>> stage 1 reopens?
>>
>> Thanks,
>> Teresa
>
> 2014-02-12  Teresa Johnson  <tejohn...@google.com>
>
>         * graphite.c (graphite_finalize): Pass new parameter.
>         * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
>         * predict.c (tree_estimate_probability): New parameter.
>         (tree_estimate_probability_worker): Renamed from
>         tree_estimate_probability_driver.
>         (tree_reestimate_probability): New function.
>         (tree_estimate_probability_driver): Invoke
>         tree_estimate_probability_worker.
>         (freqs_to_counts): Move here from tree-inline.c.
>         (drop_profile): Re-estimate profiles when dropping counts.
>         (handle_missing_profiles): Drop for some non-zero functions as
>         well; get counts from bbs to support invocation before cgraph
>         rebuild.
>         (counts_to_freqs): Remove code obviated by reestimation.
>         * predict.h (tree_estimate_probability): Update declaration.
>         * tree-inline.c (freqs_to_counts): Move to predict.c.
>         (copy_cfg_body): Remove code obviated by reestimation.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles
>         before cgraph rebuild.
>
> Index: params.def
> ===================================================================
> --- params.def  (revision 207436)
> +++ params.def  (working copy)
> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>           "Maximal estimated outcome of branch considered predictable",
>           2, 0, 50)
>
> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
> +         "min-caller-reestimate-ratio",
> +         "Minimum caller-to-callee node count ratio to force reestimated branch "
> +         "probabilities in callee (where 0 means only when callee count is 0)",
> +         10, 0, 0)
>
> OK, so if callee count is 10% of call site count, we do the recomputation?

Right.

> @@ -2390,7 +2392,8 @@ void
>    connect_infinite_loops_to_exit ();
>    /* We use loop_niter_by_eval, which requires that the loops have
>       preheaders.  */
> -  create_preheaders (CP_SIMPLE_PREHEADERS);
> +  if (!redo)
> +    create_preheaders (CP_SIMPLE_PREHEADERS);
>
> So you disable preheader construction because this all happens at inlining
> time, when we do not want to change BBs because the callgraph is built,
> right? Are you sure we have preheaders built at IPA time for all functions
> reliably? (It would make sense to do so, but I think loop analysis is done
> rather randomly by IPA analysis passes, such as the inliner.) Otherwise the
> SCEV infrastructure will explode, if I remember correctly.

I can't remember why I guarded this, but it appears it was just precautionary.
I just changed the implementation to make the call to create_preheaders
unconditional, asserting beforehand that the loops already have preheaders
when redo is true, and asserting afterwards that nothing changed when redo
is true:

  gcc_assert (!redo || loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS));
  bool changed = create_preheaders (CP_SIMPLE_PREHEADERS);
  gcc_assert (!(redo && changed));

(I changed create_preheaders to return a bool indicating whether any
preheaders were created.) Neither assert fired for either the gcc
regression tests or the more extensive tests I ran on internal C++
benchmarks. Should I change the patch to make the call unconditional and
add these asserts?

> +/* Force re-estimation of the probabilities, because the profile was
> +   deemed insufficient.  */
> +
> +static void
> +tree_reestimate_probability (void)
> +{
> +  basic_block bb;
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Need to reset the counts to ensure probabilities are recomputed.  */
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      bb->count = 0;
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        e->count = 0;
> +    }
> +  tree_estimate_probability_worker (true);
>
> You also want to remove recorded loop count estimates, since they are also
> driven by the profile we chose not to trust.

Can you point me to the recorded loop count estimates you are referring to?
I poked around and didn't find them. The places I looked used the edge
counts to estimate, and seemed downstream of this.

> @@ -2842,48 +2927,77 @@ handle_missing_profiles (void)
>        gcov_type max_tp_first_run = 0;
>        struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>
> -      if (node->count)
> -        continue;
>        for (e = node->callers; e; e = e->next_caller)
>          {
> -          call_count += e->count;
> +          call_count += cgraph_function_with_gimple_body_p (e->caller)
> +            ? gimple_bb (e->call_stmt)->count : 0;
>
> I am quite confused by this change - why don't you want to trust edge counts?
> Other parts of the patch seem quite reasonable, but I don't really follow
> here.

It is because we are invoking this before rebuilding the cgraph, so the
cgraph edge counts are still 0. Here is the comment to this effect that I
added just above handle_missing_profiles:

+   This is now invoked before rebuilding the cgraph after reading profile
+   counts, so the cgraph edge and node counts are still 0.  Therefore we
+   need to look at the counts on the entry bbs and the call stmt bbs.
+   That way we can avoid needing to rebuild the cgraph again to reflect
+   the nodes with dropped profiles.  */

Thanks,
Teresa

> Honza

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413