> > Interesting. My plan for profiling with LTO is to ultimately make it
> > linktime transform. This will be more difficult with WHOPR (i.e.
> > instrumenting need function bodies that are not available at WPA
> > time), but I believe it is solvable: just assign uids to the edges
> > and do instrumentation at ltrans. Then we will save cgraph profile in
> > some easier way so WHOPR can read it in and read rest of stuff in
> > ltrans. This would involve shipping the correct profiles for given
> > function etc so it will be a bit of implementation challenge.
> >
> This can be tricky -- to maximize FDO benefit, the
> profile-use/annotation needs to happen early which means
> instrumentation also needs to happen early (to avoid cfg mismatches).

I don't see much of a problem in this particular area. The GCC
optimization queue is organized so that we first do early optimizations,
which are all intended to be simple cleanups without size/speed
tradeoffs. Then we do IPA and late optimizations, which are both driven
by the profile (estimated or read). Profile reading happens early
because we use the same infrastructure for gcov and profile feedback.
This does not give profile feedback any extra benefit, quite the
converse, since early passes may not be able to update the profile
precisely and we also get higher profiling overhead. So I think
decoupling gcov and profile feedback and pushing profile feedback back
in the queue is going to be a win.

Yes, the optimizations must match between the instrumented build and the
profile-use build, but with LTO this is not a problem, and in general the
early optimizations should be stable wrt memory layout (nothing else
changes). This used to be exercised before profiling was moved to the
tree level in 4.x.

I would be very interested in the low-overhead support: there is a lot
to gain, especially because the profiling results are less dependent on
setup and can be better reused. I know part of the code was contributed
(the support for reading not 100% valid profiles). Is there any extra
info available on this?

The main problem IMO is how to get the profile into WHOPR without having
function bodies. I guess we will end up summarizing the info in a
WHOPR-friendly way and letting it stream the other counters to LTRANS,
which will annotate the function body once it is read in from the file.

> >> 2) comdat function resolution -- since LIPO uses aux module functions
> >> for inlining purpose only, it has the freedom to choose which copy to
> >> use. The current scheme chooses copy in current module with priority
> >> for better profile data context sensitivity (see below)
> >
> > This is interesting. How do you solve the problem when given comdat
> > function "loose"? I.e. it is replaced at linktime by other function
> > that may or may not be profiled from other unit?
> >
> Whatever function that is selected will have profile data (assuming it
> called at runtime) -- but the profile data are merged from different
> contexts including from calls in different modules. For instance,
> both a.C and b.C define foo. and b.C:foo is selected at runtime, and
> a.C:foo is not inlined (after instrumentation) anywhere in a.C, then
> a.C:foo won't have any profile data, and b.C:foo has merged profile
> data resulting from calls in both a.C and b.C.

Yes, but this is what I am concerned about. Without LTO, at least when
compiling a.C with profile feedback, we will have foo with 0 counts. We
might however work out that calls of foo are frequent and decide to
inline foo. We will then take the counts and rescale them, which results
in the inlined copy of foo being optimized for size.

When comdats are resolved within LTO, this will not be a big deal, but
LTO still produces comdats that are later resolved against libraries
etc., so we don't solve the problem this way. At the very least we should
be able to figure out that we have a function with no profile and do
something more sane.

Do you have any idea how common these scenarios are?

Honza
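
P.S. To make the rescaling concern concrete, here is a toy C++ sketch.
It is not GCC code: FunctionProfile, scale_naive, has_no_profile and
scale_with_fallback are names made up for illustration, and the fallback
shown (statically estimated frequencies scaled by the call count) is only
one possible "more sane" choice, assumed here for the sake of the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Toy model of a function's profile; hypothetical, not GCC's data structures.
struct FunctionProfile {
    std::int64_t entry_count;                // measured number of entries
    std::vector<std::int64_t> block_counts;  // measured per-block counts
    std::vector<double> estimated_freq;      // static guess, executions per call
};

// Naive rescaling when inlining: block_count * call_count / entry_count.
// A callee with entry_count == 0 ends up with every inlined block at 0,
// so the inlined body looks cold no matter how hot the call site is.
// (No overflow handling; this is a sketch.)
std::vector<std::int64_t> scale_naive(const FunctionProfile& callee,
                                      std::int64_t call_count) {
    std::vector<std::int64_t> out;
    out.reserve(callee.block_counts.size());
    for (std::int64_t c : callee.block_counts)
        out.push_back(callee.entry_count == 0
                          ? 0
                          : c * call_count / callee.entry_count);
    return out;
}

// True when the function carries no measured data at all, e.g. because
// another comdat copy was the one that actually ran.
bool has_no_profile(const FunctionProfile& f) {
    if (f.entry_count != 0) return false;
    for (std::int64_t c : f.block_counts)
        if (c != 0) return false;
    return true;
}

// Assumed fallback: if the callee has no profile, derive the inlined
// counts from the call-site count and the static frequency estimates.
std::vector<std::int64_t> scale_with_fallback(const FunctionProfile& callee,
                                              std::int64_t call_count) {
    if (!has_no_profile(callee)) return scale_naive(callee, call_count);
    assert(callee.estimated_freq.size() == callee.block_counts.size());
    std::vector<std::int64_t> out;
    out.reserve(callee.estimated_freq.size());
    for (double f : callee.estimated_freq)
        out.push_back(static_cast<std::int64_t>(
            std::llround(f * static_cast<double>(call_count))));
    return out;
}
```

For a.C:foo with all-zero counts, estimated frequencies {1.0, 0.5, 10.0}
and a call site executed 1000 times, scale_naive yields {0, 0, 0} while
scale_with_fallback yields {1000, 500, 10000}.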