> In a way I like the current scheme since it is simple, and extending it
> should IMO have some good reason. We could refine -Os behaviour without
> changing current predicates to optimize for speed in
> a) functions declared as "hot" by user and BBs in them that are not proved
> cold.
> b) based on profile feedback - i.e. we could have two thresholds, BBs with
> very large counts will be probably hot, BBs in between will be maybe
> hot/normal and BBs with low counts will be cold.
> This would probably motivate introduction of a probably_hot predicate that
> summarizes the above.

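
For reference, a minimal sketch of the two-threshold idea in (b) as I read it. All names here (bb_hotness, classify_bb, optimize_bb_for_speed, cold_max, hot_min) are hypothetical and are not existing GCC predicates or params; the rule that "maybe hot" blocks simply follow -Os is also my assumption.

```c
#include <stdint.h>

/* Hypothetical three-way classification; illustrative only, not GCC's
   actual predicates.  */
enum bb_hotness { BB_COLD, BB_MAYBE_HOT, BB_PROBABLY_HOT };

/* Classify a block by its profile count using two cutoffs.  Counts at or
   below cold_max are cold, counts at or above hot_min are probably hot,
   and everything in between is maybe hot/normal.  If the cutoffs are
   inconsistent (cold_max >= hot_min), the cold check wins.  */
static enum bb_hotness
classify_bb (uint64_t count, uint64_t cold_max, uint64_t hot_min)
{
  if (count <= cold_max)
    return BB_COLD;
  if (count >= hot_min)
    return BB_PROBABLY_HOT;
  return BB_MAYBE_HOT;
}

/* Return 1 to optimize the block for speed, 0 for size.  Assumption:
   probably-hot blocks get speed even under -Os, cold blocks always get
   size, and the middle band follows the command-line setting.  */
static int
optimize_bb_for_speed (uint64_t count, uint64_t cold_max, uint64_t hot_min,
                       int optimize_size)
{
  enum bb_hotness h = classify_bb (count, cold_max, hot_min);

  if (h == BB_COLD)
    return 0;
  if (h == BB_PROBABLY_HOT)
    return 1;
  return !optimize_size;
}
```

With cutoffs of 10 and 1000, a block with count 5000 would be speed-optimized even under -Os, while a block with count 500 would follow whatever the command line says.
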

Introducing a new 'probably_hot' will be very confusing -- unless you also
rename 'maybe_hot', but this leads to finer-grained control: very_hot, hot,
normal, cold, unlikely, which can be hard to use. The three-state partition
(not counting exec_once) seems ok, but
1) the unlikely state does not have a controllable parameter;
2) the hot_bb_count_fraction parameter, which is used to determine
   maybe_hotness, is shared by all FDO-related passes. It is much more
   flexible (in terms of tuning) to allow each pass (such as inlining) to
   define its own thresholds.

> If we want to refine things, we could also re-consider how we want to behave
> to BBs with 0 coverage. I.e. if we want to
> a) consider them "normal" and let the presence of -Os/-O123 decide
> whether they are size/speed optimized,
> b) consider them "cold" since they are not executed at all,
> c) consider them "cold" in functions that are otherwise covered by the test
> run and "normal" in case the function is not covered at all (i.e. training X
> server on a particular set of hardware may not convince GCC to optimize for
> size all the other drivers not covered by the train run).
>
> We currently implement B and it sort of works well since users usually train
> for what matters for them and are happy to see binaries smaller.

Yes -- we assume the user will do his best to find representative training
data to avoid bad optimizations, so b) should be fine.

David

> What I don't like about a&c is a bit of inconsistency with small counts.
> I.e. count 1 will imply optimizing for size, but roundoff error to 0 will
> cause it to be optimized for speed, which is weird.
> Of course also flipping the default here would cause significant growth of
> FDO binaries and users are already unhappy that FDO binaries are too large.
>
> Honza
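
To make the three options for 0-coverage BBs (a, b, c) and the small-count inconsistency concrete, here is a rough sketch. The enum, the function name and the cold_max cutoff are hypothetical, made up for illustration; this is not how the existing GCC predicates are written.

```c
#include <stdint.h>

/* Hypothetical policies for blocks with a profile count of 0,
   matching options a), b) and c) in the quoted text.  */
enum zero_count_policy
{
  ZERO_NORMAL,              /* a) follow -Os/-O123 */
  ZERO_COLD,                /* b) always treat as cold */
  ZERO_COLD_IF_FN_COVERED   /* c) cold only if the function was trained */
};

/* Return 1 to optimize the block for size, 0 for speed.  cold_max is an
   assumed cutoff: nonzero counts at or below it are treated as cold.
   fn_covered says whether the enclosing function was executed at all in
   the training run.  */
static int
bb_optimize_for_size (uint64_t bb_count, int fn_covered, uint64_t cold_max,
                      int optimize_size, enum zero_count_policy policy)
{
  if (bb_count == 0)
    switch (policy)
      {
      case ZERO_NORMAL:
        return optimize_size;
      case ZERO_COLD:
        return 1;
      case ZERO_COLD_IF_FN_COVERED:
        return fn_covered ? 1 : optimize_size;
      }

  /* Small but nonzero counts are cold regardless of the policy.  */
  if (bb_count <= cold_max)
    return 1;
  return optimize_size;
}
```

Under option a) at -O2 with cold_max = 1, a block with count 1 comes out size-optimized while a block whose count rounds to 0 comes out speed-optimized, which is the inconsistency mentioned above. Option b) gives size for both.
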