Re: FDO and LTO on ARM

Xinliang David Li Fri, 05 Aug 2011 14:34:40 -0700

>
> In a way I like the current scheme since it is simple, and extending it
> should IMO have a good reason. We could refine -Os behaviour without
> changing the current predicates by optimizing for speed in
> a) functions declared "hot" by the user, and the BBs in them that are not
> proven cold;
> b) BBs selected by profile feedback, i.e. we could have two thresholds:
> BBs with very large counts will be probably hot, BBs in between will be
> maybe hot/normal, and BBs with low counts will be cold.
> This would probably motivate the introduction of a probably_hot predicate
> that summarizes the above.


Introducing a new 'probably_hot' would be very confusing unless you
also rename 'maybe_hot', and that leads to finer-grained control
(very_hot, hot, normal, cold, unlikely), which can be hard to use.  The
three-state partition (not counting exec_once) seems ok, but

1) the unlikely state does not have a controllable parameter
2) the hot_bb_count_fraction parameter, which is used to determine
maybe_hotness, is shared by all FDO-related passes. It would be much more
flexible (in terms of tuning) to allow each pass (such as inlining) to
define its own thresholds.
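
A minimal sketch of what point 2) might look like, with a shared fraction and optional per-pass overrides. Everything here is an assumption for illustration: the default value, the pass names, the override table, and the rule that a BB is maybe hot when its count exceeds the maximum count divided by the fraction.

```python
# Assumed shared default, standing in for hot_bb_count_fraction.
DEFAULT_HOT_BB_COUNT_FRACTION = 10000

# Hypothetical per-pass overrides; the pass names are illustrative only.
PASS_FRACTIONS = {"inline": 1000}


def hot_fraction_for(pass_name):
    """Return the pass-specific fraction, falling back to the shared one."""
    return PASS_FRACTIONS.get(pass_name, DEFAULT_HOT_BB_COUNT_FRACTION)


def maybe_hot(count, max_count, pass_name):
    """Assumed rule: hot if count exceeds max_count / fraction for the pass."""
    return count > max_count // hot_fraction_for(pass_name)
```

With a maximum count of 1,000,000, a BB with count 500 is maybe hot under the shared fraction (cutoff 100) but not for the hypothetical inliner override (cutoff 1000), which is the kind of per-pass tuning meant here.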


>
> If we want to refine things, we could also reconsider how we want to treat
> BBs with 0 coverage, i.e. whether we want to
>  a) consider them "normal" and let the presence of -Os/-O123 decide
> whether they are size/speed optimized,
>  b) consider them "cold" since they are not executed at all,
>  c) consider them "cold" in functions that are otherwise covered by the test
> run and "normal" in case the function is not covered at all (i.e. training an
> X server on a particular set of hardware may not convince GCC to optimize for
> size all the other drivers not covered by the train run).
>
> We currently implement b) and it sort of works well, since users usually
> train for what matters to them and are happy to see smaller binaries.

Yes -- we assume the user will do their best to find representative training
data to avoid bad optimizations, so b) should be fine.
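
For reference, the three policies for zero-count BBs can be written down as a small decision function. This is a sketch only; the function name and the policy labels are hypothetical and simply mirror a), b) and c) from the quoted text.

```python
def zero_count_bb_is_cold(policy, function_covered):
    """Return True if a BB with profile count 0 is treated as cold.

    policy: "a", "b" or "c", matching the options in the quoted text.
    function_covered: whether the enclosing function was executed at all
    during the training run.
    """
    if policy == "a":
        return False             # "normal": -Os/-O123 decides
    if policy == "b":
        return True              # never executed, so cold
    if policy == "c":
        return function_covered  # cold only inside covered functions
    raise ValueError("unknown policy: %r" % (policy,))
```

Under b) the answer does not depend on whether the function was covered, which is what keeps the current behaviour simple.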

David


>
> What I don't like about a) and c) is a bit of inconsistency with small
> counts.  I.e. a count of 1 will imply optimizing for size, but a roundoff
> error to 0 will cause the BB to be optimized for speed, which is weird.
> Of course, flipping the default here would also cause significant growth of
> FDO binaries, and users are already unhappy that FDO binaries are too large.
>
> Honza
>
>
