> What I've observed on power is that LTO alone reduces performance and
> LTO+FDO is not significantly different than FDO alone.
>
> I agree that an exact estimate of the register pressure would be a
> difficult problem. I'm hoping that something that approximates potential
> register pressure downstream will be sufficient to help inlining
> decisions.

One (orthogonal) way to deal with this problem would be to also disable
inlining of functions called once when the edge frequency is low, i.e.
adding to check_callers something like

  edge->frequency > CGRAPH_FREQ_BASE / 2

if you want to disqualify all calls that have at most a 50% chance of
being executed during a function invocation. Does something like that
help in your cases?

It would help in the case Linus complained about:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

The difficulty here is that disabling inlines on not-so-important paths
may prevent SRA and other optimizations, so it may in turn also penalize
the hot path. I saw this in some cases where EH cleanup code was
optimized for size.

Perhaps SRA can also be extended to handle cases where non-SRAable code
is on a cold path?

Honza
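
For illustration, the frequency test suggested for check_callers could look roughly like the standalone sketch below. It is not GCC source: struct call_edge, its fields, and callers_hot_enough are hypothetical names made up for this example, and the value given to CGRAPH_FREQ_BASE here is an assumption so that the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed scale factor for this sketch; a frequency equal to this value
   means "executed once per invocation of the caller".  */
#define CGRAPH_FREQ_BASE 1000

/* Hypothetical, simplified stand-in for a call graph edge.  */
struct call_edge
{
  int frequency;                 /* Scaled by CGRAPH_FREQ_BASE.  */
  struct call_edge *next_caller; /* Next edge calling the same function.  */
};

/* Return true only if every caller edge is executed in more than half of
   the caller's invocations; otherwise the function would be left
   out of line.  */
static bool
callers_hot_enough (const struct call_edge *callers)
{
  for (const struct call_edge *e = callers; e != NULL; e = e->next_caller)
    if (!(e->frequency > CGRAPH_FREQ_BASE / 2))
      return false;
  return true;
}
```

With this threshold an edge whose frequency is exactly CGRAPH_FREQ_BASE / 2 is still rejected, because the comparison is strict.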
