On 10/11/2012 1:34 PM, Mike Hommey wrote:
On Thu, Oct 11, 2012 at 02:26:33PM -0400, Rafael Ávila de Espíndola wrote:
On 10/11/2012 02:33 AM, Mike Hommey wrote:
On Wed, Oct 10, 2012 at 05:57:53PM -0400, Justin Lebar wrote:
By "turning off Linux PGO testing", you really mean "stop making and
distributing Linux PGO builds," right?

The main reason I'd want Linux PGO is for mobile.  On desktop Linux,
most users (I expect) don't run our builds, so it's not a big deal if
they're some percent slower.

Many people have made claims about that on several different occasions.
Can we once and for all come up with actual data on that?

That being said, PGO on Linux gives between a 5 and 20% improvement on our
various Talos tests. That's with the version of gcc we currently use,
which is 4.5. I'd expect 4.7 to do an even better job, especially if we
added LTO to the equation (and since we are now building on x86-64
machines, we wouldn't have to worry about memory usage; link time could
be a problem, though).
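
For anyone who wants to experiment locally, the usual GCC workflow is the
three-step -fprofile-generate / run / -fprofile-use build. A minimal sketch
on a toy program (the file name and workload are made up for illustration
and this is not our actual build setup):

  // toy.cpp -- illustrative only.
  //
  //   Step 1: instrumented build
  //     g++ -O2 -flto -fprofile-generate toy.cpp -o toy
  //   Step 2: run a representative workload; this writes *.gcda profile files
  //     ./toy
  //   Step 3: rebuild, feeding the profile back to the compiler
  //     g++ -O2 -flto -fprofile-use toy.cpp -o toy
  #include <cstdio>

  // With -fprofile-use, gcc sees that the branch below is almost never
  // taken and that the loop dominates runtime, so it can inline, lay out
  // and size-optimize accordingly instead of guessing statically.
  static int mostlyZero(int i) { return (i % 1000 == 0) ? 1 : 0; }

  int main() {
    long sum = 0;
    for (int i = 0; i < 10000000; ++i)   // hot loop
      sum += mostlyZero(i);
    std::printf("%ld\n", sum);
    return 0;
  }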

Also note that disabling PGO currently also means disabling the
optimizations we do on omni.ja (central directory optimizations and
reordering). This is somewhat covered by bug 773171.

I wouldn't be surprised if most of the PGO benefit comes from fixing bad
inlining decisions by gcc. If we can narrow the gap by adding
MOZ_ALWAYS_INLINE, then maybe we can drop PGO.
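
For reference, a force-inline macro along the lines of MOZ_ALWAYS_INLINE
boils down to compiler-specific attributes; a rough sketch (the real
definition lives in mfbt and may differ in detail):

  // Sketch of a force-inline macro; paraphrased, not the exact mfbt code.
  #if defined(_MSC_VER)
  #  define ALWAYS_INLINE __forceinline
  #elif defined(__GNUC__)
  #  define ALWAYS_INLINE __attribute__((always_inline)) inline
  #else
  #  define ALWAYS_INLINE inline
  #endif

  // Example use: force a small, hot accessor to be inlined even where
  // gcc's static heuristics might otherwise leave it out of line.
  ALWAYS_INLINE static int GetFlags(const unsigned* aWord) { return *aWord & 0xff; }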

A not-insignificant part of the performance improvement PGO gives comes
from code reordering to improve branch prediction. Presumably, we can
use NS_LIKELY/NS_UNLIKELY to improve some branches manually.
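
Those macros are thin wrappers around gcc's __builtin_expect; a rough
sketch of what using them looks like, with the definitions paraphrased
from memory rather than copied from nscore.h:

  // Paraphrased; the real NS_LIKELY/NS_UNLIKELY definitions may differ slightly.
  #if defined(__GNUC__)
  #  define LIKELY(x)   (__builtin_expect(!!(x), 1))
  #  define UNLIKELY(x) (__builtin_expect(!!(x), 0))
  #else
  #  define LIKELY(x)   (!!(x))
  #  define UNLIKELY(x) (!!(x))
  #endif

  // Example: tell gcc the error path is cold so the common path stays
  // straight-line and the error handling gets moved out of the way.
  int ProcessChunk(const char* aBuf, int aLen) {
    if (UNLIKELY(!aBuf || aLen <= 0))
      return -1;              // cold error path
    int checksum = 0;
    for (int i = 0; i < aLen; ++i)
      checksum += aBuf[i];    // hot path
    return checksum;
  }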

Theory:
PGO seems like the only way to balance speed vs. size. With PGO, GCC in effect compiles your hot code with -O3 and your cold code with -Os. Sure, you can get similar performance gains by aggressively inlining, but you are going to pay dearly for that in code size (and consequently startup speed and, to some degree, resident memory usage).

One of the side effects of being a large program is that code that frequently runs together has a good likelihood of being spread around the binary. This means large apps need larger caches to stay as fast as smaller ones. PGO offers a way out by letting the compiler group warm code together, in effect letting your large, feature-laden app run as well as a nimbler, more specialized one. Note that I have no data to back this up other than my discussions with GCC devs.
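
To make the comparison with the manual alternative concrete, here is
roughly what hand-grouping looks like with GCC's hot/cold function
attributes (the function names are invented; as I understand the gcc
docs, hot and cold functions get placed into .text.hot/.text.unlikely
subsections, which is what a PGO profile gets you automatically):

  // Invented example; only the attributes matter here. gcc puts hot
  // functions into the .text.hot subsection and cold ones into
  // .text.unlikely (optimized for size), so frequently-run code ends up
  // packed together -- the same grouping PGO derives from a profile.
  __attribute__((hot))  void PaintFrame()     { /* ...drawing code... */ }
  __attribute__((cold)) void ReportOOMError() { /* ...rare error path... */ }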

Littering code with manual branch-prediction + inlining hints seems like a really failure-prone, unscalable alternative.

Practice:
In practice, GCC PGO only has locality benefits at the compilation-unit level, which results in the same suboptimal locality once everything is linked. Only when one throws LTO into the mix is locality handled right. However, we have not checked recently whether modern GCC is robust enough for our needs. So the main benefit of PGO at the moment is faster startup vs. plain -O3 builds.

Since almost nobody on Linux uses Mozilla's own Firefox builds (and no Firefox distributors do PGO), it may not be that bad to hurt startup for a few precious Linux users.

Starry-eyed future:
We need PGO + LTO to generate the smallest possible fast code on mobile. Unfortunately, I haven't heard anything reassuring about ARM PGO/LTO in GCC; it's still likely to be broken as heck.

Taras

PS. Rafael, I'd be very happy to switch to clang if it implemented profile-guided optimization. I'd be much more tempted to invest resources in fixing clang bugs in this area than in fixing gcc ones.


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
