On 1/28/07, Jan Hubicka <[EMAIL PROTECTED]> wrote:
BTW when inlining seems to make so noticeable difference, did you try to
use profile feedback?
Once a year, i try.
But then it boils down to the fact that as a programmer i have no way
to tell gcc how or where i want it to poke its nose in. So i go back
to fixing branches, inlining and unrolling (wink) by hand.
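
To make "by hand" concrete, here is a minimal sketch of the kind of
annotation meant, assuming GCC's __builtin_expect builtin and its
always_inline attribute. The LIKELY/UNLIKELY macros and the clamp01 and
sum_clamped functions are made up for illustration and are not taken from
the raytracer discussed in this thread.

```cpp
#include <cstddef>

// GCC-specific branch hints. The macro names are hypothetical helpers.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Ask GCC to inline this regardless of its size heuristics, and mark the
// out-of-range branches as the rare case.
static inline __attribute__((always_inline)) float clamp01(float v) {
    if (UNLIKELY(v < 0.0f)) return 0.0f;
    if (UNLIKELY(v > 1.0f)) return 1.0f;
    return v;
}

// Sum of clamped values, with the main loop unrolled by four by hand.
float sum_clamped(const float* p, std::size_t n) {
    float acc = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc += clamp01(p[i]) + clamp01(p[i + 1])
             + clamp01(p[i + 2]) + clamp01(p[i + 3]);
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) acc += clamp01(p[i]);
    return acc;
}
```

Whether such hints actually beat a profile-feedback build depends on the
code and the compiler version, so treat this as an illustration only.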

> I'm aware of that progression and some of my code is already being tested
> http://www.suse.de/~gcctest/c++bench/raytracer/ ;)

I see, we don't seem to have made much progress on this testcase
performance-wise yet ;)
It's a silly 100 LOC raytracer and historically g++ already did the
Right Thing[tm] (inlining everything), there's not much left to be
gained.

> For the other function, which heavily uses SSE vector intrinsics, g++
> is really doing a good job, apart from the occasionally duplicated
> structures here & there and the larger frame. But you can rule out
> g++'s inlining heuristic as it has (or should have) no freedom there.

Hmm, so then it should be either structure packing or regalloc. I will
be able to take a look only after returning from a course.
Honza
Regalloc is a lost cause on ia32 :)
Note that nowadays g++ has reached the point where, despite that waste,
it's still faster to inline it all into one rendering function than to
split it up. And i think you can also put gcse on the culprit list.
