On 1/28/07, Jan Hubicka <[EMAIL PROTECTED]> wrote:
> BTW, when inlining seems to make such a noticeable difference, did you try
> to use profile feedback?
Once a year, I try. But then it boils down to the fact that, as a programmer, I have no way to express how/where I want gcc to poke its nose. And I get back to fixing branches, inlining and unrolling (wink) by hand.
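For illustration only, a minimal sketch of what "by hand" can look like with g++, assuming GCC's `__builtin_expect` and `always_inline` extensions. The names `LIKELY`, `UNLIKELY`, `clamp01` and `sum_clamped` are made up for this example and are not taken from the raytracer under discussion.

```cpp
#include <cstddef>

// GCC-specific branch hints. The macro names are hypothetical helpers,
// not part of any library.
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Inlining by hand: always_inline asks GCC to inline this function
// regardless of what its inlining heuristic would decide.
static inline __attribute__((always_inline))
float clamp01(float v) {
    // Branches by hand: tell the compiler out-of-range inputs are rare.
    if (UNLIKELY(v < 0.0f)) return 0.0f;
    if (UNLIKELY(v > 1.0f)) return 1.0f;
    return v;
}

// Unrolling by hand: process four elements per iteration,
// then finish the remainder one element at a time.
float sum_clamped(const float* p, std::size_t n) {
    float acc = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc += clamp01(p[i]) + clamp01(p[i + 1])
             + clamp01(p[i + 2]) + clamp01(p[i + 3]);
    }
    for (; i < n; ++i) {
        acc += clamp01(p[i]);
    }
    return acc;
}
```

Whether any of these hints actually helps depends on the target and the compiler version, so this is a sketch of the technique rather than a recommendation.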
> > I'm aware of that progression and some of my code is already being tested
> > http://www.suse.de/~gcctest/c++bench/raytracer/ ;)
>
> I see, we didn't seem to make that much progress on this testcase
> performance wise yet ;)
It's a silly 100 LOC raytracer, and historically g++ already did the Right Thing[tm] (inlining everything), so there's not much left to be gained.
> > For the other function, which heavily uses SSE vector intrinsics, g++
> > is really doing a good job, if only for the, sometimes, duplicated
> > structures here & there and the larger frame. But you can rule out
> > g++'s inlining heuristic as it has no (or shouldn't have) any freedom.
>
> Hmm, so then it should be either structure packing or regalloc.
> I will be able to take a look only after returning from a course.
>
> Honza
Regalloc is a lost cause on ia32 :) Note that nowadays g++ is at the point where, despite that waste, it's still faster to inline it all into one rendering function than to split it up. And I think you can also put gcse on the culprit list.