On 1/28/07, Jan Hubicka <[EMAIL PROTECTED]> wrote:
I am not quite sure what you mean by direct inlining here. At -O2 G++
Decorating everything in sight with the always_inline/noinline attributes (flatten wasn't an option, because it used to be troublesome and isn't as 'portable' across compilers).
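For readers unfamiliar with the attributes named here, below is a minimal sketch of how they are spelled in GCC's C++ dialect (Clang accepts the same spelling). The functions dot3, back_face and shade are hypothetical illustrations, not the author's actual code.

```cpp
#include <cassert>

// Hypothetical ray-tracing helpers; the names are illustrative only.

// always_inline: GCC inlines this at every call site regardless of its
// size heuristics (the function should also be declared inline).
static inline __attribute__((always_inline)) float dot3(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// noinline: keep this (e.g. cold) path out of line in every caller.
static __attribute__((noinline)) float back_face(float d) {
    return d * 0.5f;
}

// flatten: inline, where possible, every call made inside this body.
// Callees marked noinline, such as back_face, are still not inlined.
static __attribute__((flatten)) float shade(const float* n, const float* l) {
    float d = dot3(n, l);
    return d > 0.0f ? d : back_face(-d);
}
```

Only the definitions are shown; call shade() from any main() to try them. Decorating every function this way takes the decision away from the compiler, which is why inlining heuristics cannot be blamed for the resulting code.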
I would be interested to know about any obvious mistakes GCC makes. GCC now has logic to set the cost of inlining "wrapper" functions (i.e. functions doing just one extra call and some casts) to at most 0. It would be interesting to know whether some common scenarios are missed.
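To make "wrapper" concrete, here is a minimal sketch of the shape of function described; mul_wide and mul are hypothetical names. Per the message, GCC at the time capped the estimated cost of inlining such a function at 0, presumably so that inlining it is never treated as code growth.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical out-of-line worker that does the real computation.
static std::int64_t mul_wide(std::int64_t a, std::int64_t b) {
    return a * b;
}

// A "wrapper" in the sense described above: its body is one extra call
// plus argument and return-value casts, and nothing else. Inlining it
// removes a call without adding real work to the caller.
static inline int mul(int a, int b) {
    return static_cast<int>(mul_wide(static_cast<std::int64_t>(a),
                                     static_cast<std::int64_t>(b)));
}
```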
I guess I should remove those attributes and see what it looks like.
Well, we are working on it ;) You can take a look at the C++ benchmarks at http://www.suse.de/~gcctest. The work has been ongoing since cgraph was implemented in 2003; another retuning happened around the 4.0 timeframe, and 4.3 has the SSA-based IPA, which should be another improvement.
I'm aware of that progression, and some of my code is already being tested at http://www.suse.de/~gcctest/c++bench/raytracer/ ;) 4.2 made a substantial difference for me, and it seems 4.3 is well on its way (even if it's a bit chaotic at times). IPA, when enabled, used to ICE on me and recently started to work, but I haven't noticed a difference (efficiency-wise) yet. I guess I should wait a bit more. I very much appreciate the string op work, and I'm eagerly waiting for the assume() directive (wink wink).
Thanks. What is definitely most interesting for me is a self-contained testcase I can easily compile and run, like we have with tramp3d. I will definitely take a look at your testcases, but perhaps only after returning from my trip next weekend, since I am running out of time for all my TODOs today ;)
It's still very much in flux, but once it stabilizes a bit I'll dump everything into a self-contained black box of doom.
Concerning the frame sizes, we really need some kind of analysis of where they come from, i.e. whether GCC simply inlines too much together, fails to pack the structures well using the existing algorithm, or it is a register pressure problem.
I'm out of my league here. I know the frontend_loop function isn't as horrible on x86-64, which lends some credit to the register pressure hypothesis, but then that code isn't doing anything fancy. For the other function, which heavily uses SSE vector intrinsics, g++ is really doing a good job, apart from the occasional duplicated structures here and there and the larger frame. But you can rule out g++'s inlining heuristics, as they have (or shouldn't have) no freedom. If there's anything I can do, do not hesitate to ask. And thanks for taking notice.