On Thu, Jul 07, 2011 at 05:53:09PM +0200, Richard Guenther wrote:
> Well, I guess you don't propose to build glibc with -mno-r11?  The compiler
> certainly can't figure out in _all_ cases - but it should be able to handle
> most of the cases (with LTO even more cases) ok, no?

No, we are not proposing to build glibc or any standard library with -mno-r11.

> I also wonder why loading a register is so expensive compared to the
> actual call ...

We are trying to eliminate instructions in the indirect function call pathway,
and this happens to be the first and easiest.  As I said, I see a 5-7% gain in
h264ref, but a 2-3% drop in mcf.  In addition to the savings from not loading
r11, perhaps more of the gain comes from not saving the TOC (r2) at the point
of the call, and instead moving that save into the prologue for functions that
don't call alloca or setjmp and don't use exceptions.  This is because the
instruction sequence before the change was:

        ld r0,0(<ptr>)          /* load function address */
        mtctr r0                /* move to ctr register */
        std r2,40(r1)           /* save TOC value */
        ld r2,8(<ptr>)          /* load new TOC value */
        ld r11,16(<ptr>)        /* load static chain */
        bctrl                   /* call function */
        ld r2,40(r1)            /* reload our TOC */

The reload of r2 after the call has to wait for the store queue to drain in
some cases, because it is loading the value that was stored to 40(r1) just
before the call.
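
For anyone not familiar with the layout, the three loads from <ptr> in the
sequence above read the three doublewords of a function descriptor.  Here is a
rough C sketch of that layout, assuming only the offsets 0, 8, and 16 shown in
the sequence.  The struct name, field names, and helper function are
hypothetical and are not taken from GCC or glibc:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the descriptor that <ptr> points to.
   Fixed-width fields keep the offsets the same on any host. */
struct func_desc {
    uint64_t entry;  /* 0(<ptr>):  function address, moved to ctr */
    uint64_t toc;    /* 8(<ptr>):  callee's TOC value, loaded into r2 */
    uint64_t env;    /* 16(<ptr>): static chain, loaded into r11 */
};

/* Hypothetical helper: name the register each descriptor load feeds. */
static const char *reg_for_offset(size_t offset)
{
    if (offset == offsetof(struct func_desc, entry))
        return "ctr";
    if (offset == offsetof(struct func_desc, toc))
        return "r2";
    if (offset == offsetof(struct func_desc, env))
        return "r11";
    return "none";
}

/* Sanity check that the model matches the offsets in the listing. */
static void check_layout(void)
{
    assert(offsetof(struct func_desc, entry) == 0);
    assert(offsetof(struct func_desc, toc) == 8);
    assert(offsetof(struct func_desc, env) == 16);
    assert(sizeof(struct func_desc) == 24);
}
```

Only the third field matters for the static chain, and it is only meaningful
when the callee is a nested function, which is why skipping that load is safe
for code that does not use them.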


> > I certainly can call the switch -mno-static-chain, which is perhaps more
> > meaningful (at least to us compiler folk, I'm not sure static chain means
> > much to the normal programmer).
> 
> Well, that's up to the target maintainers to decide, maybe
> -mno-nested-functions instead?

David?

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899
