> So I think we ought to honor accumulate-outgoing-args again and in fact
> consider disabling it for generic - it is disabled for core (that may need
> re-benchmarking). For all AMD targets it is currently on. I tested disabling
> it on Bulldozer 32bit and it seems mostly SPEC neutral for specint2000 (I am
> waiting for more benchmarks) with very nice code size improvements in all
> benchmarks with exception of MCF with LTO (not sure at all why), with overall
> reduction of 5.2% (same gain as we get for -flto approximately)
> http://gcc.opensuse.org/SPEC/CINT/sb-megrez-head-64-32o-32bit/size.html

Specint 2000/2006 seems quite good (though for spec2k6 the differences in
sizes are no longer anywhere near the -flto code size reductions).

Unfortunately there is a 40% regression on mgrid with -flto (and also a
noticeable regression without LTO). The first thing I noticed is that we stop
omitting the frame pointer in the hottest function. This is because we see:

    (set (reg/f:SI 7 sp)
        (plus:SI (reg/f:SI 7 sp)
            (const_int -8 [0xfffffffffffffff8])))

and we end up marking SP as uneliminable in:

    /* See if this is setting the replacement hard register for an
       elimination.

       If DEST is the hard frame pointer, we do nothing because we assume that
       all assignments to the frame pointer are for non-local gotos and are
       being done at a time when they are valid and do not disturb anything
       else.  Some machines want to eliminate a fake argument pointer (or even
       a fake frame pointer) with either the real frame pointer or the stack
       pointer.  Assignments to the hard frame pointer must not prevent this
       elimination.  */

    for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
         ep++)
      if (ep->to_rtx == SET_DEST (x)
          && SET_DEST (x) != hard_frame_pointer_rtx
          && (! (SUPPORTS_STACK_ALIGNMENT
                 && stack_realign_fp
                 && REGNO (ep->to_rtx) == STACK_POINTER_REGNUM)
              || GET_CODE (SET_SRC (x)) != PLUS
              || XEXP (SET_SRC (x), 0) != SET_DEST (x)
              || ! CONST_INT_P (XEXP (SET_SRC (x), 1))))
        setup_can_eliminate (ep, false);

It is because of

          && (! (SUPPORTS_STACK_ALIGNMENT
                 && stack_realign_fp
                 && REGNO (ep->to_rtx) == STACK_POINTER_REGNUM)

I am somewhat confused why we need to stop eliminating. The function is not
marked as needing DRAP (and in that case stack_realign_fp would be true).
What is this conditional shooting for?

Mgrid is somewhat sensitive to register pressure, so this actually may explain
the 40% difference...

Honza
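
To make the question concrete, here is a minimal sketch of how I read the condition quoted above, reduced to plain booleans. The struct and function names are made up for illustration and are not GCC code; each field only stands in for the subexpression named in its comment.

```c
#include <stdbool.h>

/* Hypothetical summary of what the reload1.c test inspects for one
   (set DEST SRC) and one elimination entry EP.  */
struct sp_set_facts
{
  bool dest_is_elim_target;     /* ep->to_rtx == SET_DEST (x) */
  bool dest_is_hard_fp;         /* SET_DEST (x) == hard_frame_pointer_rtx */
  bool realign_via_fp;          /* SUPPORTS_STACK_ALIGNMENT && stack_realign_fp */
  bool target_is_sp;            /* REGNO (ep->to_rtx) == STACK_POINTER_REGNUM */
  bool src_is_dest_plus_const;  /* SET_SRC (x) is (plus DEST (const_int ...)) */
};

/* Returns true when the quoted condition holds, i.e. when
   setup_can_eliminate (ep, false) would be called.  */
static bool
blocks_elimination (const struct sp_set_facts *f)
{
  return f->dest_is_elim_target
         && !f->dest_is_hard_fp
         && (!(f->realign_via_fp && f->target_is_sp)
             || !f->src_is_dest_plus_const);
}
```

Read this way, a plain sp = sp + const adjustment keeps the elimination only when realign_via_fp is set. With it clear, as in the mgrid case, the first arm of the inner || is true and the elimination to SP is disabled no matter what the source looks like.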