Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

Xinliang David Li Tue, 11 Dec 2012 14:53:25 -0800

The following is the -O2 size data from SPEC2k.  Note that with push/pop,
it is a net win (negative delta) in terms of total binary or total
loadable section size for every benchmark except mcf, where the sizes
are unchanged.


thanks,

David

Sizes in bytes; delta = (push - move) / move.

                 .text  .eh_frame  Total_binary
vortex-move     440252      40796        584066
vortex-push     415436      57452        575906
delta            -5.6%      40.8%       -1.397%

twolf-move      169324      10748        223521
twolf-push      168876      11124        223449
delta            -0.3%       3.5%       -0.032%

gzip-move        30668       3652        374399
gzip-push        30524       3740        374343
delta            -0.5%       2.4%       -0.015%

bzip2-move       22748       3196        111616
bzip2-push       22636       3284        111592
delta            -0.5%       2.8%       -0.022%

vpr-move        104684       9380        147378
vpr-push        104236       9788        147338
delta            -0.4%       4.3%       -0.027%

mcf-move          8444       1244         26760
mcf-push          8444       1244         26760
delta             0.0%       0.0%        0.000%

cc1-move       1093964      90772       1576994
cc1-push       1078988     104068       1575314
delta            -1.4%      14.6%       -0.107%

crafty-move     130556       5508       1256037
crafty-push     130236       5772       1255981
delta            -0.2%       4.8%       -0.004%

eon-move        333660      33220        516491
eon-push        330140      35812        515555
delta            -1.1%       7.8%       -0.181%

gap-move        404092      46732       1457735
gap-push        396012      53180       1456103
delta            -2.0%      13.8%       -0.112%

perlbmk-move    456572      45324        618585
perlbmk-push    449516      52340        618545
delta            -1.5%      15.5%       -0.006%

parser-move      81244      15788        334003
parser-push      80684      16332        333987
delta            -0.7%       3.4%       -0.005%
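
The delta rows can be checked mechanically. As a sketch, assuming each delta is (push - move) / move (which reproduces every row), the following Python script recomputes them from the raw sizes. `SIZES`, `pct_delta` and `delta_row` are hypothetical names chosen for this example, not part of any GCC or SPEC tooling.

```python
"""Recompute the SPEC2k -O2 size deltas (mov vs. push/pop prologue/epilogue).

Sketch only: SIZES is hand-copied from the size table in this mail, and the
delta formula (push - move) / move is an assumption that matches its rows.
"""

# benchmark -> ((.text, .eh_frame, Total_binary) for the mov build,
#               (.text, .eh_frame, Total_binary) for the push/pop build), in bytes
SIZES = {
    "vortex":  ((440252, 40796, 584066), (415436, 57452, 575906)),
    "twolf":   ((169324, 10748, 223521), (168876, 11124, 223449)),
    "gzip":    ((30668, 3652, 374399), (30524, 3740, 374343)),
    "bzip2":   ((22748, 3196, 111616), (22636, 3284, 111592)),
    "vpr":     ((104684, 9380, 147378), (104236, 9788, 147338)),
    "mcf":     ((8444, 1244, 26760), (8444, 1244, 26760)),
    "cc1":     ((1093964, 90772, 1576994), (1078988, 104068, 1575314)),
    "crafty":  ((130556, 5508, 1256037), (130236, 5772, 1255981)),
    "eon":     ((333660, 33220, 516491), (330140, 35812, 515555)),
    "gap":     ((404092, 46732, 1457735), (396012, 53180, 1456103)),
    "perlbmk": ((456572, 45324, 618585), (449516, 52340, 618545)),
    "parser":  ((81244, 15788, 334003), (80684, 16332, 333987)),
}


def pct_delta(move: int, push: int) -> float:
    """Relative size change of the push/pop build versus the mov build, in percent."""
    return 100.0 * (push - move) / move


def delta_row(move: tuple, push: tuple) -> tuple:
    """Percent change for each of (.text, .eh_frame, Total_binary)."""
    return tuple(pct_delta(m, p) for m, p in zip(move, push))


if __name__ == "__main__":
    print(f"{'benchmark':<10}{'.text':>8}{'.eh_frame':>11}{'Total':>10}")
    for name, (move, push) in SIZES.items():
        text, eh_frame, total = delta_row(move, push)
        print(f"{name:<10}{text:>7.1f}%{eh_frame:>10.1f}%{total:>9.3f}%")
    # The claim in the mail: no benchmark ends up with a larger total binary.
    no_growth = all(delta_row(m, p)[2] <= 0 for m, p in SIZES.values())
    print("no Total_binary growth:", no_growth)
```

Run as a script, it prints the same percentages as the delta rows at the same precision and confirms that no Total_binary delta is positive; mcf is the only benchmark where all three sizes are identical.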


On Tue, Dec 11, 2012 at 9:14 AM, Xinliang David Li <[email protected]> wrote:
> On Tue, Dec 11, 2012 at 1:49 AM, Richard Biener
> <[email protected]> wrote:
>> On Mon, Dec 10, 2012 at 10:07 PM, Mike Stump <[email protected]> wrote:
>>> On Dec 10, 2012, at 12:42 PM, Xinliang David Li <[email protected]> wrote:
>>>> I have not measured the CFI size impact -- but conceivably it should
>>>> be larger -- which is unfortunate.
>>>
>>> Code speed and size are preferable to optimizing DWARF size…  :-)  I'd let
>>> DWARF 5 fix it!
>>
>> Well, unlike debug info, CFI data has to be in memory to make
>> unwinding work.
>> These days most Linux distributions enable asynchronous unwind tables, so any
>> size savings due to shorter push/pop prologue/epilogue sequences have to be
>> offset against the increase in CFI data.  I'm not sure there is really a
>> speed difference between the two variants (well, maybe due to the better
>> icache footprint of the push/pop variant).
>
> Yes, for large applications, this can be crucial to performance.
>
>>
>> That said - I'd prefer to have more data on this before making the switch for
>> the generic model.  What was your original motivation?  Just "theory" or was
>> it a real case?
>
> 1) some of the very large internal apps I measured benefit from this
> change (in terms of performance)
> 2) both ICC and LLVM do the same.
>
> I have already committed the patch. I will find some time to collect
> more size data and post it later.
>
> thanks,
>
> David
>
>
>>
>> Thanks,
>> Richard.

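Richard's point, that the shorter push/pop sequences are partly paid for with extra CFI data, can be quantified from the SPEC2k numbers at the top of this message. A minimal Python sketch (sizes hand-copied from that table; `byte_deltas` and `offset_ratio` are hypothetical helper names) converts each row to byte deltas and reports how much of the .text saving reappears as .eh_frame growth:

```python
"""Byte-level view of the push/pop size trade-off: .text shrinks, .eh_frame grows.

Sketch only: SIZES is hand-copied from the SPEC2k table in this thread and
the helper names are hypothetical.
"""

# benchmark -> ((.text, .eh_frame, Total_binary) mov build,
#               (.text, .eh_frame, Total_binary) push/pop build), in bytes
SIZES = {
    "vortex":  ((440252, 40796, 584066), (415436, 57452, 575906)),
    "twolf":   ((169324, 10748, 223521), (168876, 11124, 223449)),
    "gzip":    ((30668, 3652, 374399), (30524, 3740, 374343)),
    "bzip2":   ((22748, 3196, 111616), (22636, 3284, 111592)),
    "vpr":     ((104684, 9380, 147378), (104236, 9788, 147338)),
    "mcf":     ((8444, 1244, 26760), (8444, 1244, 26760)),
    "cc1":     ((1093964, 90772, 1576994), (1078988, 104068, 1575314)),
    "crafty":  ((130556, 5508, 1256037), (130236, 5772, 1255981)),
    "eon":     ((333660, 33220, 516491), (330140, 35812, 515555)),
    "gap":     ((404092, 46732, 1457735), (396012, 53180, 1456103)),
    "perlbmk": ((456572, 45324, 618585), (449516, 52340, 618545)),
    "parser":  ((81244, 15788, 334003), (80684, 16332, 333987)),
}


def byte_deltas(move: tuple, push: tuple) -> tuple:
    """(push - move) in bytes for .text, .eh_frame and Total_binary, plus the
    part of the Total_binary change not accounted for by .text + .eh_frame."""
    d_text, d_eh, d_total = (p - m for m, p in zip(move, push))
    return d_text, d_eh, d_total, d_total - d_text - d_eh


def offset_ratio(d_text: int, d_eh: int):
    """Fraction of the .text bytes saved that reappear as .eh_frame growth.

    Returns None when .text did not shrink, since the ratio is then undefined.
    """
    if d_text >= 0:
        return None
    return d_eh / -d_text


if __name__ == "__main__":
    for name, (move, push) in SIZES.items():
        d_text, d_eh, d_total, other = byte_deltas(move, push)
        ratio = offset_ratio(d_text, d_eh)
        shown = "n/a" if ratio is None else f"{ratio:.0%}"
        print(f"{name:<8} .text {d_text:+7d}  .eh_frame {d_eh:+6d}  "
              f"total {d_total:+6d}  other {other:+3d}  offset {shown}")
```

On these numbers, .eh_frame growth gives back between about 61% (gzip) and 99% (perlbmk) of the .text savings but never all of them, which is why the total never grows. The `other` column is 0 everywhere except eon, where Total_binary shrinks by 8 bytes more than the .text and .eh_frame changes account for; the data does not say where those bytes come from.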
Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs
