http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57315

--- Comment #2 from Zack Weinberg <zackw at panix dot com> ---
Created attachment 30210
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30210&action=edit
self-contained test case

Here's a self-contained test case.

$ gcc-4.7 -std=c99 -O2 -march=native salsa20-regr.c && ./a.out
 875.178 keys/s
$ gcc-4.8 -std=c99 -O2 -march=native salsa20-regr.c && ./a.out
 808.869 keys/s

$ gcc-4.7 -std=c99 -O3 -march=native salsa20-regr.c && ./a.out
 867.879 keys/s
$ gcc-4.8 -std=c99 -O3 -march=native salsa20-regr.c && ./a.out
 800.794 keys/s

$ gcc-4.7 -std=c99 -O3 -fwhole-program -march=native salsa20-regr.c && ./a.out 
 606.605 keys/s
$ gcc-4.8 -std=c99 -O3 -fwhole-program -march=native salsa20-regr.c && ./a.out 
 571.935 keys/s

These numbers are stable to within about 1 key/s.  So there's a roughly 6-8%
regression from 4.7 to 4.8 at every optimization level tested.  Separately, -O3
and -O3 -fwhole-program are both slower than -O2 for this program, with either
compiler.  (-O2 -fwhole-program is within noise of plain -O2 for both.)

With 4.8, -march=native on my computer expands to

-march=corei7-avx -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm
-mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx
-mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c
-mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt
--param l1-cache-size=0 --param l1-cache-line-size=0 --param l2-cache-size=256
-mtune=corei7-avx
