http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57315
Bug ID: 57315
Summary: LTO and/or vectorizer performance regression on salsa20 core, 4.7->4.8
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zackw at panix dot com
I'm seeing a significant performance regression from GCC 4.7 to 4.8 (targeting
x86-64) in the core function of "salsa20" (a stream cipher). Repro
instructions:
$ git clone git://github.com/zackw/rngstats.git
# ...
$ make -s cipher-test CC=gcc-4.7 && ./cipher-test >&/dev/null && ./cipher-test
KAT: aes128... ok
KAT: aes256... ok
KAT: arc4... ok
KAT: isaac64... ok
KAT: salsa20_128... ok
KAT: salsa20_256... ok
TIME: aes128... 2000 keys, 3.47834s -> 574.987 keys/s
TIME: aes256... 2000 keys, 3.62452s -> 551.797 keys/s
TIME: arc4... 2000 keys, 2.21746s -> 901.933 keys/s
TIME: isaac64... 2000 keys, 2.03467s -> 982.962 keys/s
TIME: salsa20_128... 2000 keys, 2.31960s -> 862.217 keys/s
TIME: salsa20_256... 2000 keys, 2.31932s -> 862.320 keys/s
$ make -s clean cipher-test CC=gcc-4.8 && ./cipher-test >&/dev/null && ./cipher-test
KAT: aes128... ok
KAT: aes256... ok
KAT: arc4... ok
KAT: isaac64... ok
KAT: salsa20_128... ok
KAT: salsa20_256... ok
TIME: aes128... 2000 keys, 2.49224s -> 802.491 keys/s
TIME: aes256... 2000 keys, 3.62372s -> 551.919 keys/s
TIME: arc4... 2000 keys, 2.22794s -> 897.689 keys/s
TIME: isaac64... 2000 keys, 2.05087s -> 975.194 keys/s
TIME: salsa20_128... 2000 keys, 3.53085s -> 566.436 keys/s
TIME: salsa20_256... 2000 keys, 2.53003s -> 790.505 keys/s
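The regression shows in the last two TIME: lines for each build (salsa20_128
and salsa20_256). Going from 4.7 to 4.8, salsa20_128 falls from about 862 to
about 566 keys/s, and salsa20_256 from about 862 to about 791 keys/s. The
relevant code is probably in ciphers/salsa20.c, or else in worker.c.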
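
For anyone comparing their own runs, here is a minimal sketch of how the
TIME: lines can be turned into a percentage change per cipher. It is not part
of the rngstats repository; the names parse_times, relative_change, GCC47 and
GCC48 are made up for illustration, and the sample data is just the salsa20
lines copied from the two runs above.

```python
import re

# Matches lines such as:
#   TIME: salsa20_128... 2000 keys, 2.31960s -> 862.217 keys/s
TIME_LINE = re.compile(r"^TIME: (\w+)\.\.\. (\d+) keys, ([\d.]+)s -> [\d.]+ keys/s$")


def parse_times(output):
    """Map cipher name -> keys/s, recomputed from key count and elapsed seconds."""
    rates = {}
    for line in output.splitlines():
        m = TIME_LINE.match(line.strip())
        if m:
            name, keys, secs = m.group(1), int(m.group(2)), float(m.group(3))
            rates[name] = keys / secs
    return rates


def relative_change(old, new):
    """Percentage change in keys/s from old to new (negative means slower)."""
    return {
        name: (new[name] - old[name]) / old[name] * 100.0
        for name in old
        if name in new
    }


# Hypothetical variable names; the contents are the salsa20 lines quoted above.
GCC47 = """
TIME: salsa20_128... 2000 keys, 2.31960s -> 862.217 keys/s
TIME: salsa20_256... 2000 keys, 2.31932s -> 862.320 keys/s
"""

GCC48 = """
TIME: salsa20_128... 2000 keys, 3.53085s -> 566.436 keys/s
TIME: salsa20_256... 2000 keys, 2.53003s -> 790.505 keys/s
"""

if __name__ == "__main__":
    changes = relative_change(parse_times(GCC47), parse_times(GCC48))
    for name, pct in changes.items():
        print(f"{name:12s} {pct:+6.1f}%")
```

Feeding it the full output of both ./cipher-test runs should work the same
way, since lines that do not start with TIME: are ignored.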
Note that there are other programs in this repository, and they require unusual
libraries to build. I recommend you do not attempt a "make all", and if you
get errors, try commenting out the CFLAGS.mpi and LIBS.mpi lines in the
Makefile.