[Bug middle-end/81441] New: slowdown due to -fpeel-loops and -ftracer added by -fprofile-use

Joost.VandeVondele at mat dot ethz.ch Thu, 13 Jul 2017 23:44:22 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81441


            Bug ID: 81441
           Summary: slowdown due to -fpeel-loops and -ftracer added by
                    -fprofile-use
           Product: gcc
           Version: 5.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Joost.VandeVondele at mat dot ethz.ch
  Target Milestone: ---

For our code, we see a slowdown (3%-7% depending on the user reporting) due to
the options -fpeel-loops and -ftracer added by default when using
-fprofile-use.

The code is Stockfish, arguably the strongest open source chess engine, and
part of benchmark suites such as
https://openbenchmarking.org/test/pts/stockfish

The same behaviour has been observed for gcc versions from 4.X to 7.1, so it is
a persistent problem rather than a recent regression. (Discussions in
https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/YzV_fG7ejR4 and
https://github.com/official-stockfish/Stockfish/pull/1165 )

It is not easy for me to pinpoint the location in the code that is affected
most (despite the code being only ~5000 lines of C++). I tried differential
profiling with perf, but didn't get profiles that made sense to me. 

It is easy to reproduce, by testing two successive git commits where the change
of options in the Makefile is the only difference:

git clone https://github.com/official-stockfish/Stockfish.git
cd Stockfish/src/

# version with -fprofile-use -fno-peel-loops -fno-tracer
# ======================================================
git checkout c8e5384c3a4a5d9ac709c9b50954907a7f07109c
make clean && make -j ARCH=x86-64-modern profile-build
./stockfish bench 128 1 16 default depth 2>&1 | grep 'Total time (ms)'
# (locally reports Total time (ms) : 9947)

# version with just -fprofile-use
# ======================================================
git checkout 0371a8f8c4a043cb3e7d08b5b8e7d08d49f28324
make clean && make -j ARCH=x86-64-modern profile-build
./stockfish bench 128 1 16 default depth 2>&1 | grep 'Total time (ms)'
# (locally reports Total time (ms) : 10456)

So '-fprofile-use -fno-peel-loops -fno-tracer' is 5% faster than
'-fprofile-use' in my case.
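
For reference, a minimal sketch of how the ~5% figure follows from the two
timings above. The helper name rel_diff is hypothetical; only the two numbers
come from the runs reported here, and they are single local measurements.

```shell
# Timings from the two runs above, in milliseconds.
t_fast=9947    # -fprofile-use -fno-peel-loops -fno-tracer
t_slow=10456   # -fprofile-use only

# rel_diff A B: percentage by which B exceeds A, to one decimal place.
# (hypothetical helper, for illustration only)
rel_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# Extra run time of plain -fprofile-use relative to the build with
# peeling and tracer disabled.
rel_diff "$t_fast" "$t_slow"   # prints 5.1
```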

Let me know if I can provide more info. The length of the benchmark can be
adjusted easily by changing the '16' in the bench command to a smaller (shorter)
or larger (longer) number. Run time grows roughly exponentially with this
value: each step of 1 changes it by about 2x.
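
As a rough guide for picking a value, the sketch below scales a measured time
by a factor of 2 per step. The helper name est_ms is hypothetical, and the
factor of 2 is only the approximate figure given above, so the results are
ballpark estimates, not measurements.

```shell
# est_ms BASE_MS BASE_DEPTH DEPTH
# Estimate run time at DEPTH by scaling BASE_MS (measured at BASE_DEPTH)
# by 2^(DEPTH - BASE_DEPTH). Assumes an exact factor of 2 per step, which
# is only approximately true. (hypothetical helper, for illustration only)
est_ms() {
    awk -v t="$1" -v d0="$2" -v d="$3" \
        'BEGIN { printf "%.0f\n", t * 2 ^ (d - d0) }'
}

# Starting from the 9947 ms measured with the value 16:
est_ms 9947 16 14   # about a quarter of the time
est_ms 9947 16 18   # about four times the time
```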
