2010/11/15 Jan Hubicka <hubi...@ucw.cz>:
>> For peak, FDO is the most effective option. It can boost performance
>> by 7-10% depending on the program. The options you suggested probably
>> won't make too big a dent. -funroll-loops can hurt performance
>> without profiling. More aggressive inlining, ipa-cp, unswitching etc
>
> -funroll-loops overall was 2.2% win on SPECint, -funroll-all-loops 2.5% last
> time I noted down the SPECint results of this (that was in 2003, heh :)
> http://www.ucw.cz/~hubicka/papers/amd64/node4.html
>
>> enabled by O3 may help a little if there is any. -ffast-math won't
>> help for integer benchmarks other than eon. Traditionally, O3 helps
>> FP performance because of the loop transformations enabled, but this
>> won't be the case for gcc for now.
>
> Function inlining definitely helps. -O3 also implies vectorization and other
> stuff.

Indeed. You can look at the various testers at gcc.opensuse.org which
compare -O2 vs. -O3 but also -O3 vs. -O3 -funroll-loops (and other
things) to get an idea of what helps and what does not.

Richard.

> Honza
>>
>> Thanks,
>>
>> David
>>
>> On Mon, Nov 15, 2010 at 4:29 AM, Andrey Belevantsev <a...@ispras.ru> wrote:
>> > Hello,
>> >
>> > On 14.11.2010 0:08, Xinliang David Li wrote:
>> >>
>> >> I re-measured the performance difference using trunk gcc and trunk
>> >> clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc
>> >> because clang/llvm's type based aliasing is incomplete and not
>> >> enabled by default. I also added -fomit-frame-pointer to clang/llvm as
>> >> this is gcc's default. The base option is -O2.
>> >
>> > It would be very interesting to compare also peak numbers, i.e. with LTO
>> > and strict aliasing enabled, as well as -O3 and -ffast-math/-funroll-loops,
>> > similar to Vlad's or OpenSUSE's options. Can you try to measure these?
>> > Maybe you can also run SPEC2k6, if there are enough machine resources, but
>> > that's probably asking too much...
>> >
>> > Andrey
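
For anyone comparing option sets as suggested in this thread (for example -O3 vs. -O3 -funroll-loops), an overall figure such as the "2.2% win on SPECint" quoted above is normally a geometric mean of per-benchmark ratios, which is how SPEC aggregates its scores. Below is a minimal sketch of that arithmetic in Python. The helper names (geomean, overall_win_percent), the benchmark names, and the run times are all hypothetical placeholders, not measurements from anyone in this thread.

```python
import math


def geomean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def overall_win_percent(base_seconds, test_seconds):
    """Overall speedup of `test` over `base`, in percent.

    Both arguments map benchmark name -> run time in seconds.
    A positive result means `test` is faster overall.
    """
    # Per-benchmark ratio > 1.0 means the test configuration ran faster.
    ratios = [base_seconds[name] / test_seconds[name] for name in base_seconds]
    return (geomean(ratios) - 1.0) * 100.0


# Hypothetical run times in seconds (made up for illustration only).
o3 = {"bench_a": 100.0, "bench_b": 200.0, "bench_c": 50.0}
o3_unroll = {"bench_a": 96.0, "bench_b": 204.0, "bench_c": 48.0}

print(f"-O3 -funroll-loops vs. -O3: {overall_win_percent(o3, o3_unroll):+.2f}%")
```

Note that in the made-up data one benchmark gets slower while the overall number still comes out positive, so per-benchmark ratios are worth checking alongside the aggregate.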