Hello, I completely agree with David. Note that your results will vary greatly depending on the machine you run the tests on. Performance on such tests is very machine-dependent, so the conclusions cannot be generalized.
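To make the point about repeated measurements concrete, here is a minimal sketch of how the two loops might be timed several times in one process, reporting a median instead of a single reading. The names (direct_loop, pointer_loop, median, time_runs) and the RUNS count are my own illustrative choices, not anything from the original test. The stored-to variable is marked volatile, an assumption on my part, so that the stores survive when building with -O1 or higher.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define ITERATIONS 100000000  /* 10^8 stores, as in the original test */
#define RUNS 11               /* hypothetical choice; odd, so the median is one sample */

/* Direct store. volatile keeps the store alive under optimisation. */
int direct_loop(int n)
{
    volatile int localInt = 0;
    for (int i = 0; i < n; ++i)
        localInt = i;
    return localInt;
}

/* Store through a pointer to the same kind of local variable. */
int pointer_loop(int n)
{
    volatile int localInt = 0;
    volatile int *localP = &localInt;
    for (int i = 0; i < n; ++i)
        *localP = i;
    return localInt;
}

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts samples in place and returns their median. */
double median(double *samples, size_t count)
{
    assert(count > 0);
    qsort(samples, count, sizeof samples[0], cmp_double);
    if (count % 2 == 1)
        return samples[count / 2];
    return (samples[count / 2 - 1] + samples[count / 2]) / 2.0;
}

/* Runs loop(n) RUNS times; returns the median CPU time in seconds. */
double time_runs(int (*loop)(int), int n)
{
    double samples[RUNS];
    for (int r = 0; r < RUNS; ++r) {
        clock_t start = clock();
        (void)loop(n);
        samples[r] = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    return median(samples, RUNS);
}
```

From a main, one would call time_runs(direct_loop, ITERATIONS) and time_runs(pointer_loop, ITERATIONS) and print both medians, building with at least -O1. With optimisation the compiler will likely keep the pointer in a register, so the two loops may well compile to the same code; even so, this is still only one machine and one compiler.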
David

2014-04-16 16:49 GMT+02:00 David Brown <da...@westcontrol.com>:
>
> Hi,
>
> You cannot learn useful timing information from a single run of a short
> test like this - there are far too many other factors that come into play.
>
> You cannot learn useful timing information from unoptimised code.
>
> There is too much luck involved in a test like this to be useful. You
> need optimised code (at least -O1), longer times, more tests, varied
> code, etc., before being able to conclude anything. Otherwise the
> result could be nothing more than a quirk of the way caching worked out.
>
> mvh.,
>
> David
>
>
> On 16/04/14 16:26, Peter Schneider wrote:
>> I have made a curious performance observation with gcc under 64 bit
>> cygwin on a corei7. I'm genuinely puzzled and couldn't find any
>> information about it. Perhaps this is only indirectly a gcc question
>> though, bear with me.
>>
>> I have two trivial programs which assign a loop variable to a local
>> variable 10^8 times. One does it the obvious way, the other one accesses
>> the variable through a pointer, which means it must dereference the
>> pointer first. This is reflected nicely in the disassembly snippets of
>> the respective loop bodies below. Funny enough, the loop with the extra
>> dereferencing runs considerably faster than the loop with the direct
>> assignment (>10%). While the issue (indeed the whole program ;-) ) goes
>> away with optimization, in less trivial scenarios that may not be so.
>>
>> My first question is: What makes the smaller code slower?
>> The gcc question is: Should assignment always be performed through a
>> pointer if it is faster? (Probably not, but why not?) A session
>> transcript including the compilable source is below.
>>
>> Here are the disassembled loop bodies:
>>
>> Direct access
>> =====================================================
>>         localInt = i;
>>    1004010e6:   8b 45 fc                mov    -0x4(%rbp),%eax
>>    1004010e9:   89 45 f8                mov    %eax,-0x8(%rbp)
>>
>>
>> Pointer access
>> =====================================================
>>         *localP = i;
>>    1004010ee:   48 8b 45 f0             mov    -0x10(%rbp),%rax
>>    1004010f2:   8b 55 fc                mov    -0x4(%rbp),%edx
>>    1004010f5:   89 10                   mov    %edx,(%rax)
>>
>> Note the first instruction which moves the address into %rax. The other
>> two are similar to the direct assignment above.
>>
>> Here is a session transcript:
>>
>> $ gcc -v
>> Using built-in specs.
>> COLLECT_GCC=gcc
>> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/lto-wrapper.exe
>> Target: x86_64-pc-cygwin
>> Configured with:
>> /cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2/configure
>> --srcdir=/cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2
>> --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin
>> --libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var
>> --sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share
>> --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C
>> --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin
>> --target=x86_64-pc-cygwin --without-libiconv-prefix
>> --without-libintl-prefix --enable-shared --enable-shared-libgcc
>> --enable-static --enable-version-specific-runtime-libs
>> --enable-bootstrap --disable-__cxa_atexit --with-dwarf2
>> --with-tune=generic
>> --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite
>> --enable-threads=posix --enable-libatomic --enable-libgomp
>> --disable-libitm --enable-libquadmath --enable-libquadmath-support
>> --enable-libssp --enable-libada --enable-libgcj-sublibs
>> --disable-java-awt --disable-symvers
>> --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as
>> --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix
>> --without-libintl-prefix --with-system-zlib --libexecdir=/usr/lib
>> Thread model: posix
>> gcc version 4.8.2 (GCC)
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ cat ./t
>> #!/bin/bash
>>
>> cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
>>
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ ./t obj
>> int main()
>> {
>>         int localInt;
>>         for (int i = 0; i < 100000000; ++i)
>>                 localInt = i;
>>         return 0;
>> }
>> real    0m0.248s
>> user    0m0.234s
>> sys     0m0.015s
>>
>> peter@peter-lap ~/src/test/obj_vs_ptr
>> $ ./t ptr
>> int main()
>> {
>>         int localInt;
>>         int *localP = &localInt;
>>         for (int i = 0; i < 100000000; ++i)
>>                 *localP = i;
>>         return 0;
>> }
>>
>> real    0m0.215s
>> user    0m0.203s
>> sys     0m0.000s
>>
>>
>