Hi,

You cannot learn useful timing information from a single run of a short test like this - there are far too many other factors that come into play.

You cannot learn useful timing information from unoptimised code. There is too much luck involved in a test like this for it to be useful. You need optimised code (at least -O1), longer times, more tests, varied code, etc., before being able to conclude anything. Otherwise the result could be nothing more than a quirk of the way caching worked out.

Kind regards,

David

On 16/04/14 16:26, Peter Schneider wrote:
> I have made a curious performance observation with gcc under 64-bit
> Cygwin on a Core i7. I'm genuinely puzzled and couldn't find any
> information about it. Perhaps this is only indirectly a gcc question
> though, bear with me.
>
> I have two trivial programs which assign a loop variable to a local
> variable 10^8 times. One does it the obvious way, the other one accesses
> the variable through a pointer, which means it must dereference the
> pointer first. This is reflected nicely in the disassembly snippets of
> the respective loop bodies below. Funnily enough, the loop with the extra
> dereferencing runs considerably faster than the loop with the direct
> assignment (>10%). While the issue (indeed the whole program ;-) ) goes
> away with optimization, in less trivial scenarios that may not be so.
>
> My first question is: What makes the smaller code slower?
> The gcc question is: Should assignment always be performed through a
> pointer if it is faster? (Probably not, but why not?) A session
> transcript including the compilable source is below.
>
> Here are the disassembled loop bodies:
>
> Direct access
> =====================================================
>     localInt = i;
>    1004010e6:   8b 45 fc                mov    -0x4(%rbp),%eax
>    1004010e9:   89 45 f8                mov    %eax,-0x8(%rbp)
>
>
> Pointer access
> =====================================================
>     *localP = i;
>    1004010ee:   48 8b 45 f0             mov    -0x10(%rbp),%rax
>    1004010f2:   8b 55 fc                mov    -0x4(%rbp),%edx
>    1004010f5:   89 10                   mov    %edx,(%rax)
>
> Note the first instruction which moves the address into %rax.
> The other two are similar to the direct assignment above.
>
> Here is a session transcript:
>
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-cygwin/4.8.2/lto-wrapper.exe
> Target: x86_64-pc-cygwin
> Configured with:
> /cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2/configure
> --srcdir=/cygdrive/i/szsz/tmpp/cygwin64/gcc/gcc-4.8.2-3/src/gcc-4.8.2
> --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin
> --libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var
> --sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share
> --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C
> --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin
> --target=x86_64-pc-cygwin --without-libiconv-prefix
> --without-libintl-prefix --enable-shared --enable-shared-libgcc
> --enable-static --enable-version-specific-runtime-libs
> --enable-bootstrap --disable-__cxa_atexit --with-dwarf2
> --with-tune=generic
> --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++ --enable-graphite
> --enable-threads=posix --enable-libatomic --enable-libgomp
> --disable-libitm --enable-libquadmath --enable-libquadmath-support
> --enable-libssp --enable-libada --enable-libgcj-sublibs
> --disable-java-awt --disable-symvers
> --with-ecj-jar=/usr/share/java/ecj.jar --with-gnu-ld --with-gnu-as
> --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix
> --without-libintl-prefix --with-system-zlib --libexecdir=/usr/lib
> Thread model: posix
> gcc version 4.8.2 (GCC)
>
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ cat ./t
> #!/bin/bash
>
> cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
>
>
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ ./t obj
> int main()
> {
>     int localInt;
>     for (int i = 0; i < 100000000; ++i)
>         localInt = i;
>     return 0;
> }
> real    0m0.248s
> user    0m0.234s
> sys     0m0.015s
>
> peter@peter-lap ~/src/test/obj_vs_ptr
> $ ./t ptr
> int main()
> {
>     int localInt;
>     int *localP = &localInt;
>     for (int i = 0; i < 100000000; ++i)
>         *localP = i;
>     return 0;
> }
>
> real    0m0.215s
> user    0m0.203s
> sys     0m0.000s