On April 16, 2014 7:45:55 PM CEST, Peter Schneider <schneid...@gmx.net> wrote:
>In order to see what difference a different processor makes I also tried
>the same code on a fairly old 32 bit "AMD Athlon(tm) XP 3000+" with the
>current stable gcc (4.7.2). The difference is even more striking
>(dereferencing is much faster). I see that the size of the code inside
>the loop for the faster pointer access is exactly 8. No idea whether
>that has any significance.

Alignment of jump targets is important. I don't think we do anything special there at -O0, so the result will be pure luck.

Richard.

>Here as well I performed several runs with similar results. Statistical
>significance was established around n=2 ;-).
>
>gcc -v
>Using built-in specs.
>COLLECT_GCC=gcc
>COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.7/lto-wrapper
>Target: i486-linux-gnu
>Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5'
>--with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
>--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
>--program-suffix=-4.7 --enable-shared --enable-linker-build-id
>--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
>--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7
>--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
>--enable-libstdcxx-debug --enable-libstdcxx-time=yes
>--enable-gnu-unique-object --enable-plugin --enable-objc-gc
>--enable-targets=all --with-arch-32=i586 --with-tune=generic
>--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
>--target=i486-linux-gnu
>Thread model: posix
>gcc version 4.7.2 (Debian 4.7.2-5)
>
>ppeterr@www:~/src/test/obj-vs-ptr$ cat t
>#!/bin/bash
>cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1
>
>ppeterr@www:~/src/test/obj-vs-ptr$ ./t obj
>int main()
>{
>    int localInt;
>    for (int i = 0; i < 100000000; ++i)
>        localInt = i;
>    return 0;
>}
>
>real 0m0.418s
>user 0m0.416s
>sys 0m0.004s
>ppeterr@www:~/src/test/obj-vs-ptr$ ./t ptr
>int main()
>{
>    int localInt;
>    int *localP = &localInt;
>    for (int i = 0; i < 100000000; ++i)
>        *localP = i;
>    return 0;
>}
>
>real 0m0.243s
>user 0m0.240s
>sys 0m0.000s
>
>===============================================================
>
>The disassembly is for the direct access (slower):
>
>        localInt = i;
> 80483eb:       8b 45 fc                mov    -0x4(%ebp),%eax
> 80483ee:       89 45 f8                mov    %eax,-0x8(%ebp)
>
>And for the pointer access (faster):
>
>        *localP = i;
> 80483f1:       8b 45 f8                mov    -0x8(%ebp),%eax
> 80483f4:       8b 55 fc                mov    -0x4(%ebp),%edx
> 80483f7:       89 10                   mov    %edx,(%eax)
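
To make the alignment remark concrete, here is a minimal sketch that reports where an address falls inside an aligned block and whether a run of code bytes straddles a block boundary. The helper names (offset_in_block, crosses_boundary) and the 16-byte block size are assumptions chosen for illustration; nothing in the thread says which boundary matters on the Athlon XP. The post also only shows the loop bodies, so whether these addresses are the actual branch targets is not established.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of `addr` from the previous `block`-byte boundary.
   `block` must be a power of two; 16 is only an assumed value
   in the usage note, not something stated in the thread. */
static unsigned offset_in_block(uintptr_t addr, unsigned block)
{
    assert(block != 0 && (block & (block - 1u)) == 0);
    return (unsigned)(addr & (uintptr_t)(block - 1u));
}

/* Returns 1 if the byte range [addr, addr + len) extends past the end
   of the aligned block that contains `addr`, 0 otherwise. */
static int crosses_boundary(uintptr_t addr, unsigned len, unsigned block)
{
    return offset_in_block(addr, block) + len > block;
}
```

With the addresses from the disassembly, offset_in_block(0x80483eb, 16) is 11 and offset_in_block(0x80483f1, 16) is 1. Taking the body sizes from the listed bytes (6 for the direct access, 8 for the pointer access), only the direct-access body straddles a 16-byte boundary under this assumption. That fits the "pure luck" remark, but it does not show that this is the cause of the timing difference.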