To see how much a different processor matters, I also ran the same code on a fairly old 32-bit "AMD Athlon(tm) XP 3000+" with the current stable gcc (4.7.2). The difference is even more striking: the version that stores through the pointer is much faster. I notice that the loop body for the faster pointer access is exactly 8 bytes long. No idea whether that has any significance.

Here as well I performed several runs with similar results. Statistical significance was established around n=2 ;-).

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.7/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.7.2 (Debian 4.7.2-5)

ppeterr@www:~/src/test/obj-vs-ptr$  cat t
#!/bin/bash
cat $1.c && gcc -std=c99 -O0 -g -o $1 $1.c && time ./$1

ppeterr@www:~/src/test/obj-vs-ptr$ ./t obj
int main()
{
    int localInt;
    for (int i = 0; i < 100000000; ++i)
        localInt = i;
    return 0;
}

real    0m0.418s
user    0m0.416s
sys     0m0.004s
ppeterr@www:~/src/test/obj-vs-ptr$ ./t ptr
int main()
{
    int localInt;
    int *localP = &localInt;
    for (int i = 0; i < 100000000; ++i)
        *localP = i;
    return 0;
}

real    0m0.243s
user    0m0.240s
sys     0m0.000s
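
Since `time ./prog` also counts process startup, here is a rough sketch of how both variants could be timed inside one program with clock(). This is not the code I ran above: the function names (store_direct, store_via_pointer, time_variant) are made up for illustration, and the volatile qualifiers are an assumption added so the stores would survive if someone builds with optimization. Moving the loops into functions also changes the stack layout, so the numbers may well differ from the runs above.

```c
#include <time.h>

/* Hypothetical helper: store to a local directly, n times.
 * volatile keeps the store in the loop even when optimizing. */
int store_direct(int n)
{
    volatile int localInt = 0;
    for (int i = 0; i < n; ++i)
        localInt = i;
    return localInt;
}

/* Hypothetical helper: the same store, but through a pointer. */
int store_via_pointer(int n)
{
    volatile int localInt = 0;
    volatile int *localP = &localInt;
    for (int i = 0; i < n; ++i)
        *localP = i;
    return localInt;
}

/* CPU seconds spent in fn(n), measured with clock(). */
double time_variant(int (*fn)(int), int n)
{
    clock_t start = clock();
    fn(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_variant(store_direct, 100000000) and time_variant(store_via_pointer, 100000000) from main, several times in alternating order, would give comparable figures for the two loops.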

===============================================================

The disassembly for the direct access (slower) is:

        localInt = i;
 80483eb:       8b 45 fc                mov    -0x4(%ebp),%eax
 80483ee:       89 45 f8                mov    %eax,-0x8(%ebp)

And for the pointer access (faster):

        *localP = i;
 80483f1:       8b 45 f8                mov    -0x8(%ebp),%eax
 80483f4:       8b 55 fc                mov    -0x4(%ebp),%edx
 80483f7:       89 10                   mov    %edx,(%eax)
