On 16.01.2007 at 10:46, Gustaf Neumann wrote:

s = (size-1) >> 3;
      while (s>1) { s >>= 1; bucket++; }

On Linux and Solaris (both x86 machines)
the "long" version:

    s = (size-1) >> 4;
    while (s > 0xFF) {
        s = s >> 5;
        bucket += 5;
    }
    while (s > 0x0F) {
        s = s >> 4;
        bucket += 4;
    }
    ...

is faster than the "short" version above.
On Mac OS X there is no difference between the two.
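
To make the comparison concrete, here is a minimal sketch of the two loop styles as standalone C functions. The function names are hypothetical. The tail of the "long" version is elided in the snippet above, so the final single-bit loop in bucket_chunked() is an assumption, and both functions start from the same ">> 3" shift so that their results can be compared directly. This is not the actual allocator code.

```c
#include <assert.h>
#include <stddef.h>

/* "Short" form from the quoted message: strips one bit per iteration. */
static int bucket_short(size_t size)
{
    size_t s = (size - 1) >> 3;
    int bucket = 0;

    while (s > 1) {
        s >>= 1;
        bucket++;
    }
    return bucket;
}

/* Chunked form: strips several bits per iteration while s is large.
 * The thresholds mirror the snippet in the message; the last loop is
 * an assumed tail, since the original is elided there. */
static int bucket_chunked(size_t size)
{
    size_t s = (size - 1) >> 3;
    int bucket = 0;

    while (s > 0xFF) {      /* s >= 256, so s >> 5 is still >= 8 */
        s >>= 5;
        bucket += 5;
    }
    while (s > 0x0F) {      /* s >= 16, so s >> 4 is still >= 1 */
        s >>= 4;
        bucket += 4;
    }
    while (s > 1) {         /* assumed tail: finish one bit at a time */
        s >>= 1;
        bucket++;
    }
    return bucket;
}

/* Aborts via assert() if the two forms disagree for any size in
 * the range 1..max_size. */
static void check_buckets(size_t max_size)
{
    size_t size;

    for (size = 1; size <= max_size; size++) {
        assert(bucket_short(size) == bucket_chunked(size));
    }
}
```

Calling check_buckets() over the range of sizes you care about confirms that the chunked loop returns the same bucket index as the short one. Each chunked step is safe because its threshold guarantees that s stays at least 1 after the shift, so no step can overshoot. Whether it is actually faster depends on compiler and platform, as the numbers below show.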

Look at the results on Sun Solaris 10 (x86 box):

(the "short" version)
Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 13753084 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)


(the "long" version)
-bash-3.00$ ./memtest

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 14341236 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

That is ((14341236-13753084)/14341236)*100 = about 4% faster.

On Linux we saw about a 3% improvement, on Sun about 4%, and
on Mac OS X none. Note: all were x86 (Intel, AMD) machines,
differing only in OS and clock speed.

When we go back to the "slow" (original) version:

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 13474091 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

So compared with the original, the "long" version gives a
((14341236-13474091)/14341236)*100 = 6% improvement.
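
For anyone who wants to redo the arithmetic with their own numbers, a small sketch follows. The helper name gain_percent is hypothetical; it divides by the faster figure, exactly as in the calculations above.

```c
/* Relative gain in percent, normalized by the faster figure
 * (the same convention as the calculations in this message). */
static double gain_percent(double faster, double slower)
{
    return (faster - slower) / faster * 100.0;
}
```

With the Solaris figures, gain_percent(14341236, 13753084) comes out at roughly 4.1 and gain_percent(14341236, 13474091) at roughly 6.0. Dividing by the slower figure instead would give slightly larger values, about 4.3 and 6.4.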

Cheers
Zoran



