On 16.01.2007 at 10:46, Gustaf Neumann wrote:
s = (size-1) >> 3; while (s>1) { s >>= 1; bucket++; }
On Linux and Solaris (both x86 machines) the "long" version:

    s = (size-1) >> 4;
    while (s > 0xFF) {
        s = s >> 5;
        bucket += 5;
    }
    while (s > 0x0F) {
        s = s >> 4;
        bucket += 4;
    }
    ...

is faster than the "short" one above. On Mac OSX there is no difference.

Look at the Sun Solaris 10 (x86 box) numbers:

(the "short" version)

    Test Tcl allocator with 4 threads, 16000 records ...
    This allocator achieves 13753084 ops/sec under 4 threads
    Press return to exit (observe the current memory footprint!)

(the "long" version)

    -bash-3.00$ ./memtest
    Test Tcl allocator with 4 threads, 16000 records ...
    This allocator achieves 14341236 ops/sec under 4 threads
    Press return to exit (observe the current memory footprint!)

That is ((14341236-13753084)/14341236)*100 = 4%

On Linux we had about 3% improvement, on Sun about 4%, and on Mac OSX none.
Note: all were x86 (Intel, AMD) machines, just with different OS and GHz count.

When we go back to the "slow" (original) version:

    Test Tcl allocator with 4 threads, 16000 records ...
    This allocator achieves 13474091 ops/sec under 4 threads
    Press return to exit (observe the current memory footprint!)

We get ((14341236-13474091)/14341236)*100 = 6% improvement of the "long"
version over the original.

Cheers
Zoran
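For anyone who wants to compare the two loops outside of memtest, here is a minimal sketch. The function names bucket_short and bucket_long are made up for illustration. The first two loops of bucket_long are the ones shown in the mail; everything after the "..." was not shown, so the final one-bit-at-a-time loop is only an assumed completion, chosen so that both functions return the same bucket. It is not necessarily what the tested code does.

```c
#include <stddef.h>

/* "Short" version: shifts one bit per iteration. */
static int bucket_short(size_t size)
{
    int bucket = 0;
    size_t s = (size - 1) >> 3;

    while (s > 1) {
        s >>= 1;
        bucket++;
    }
    return bucket;
}

/* "Long" version: consumes several bits per iteration while s is large. */
static int bucket_long(size_t size)
{
    int bucket = 0;
    size_t s = (size - 1) >> 4;

    while (s > 0xFF) {      /* at least 9 significant bits left */
        s >>= 5;
        bucket += 5;
    }
    while (s > 0x0F) {      /* at least 5 significant bits left */
        s >>= 4;
        bucket += 4;
    }
    /* ASSUMPTION: the mail elides the rest of the chain; this tail
     * simply counts the remaining bits one at a time. */
    while (s > 0) {
        s >>= 1;
        bucket++;
    }
    return bucket;
}
```

With this tail, bucket_long counts the significant bits of (size-1) >> 4, which matches what the one-bit loop produces for (size-1) >> 3, so the two can be swapped in a benchmark without changing which bucket a given size maps to.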