On Tue, Mar 19, 2002 at 08:11:54PM +0000, Brian J. Beesley wrote:
>The speed it's running at suggests
>that any performance loss due to TLB thrashing is small, since the extra drop
>beyond linearity is only about what one would expect from the LL test
>algorithm being O(n log n).

Disclaimer: My argument below might not be a very valid argument. ;-)

Paste from gwnum.c, Prime95 v19:

/* Well.... I implemented the above only to discover I had dreadful */
/* performance in pass 1. How can that be? The problem is that each */
/* cache line in pass 1 comes from a different 4KB page. Therefore, */
/* pass 1 accessed 128 different pages. This is a problem because the */
/* Pentium chip has only 64 TLBs (translation lookaside buffers) to map */
/* logical page addresses into physical addresses. So we need to shuffle */
/* the data further so that pass 1 data is on fewer pages while */
/* pass 2 data is spread over more pages. */

So, it might be that, in order to avoid TLB thrashing, George had to
choose a less efficient memory layout, and thus gets lower speed overall.
No such note in v20, though :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
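
P.S. To make the arithmetic in the gwnum.c comment concrete, here is a
rough sketch that counts how many distinct 4KB pages a strided pass
touches. It is not code from Prime95. The names count_pages_touched and
fits_in_tlb are made up for illustration, and the only numbers taken from
the comment are the 4KB page size and the 64 TLB entries; any cache line
size you pass in is your own assumption.

```c
#include <stddef.h>

/* Values from the gwnum.c comment: 4KB pages, 64 TLB entries.
   Everything else in this sketch is hypothetical. */
#define PAGE_BYTES  4096u
#define TLB_ENTRIES 64u

/* Count the distinct pages touched when reading n_lines cache lines of
   line_bytes each (line_bytes >= 1), spaced stride_bytes apart, starting
   at a page-aligned offset 0. Addresses never decrease, so it is enough
   to remember the lowest page number that has not been counted yet. */
size_t count_pages_touched(size_t n_lines, size_t stride_bytes,
                           size_t line_bytes)
{
    size_t pages = 0;
    size_t next_uncounted = 0;

    for (size_t i = 0; i < n_lines; i++) {
        size_t first = (i * stride_bytes) / PAGE_BYTES;
        size_t last  = (i * stride_bytes + line_bytes - 1) / PAGE_BYTES;

        for (size_t p = first; p <= last; p++) {
            if (p >= next_uncounted) {
                pages++;
                next_uncounted = p + 1;
            }
        }
    }
    return pages;
}

/* Nonzero if that many pages can all be mapped by the TLB at once. */
int fits_in_tlb(size_t pages)
{
    return pages <= TLB_ENTRIES;
}
```

With 128 lines spaced one page apart, count_pages_touched(128, 4096, 32)
gives 128 pages, which does not fit in 64 entries; the same 128 lines
packed back to back, count_pages_touched(128, 32, 32), sit on a single
page. That is the trade-off the comment describes: shuffling the data so
that pass 1 lands on fewer pages, at the price of pass 2 being spread
over more of them.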