On Tue, Mar 19, 2002 at 08:11:54PM +0000, Brian J. Beesley wrote:
>The speed it's running at suggests 
>that any performance loss due to TLB thrashing is small, since the extra drop 
>beyond linearity is only about what one would expect from the LL test 
>algorithm being O(n log n).
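A rough check of that estimate: if one LL iteration at transform length n costs time proportional to n log n, doubling n should cost only slightly more than a linear model predicts. The lengths 2^20 and 2^21 below are just example values, and the base of the logarithm cancels in the ratio.

```latex
\frac{T(n_2)}{T(n_1)} = \frac{n_2 \log n_2}{n_1 \log n_1},
\qquad
n_1 = 2^{20},\; n_2 = 2^{21}:\quad
\frac{T(n_2)}{T(n_1)} = 2 \cdot \frac{21}{20} = 2.1
```

So roughly 5% beyond linear per doubling, assuming nothing else (cache or TLB effects) changes between the two sizes.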

Disclaimer: My argument below might not be entirely valid. ;-)

Pasted from gwnum.c, Prime95 v19:

/* Well.... I implemented the above only to discover I had dreadful */
/* performance in pass 1.  How can that be?  The problem is that each  */
/* cache line in pass 1 comes from a different 4KB page.  Therefore, */
/* pass 1 accessed 128 different pages.  This is a problem because the */
/* Pentium chip has only 64 TLBs (translation lookaside buffers) to map */
/* logical page addresses into physical addresses.  So we need to shuffle */
/* the data further so that pass 1 data is on fewer pages while */
/* pass 2 data is spread over more pages. */

So it might be that, to avoid TLB thrashing, George has to choose a memory
layout that is otherwise less efficient, and thus gets lower speed overall.

No such note in v20, though :-)

/* Steinar */
Homepage: http://www.sesse.net/
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers