On Tuesday 19 March 2002 10:09, Nick Craig-Wood wrote:
> On Mon, Mar 18, 2002 at 02:12:48PM +0000, Brian J. Beesley wrote:
> >
> > If the active data is already memory resident, TLB thrashing is not going
> > to be an issue.
>
> The TLB (translation lookaside buffer) has very little to do with the
> Virtual Memory system.  The TLB is used by the processor to cache the
> address translations from logical memory to physical memory.  These
> have to be read from the page table RAM which is expensive - hence the
> cache.

Ah, but ... frequently accessing pages (virtual _or_ physical) will keep the 
page table entries for them from getting too far away from the processor; 
probably at worst they will stay in the L1 cache.

The overhead of accessing from L1 cache is small compared with the overhead 
of accessing data from main memory, and _tiny_ compared with the overhead of 
accessing data from the page/swap file.
>
> When I was working on a DWT implementation for StrongARM I found that
> thrashing the TLB caused a factor of two slow down.  The StrongARM
> system I was using had no virtual memory.
>
> If mprime is using 10 MB of memory say, then each page needs 1 TLB
> entry to be used at all by the processor - ie 2560 TLB entries which
> is way bigger than the size of the TLB in the processor (I don't
> remember what it is in x86 but on StrongARM it has 32 entries).  To
> access each physical page the TLB has to be reloaded from the page
> tables which is an extra memory access or two.  If you use 2 MB pages
> then there are only 5 pages needed and hence the TLB will never need
> to be refilled and hence some speed gain.
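
The page counts in the quoted paragraph can be checked with a short sketch. 
It assumes 4 KB small pages (implied by the 2560 figure but not stated) and 
uses the 32-entry StrongARM TLB size quoted above; the function name is 
purely illustrative.

```python
KB = 1024
MB = 1024 * KB


def pages_needed(working_set_bytes, page_bytes):
    """Pages (and hence TLB entries) needed to cover a working set."""
    # Integer ceiling division: a partly used page still needs an entry.
    return -(-working_set_bytes // page_bytes)


working_set = 10 * MB   # figure from the quoted message
tlb_entries = 32        # StrongARM figure from the quoted message

small = pages_needed(working_set, 4 * KB)   # assumed 4 KB pages
large = pages_needed(working_set, 2 * MB)   # 2 MB pages

for label, count in (("4 KB pages", small), ("2 MB pages", large)):
    verdict = "fits in" if count <= tlb_entries else "exceeds"
    print(f"{label}: {count} entries, {verdict} a {tlb_entries}-entry TLB")
```

With these inputs the small-page count far exceeds the TLB while the 
large-page count fits, which is the whole of the quoted argument; how much 
time the refills actually cost is a separate question.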

Don't you _need_ to have at least enough TLB entries to map the whole of the 
processor cache? (Since without them you can't map the cache table entries...) 
The K7 (Athlon) architecture is designed to support at least 8MBytes cache, 
even though AFAIK no Athlons with more than 512KB cache have been supplied. 
Intel have supplied Xeons with 2MBytes cache; I can't remember offhand what 
the design limit is...

Anyway, here's the point. I'm running mprime on an Athlon (XP1700) with a 
very large exponent (~67 million); the virtual memory used by the mprime 
process is 42912 Kbytes = 10,728 pages of 4 KB. The speed it's running at 
suggests that any performance loss due to TLB thrashing is small, since the 
extra drop beyond linearity is only about what one would expect from the LL 
test algorithm being O(n log n).
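
As a rough sketch of what "beyond linearity" means under a pure n log n cost 
model: the function below gives the factor by which per-iteration cost should 
grow over and above the growth in transform length. The transform lengths 
used are illustrative assumptions, not the FFT sizes mprime actually picks, 
and no measured timings are involved.

```python
import math


def extra_beyond_linear(n_small, n_large):
    """Growth in n*log2(n) cost divided by the linear growth n_large/n_small."""
    cost_ratio = (n_large * math.log2(n_large)) / (n_small * math.log2(n_small))
    linear_ratio = n_large / n_small
    return cost_ratio / linear_ratio


# Hypothetical transform lengths, chosen only to illustrate the scaling.
n_small = 2 ** 20
n_large = 2 ** 22

factor = extra_beyond_linear(n_small, n_large)
print(f"expected slowdown beyond linear scaling: {factor:.3f}x")
```

A measured slowdown well above this factor would point at something other 
than the algorithm (cache or TLB misses, say); one close to it is consistent 
with the algorithm alone.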

Whatever effect TLB thrashing may or may not be having, it doesn't look as 
though it's having a dominant effect on mprime.

> I think this would make a real difference to mprime - what percentage
> I don't know - at the cost of on average 1 MB of RAM extra.

I wouldn't mind _doubling_ the memory footprint, if we got a _significant_ 
performance boost as a consequence.

BTW why does this argument apply only to mprime? Surely Windows has the same 
underlying architecture - though obviously it's harder to get the Windows 
kernel changed than the Linux one. 

Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
