On Wed, Dec 10, 2003 at 06:26:54PM +0000, Nick Craig-Wood wrote:
> I'm in the process of writing (not quite finished or working ;-) some
> code which you load as an LD_PRELOAD library under linux.  This gets
> its fingers into the memory allocation, and makes all malloc space
> come from hugetlbfs (how you get large pages under linux).
> 
> My primary use for this was to be mprime of course!

Well, I finished the code, and here are the results on my lowly laptop
running Linux 2.6.0.

Intel(R) Pentium(R) III processor
CPU speed: 550.78 MHz
CPU features: RDTSC, CMOV, PREFETCH, MMX, SSE
L1 cache size: 16 KB
L2 cache size: 256 KB
L1 cache line size: 32 bytes
L2 cache line size: 32 bytes
TLBS: 64
Prime95 version 22.12, RdtscTiming=1

Normal
------

Best time for 256K FFT length: 80.256 ms.
Best time for 320K FFT length: 101.820 ms.
Best time for 384K FFT length: 125.191 ms.
Best time for 448K FFT length: 145.505 ms.
Best time for 512K FFT length: 161.178 ms.
Best time for 640K FFT length: 215.113 ms.
Best time for 768K FFT length: 258.055 ms.
Best time for 896K FFT length: 304.786 ms.
Best time for 1024K FFT length: 345.747 ms.
Best time for 1280K FFT length: 449.540 ms.
Best time for 1536K FFT length: 541.963 ms.
Best time for 1792K FFT length: 661.651 ms.

With all memory allocations coming from 4 MB pages
--------------------------------------------------

Best time for 256K FFT length: 79.293 ms.   1.2%
Best time for 320K FFT length: 102.032 ms. -0.2%
Best time for 384K FFT length: 124.022 ms.  0.9%
Best time for 448K FFT length: 145.492 ms.  0.0%
Best time for 512K FFT length: 161.568 ms. -0.2%
Best time for 640K FFT length: 213.311 ms.  0.8%
Best time for 768K FFT length: 254.609 ms.  1.3%
Best time for 896K FFT length: 301.911 ms.  0.9%
Best time for 1024K FFT length: 339.203 ms. 1.9%
Best time for 1280K FFT length: 439.119 ms. 2.3%
Best time for 1536K FFT length: 531.422 ms. 1.9%
Best time for 1792K FFT length: 645.350 ms. 2.5%

So there are consistent but small improvements (roughly 2%) in the
larger FFTs; the last column is the time saved relative to the normal
run.  This just goes to show what a good job George has done in not
thrashing the TLB!
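
For anyone wanting to redo the comparison with their own timings, here
is a minimal sketch of how a percentage like the ones in the second
table can be computed.  The function name is just something made up for
illustration; it is not part of Prime95 or of the preload library.

```c
/* Percentage of time saved by the large-page run relative to the
 * normal run.  Positive means the large-page run was faster.
 * (improvement_pct is a hypothetical helper name.) */
static double improvement_pct(double normal_ms, double huge_ms)
{
    return (normal_ms - huge_ms) / normal_ms * 100.0;
}
```

For example, improvement_pct(80.256, 79.293) gives about 1.2, which
matches the 256K row.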

I wonder if Prime95 could be made more efficient if it didn't have to
worry about the TLB?  It's obviously detecting the number of TLB slots
for this computer, which is wrong in this case - perhaps there is a way
of overriding this?
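
As a rough illustration of why the page size matters here: the address
space a TLB can cover without a miss is just the number of entries
times the page size.  The sketch below assumes the 64 entries reported
above and the usual 4 KB x86 page; using the same entry count for 4 MB
pages is purely an assumption, since a CPU may have fewer entries for
large pages, so treat that figure as an upper bound.

```c
#include <stddef.h>

/* Bytes of address space covered when every TLB entry maps one page.
 * (tlb_reach is a hypothetical helper name.) */
static size_t tlb_reach(size_t entries, size_t page_bytes)
{
    return entries * page_bytes;
}
```

With tlb_reach(64, 4096) the reach is 256 KB, whereas
tlb_reach(64, 4UL * 1024 * 1024) would be 256 MB if all 64 entries
could hold 4 MB pages.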

Please email me if you'd like to experiment with the code - it's quite
simple (it just took rather a lot of different approaches to get
right!).  You'll need to be running 2.6.0 with HUGETLB support if you
want to play (see hugetlbpage.txt in the Documentation directory of the
kernel source for more info).
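
To give a flavour of the general idea (this is NOT the actual library,
just a minimal sketch under stated assumptions): memory can be taken
from hugetlbfs by mmap()ing a file that lives on a hugetlbfs mount, and
then handed out from that region.  The names huge_map, pool_alloc and
struct pool are made up for illustration, the 4 MB page size is taken
from the results above, and the trivial bump allocator (which never
frees) stands in for whatever a real malloc replacement would do.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGE_PAGE_SIZE (4UL * 1024 * 1024)  /* assumption: 4 MB pages */

/* Round a request up to a whole number of huge pages. */
static size_t round_up_huge(size_t n)
{
    return (n + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);
}

/* Map at least `len` bytes backed by a file at `path`, which is assumed
 * to be on a mounted hugetlbfs.  Returns NULL on any failure. */
static void *huge_map(const char *path, size_t len)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, round_up_huge(len), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);       /* the mapping stays valid after close */
    unlink(path);    /* the file disappears once the mapping goes away */
    return p == MAP_FAILED ? NULL : p;
}

/* A trivial bump allocator over one mapped region (hypothetical). */
struct pool { char *base; size_t size; size_t used; };

static void *pool_alloc(struct pool *p, size_t n)
{
    size_t need = (n + 15) & ~(size_t)15;   /* 16-byte granularity */
    if (need < n || need > p->size - p->used)
        return NULL;                        /* overflow or pool exhausted */
    void *out = p->base + p->used;
    p->used += need;
    return out;
}
```

An LD_PRELOAD library would additionally have to export its own malloc,
free, realloc and calloc built on top of something like this; that part
is left out here.  The path passed to huge_map (for instance a file
under wherever you mounted hugetlbfs) is up to you.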

-- 
Nick Craig-Wood
[EMAIL PROTECTED]
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
