Hi,

First of all, congratulations to everyone involved in finding the new Mersenne prime! Well done! Besides being the largest prime number currently known, it also gives GIMPS' progress a nice boost thanks to the publicity.

Back to the topic:
Some time ago there was a discussion about the use of large memory pages. In a mersenneforum thread I collected some information about new Linux kernels and some real-world results published in a paper.


Here are some extracts:
Linux kernel versions 2.5.36+ and 2.6 include a "HugeTLBs" patch, which allows an application to allocate large memory pages.
64-bit Windows Server seems to support them too:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/large_page_support.asp
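
On the Linux side, a quick way to see whether a kernel has huge pages configured is to look at /proc/meminfo. The following is only a minimal sketch: it assumes the kernel exposes the usual "HugePages_Total", "HugePages_Free" and "Hugepagesize" fields there, and the function names are made up for this illustration.

```python
from pathlib import Path


def parse_hugepage_info(meminfo_text):
    """Pick the huge page fields out of /proc/meminfo-style text.

    Returns a dict such as {"HugePages_Total": 16, "Hugepagesize": 2048}.
    Hugepagesize is reported by the kernel in kB.
    """
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key = key.strip()
        if key.startswith("HugePages_") or key == "Hugepagesize":
            parts = value.split()
            if parts and parts[0].isdigit():
                info[key] = int(parts[0])
    return info


def read_hugepage_info(path="/proc/meminfo"):
    """Read the file if it exists; an empty dict means no huge page info."""
    p = Path(path)
    if not p.exists():
        return {}
    return parse_hugepage_info(p.read_text())
```

Calling read_hugepage_info() on a machine without huge page support (or without /proc) simply returns an empty dict. This only reports what the kernel has reserved; actually allocating large pages from an application is a separate step.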


My thoughts about the possibilities:
Oracle managed to get an 8% speedup by using large pages. Although I have little experience in this area, I think the speedup for FFTs will be much larger, because:


* Even if the data is already in the L1 cache, the access time can increase if the memory addresses of that data are spread over many memory pages.
* The limited number of TLB entries requires fine-tuning of FFT algorithms to avoid TLB thrashing as much as possible - but this avoidance can lead to less efficient algorithms.
* Why is it so hard for large FFTs to come even close to the MFLOPS of FFTs running completely inside the L1 (or L2) cache, in times of memory prefetching?
* I need at least 2 memory read/write passes to do a large FFT - but today's maximum transfer rate for P4/Opteron/AFX systems (6.4GB/s = reading the 1024k FFT data set up to 750 times per second) is hardly reachable, because it drops significantly for large strides.
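
To put rough numbers on the points above, here is a small back-of-the-envelope sketch. The assumptions are mine and not measured: the 1024k FFT holds 8-byte doubles, normal pages are 4 KiB, large pages are 4 MiB, and the data TLB has 64 entries (a made-up but plausible figure). All function names are hypothetical.

```python
import math

DOUBLE_BYTES = 8  # assumption: FFT elements are 8-byte doubles


def fft_data_bytes(n_points):
    """Size of the FFT data set in bytes."""
    return n_points * DOUBLE_BYTES


def pages_needed(data_bytes, page_bytes):
    """Number of pages required to map the whole data set."""
    return math.ceil(data_bytes / page_bytes)


def tlb_reach_bytes(entries, page_bytes):
    """Amount of memory the TLB can map without a miss."""
    return entries * page_bytes


def full_reads_per_second(bandwidth_bytes_per_s, data_bytes):
    """How often the data set could be streamed per second at peak rate."""
    return bandwidth_bytes_per_s / data_bytes


def pages_touched_by_stride(n_accesses, stride_bytes, page_bytes):
    """Distinct pages hit by n accesses at a fixed stride, starting at 0."""
    return len({(i * stride_bytes) // page_bytes for i in range(n_accesses)})


if __name__ == "__main__":
    small, large = 4 * 1024, 4 * 1024 * 1024   # assumed page sizes
    data = fft_data_bytes(1024 * 1024)         # 1024k FFT
    print("data set bytes:      ", data)
    print("4 KiB pages needed:  ", pages_needed(data, small))
    print("4 MiB pages needed:  ", pages_needed(data, large))
    print("TLB reach, 4 KiB:    ", tlb_reach_bytes(64, small))
    print("TLB reach, 4 MiB:    ", tlb_reach_bytes(64, large))
    print("reads/s at 6.4 GB/s: ", round(full_reads_per_second(6.4e9, data)))
    print("pages hit, 8 KiB stride, 64 accesses:",
          pages_touched_by_stride(64, 8192, small), "vs",
          pages_touched_by_stride(64, 8192, large))
```

Under these assumptions the data set is 8 MiB, which needs 2048 normal pages but only 2 large ones, and a 64-entry TLB covers just 256 KiB with 4 KiB pages. A strided pass with an 8 KiB stride touches a new page on every access with small pages, while the same accesses stay inside a single large page.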


I roughly estimate that a speedup of at least 10-30% could be possible.

A paper analyzed the effect of larger pages by implementing support for them in FreeBSD.

It can be found here: http://www.cs.rice.edu/~jnavarro/superpages/

Some results:

SPEC benchmark           Speedup by using superpages

vpr                      38.3%
mcf                      67.6%
vortex                   11.2%
bzip2                    14.0%
average for SPECint      11.2%

galgel                   28.9%
art                      12.2%
lucas                    28.0%
apsi                     82.7%
average for SPECfp       11.0%

and some non-SPEC benchmarks:
FFTW                     54.9%
Matrix                  654.6%

Regards,
Matthias Waldhauer

PS: I sent this mail before with the wrong sender address. If you received it, please treat this mail as an edited version ;)

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
