First of all, my congratulations to everyone involved in finding the new Mersenne prime! Well done! Besides being the largest currently known prime number, it also gives GIMPS' progress a nice boost thanks to the publicity.
Back to the topic:
Some time ago there was a discussion about the use of large memory pages. In a mersenneforum thread I collected some info about new Linux kernels and some real-world results published in a paper.
Here are some extracts:
Linux kernel versions 2.5.36+ and 2.6 include a "HugeTLBs" patch, which allows an application to allocate large memory pages.
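To see whether a given Linux box has large pages available at all, one can look at the huge page counters in /proc/meminfo. The following is only a small sketch: the helper names parse_hugepage_info and read_hugepage_info are made up for this mail, and I assume the usual "HugePages_Total" / "HugePages_Free" / "Hugepagesize" lines are present on a kernel built with HugeTLB support.

```python
def parse_hugepage_info(meminfo_text):
    """Extract the huge page related counters from /proc/meminfo style text.

    Returns a dict such as {"HugePages_Total": 20, "Hugepagesize": 2048}.
    Hugepagesize is reported by the kernel in kB.
    """
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        fields = rest.split()
        if fields and (key.startswith("HugePages_") or key == "Hugepagesize"):
            info[key] = int(fields[0])
    return info


def read_hugepage_info(path="/proc/meminfo"):
    """Read the counters from the running system; empty dict if unavailable."""
    try:
        with open(path) as f:
            return parse_hugepage_info(f.read())
    except OSError:
        return {}


# An empty dict means no huge page support is visible (or not a Linux system).
print(read_hugepage_info())
```

If HugePages_Total is 0, no large pages have been reserved yet, so an application asking for them would not get any.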
64-bit Windows Server seems to support them too:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/large_page_support.asp
My thoughts about the possibilities:
Oracle managed to get an 8% speedup by using large pages. Although I have little experience in this area, I think the speedup for FFTs will be much larger, because:
* Even if the data is already in the L1 cache, the access time can increase if the memory addresses of this data are spread over many memory pages.
* The limited number of TLB entries requires fine-tuning of FFT algorithms to avoid TLB thrashing as much as possible - but this avoidance can make the algorithms less efficient.
* Why is it so hard for large FFTs to come at least close to the MFLOPS of FFTs running completely inside the L1 (or L2) cache, even in times of memory prefetching?
* I need at least 2 memory read/write passes to do a large FFT - but today's maximum transfer rate for P4/Opteron/AFX systems (6.4GB/s = reading the 1024k FFT data set up to 750 times per second) is hardly reachable, because the achieved rate drops significantly for large strides.
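To illustrate the TLB argument from the list above, here is a toy model, not a measurement. It assumes a fully associative TLB with 64 entries and LRU replacement, 8-byte elements, a stride of 1024 elements, and 4 KiB versus 2 MiB pages. All of these numbers and the function names are my assumptions for the sake of the example; real TLBs are organized differently.

```python
from collections import OrderedDict


def tlb_misses(addresses, page_size, tlb_entries):
    """Count misses in a simple fully associative LRU TLB model."""
    tlb = OrderedDict()
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # mark as most recently used
        else:
            misses += 1
            tlb[page] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict least recently used entry
    return misses


def strided_pass(n_elements, stride, elem_size=8):
    """Yield byte addresses of one pass that touches every element once,
    walking the array with the given stride (column by column)."""
    for start in range(stride):
        for i in range(start, n_elements, stride):
            yield i * elem_size


N = 1 << 20        # 1024k elements of 8 bytes = 8 MiB data set
STRIDE = 1024      # assumed stride in elements (8 KiB in bytes)
ENTRIES = 64       # assumed TLB size

small = tlb_misses(strided_pass(N, STRIDE), 4 * 1024, ENTRIES)
large = tlb_misses(strided_pass(N, STRIDE), 2 * 1024 * 1024, ENTRIES)
print("TLB misses with 4 KiB pages:", small)
print("TLB misses with 2 MiB pages:", large)
```

In this model every single access of the strided pass misses the TLB with small pages, because consecutive accesses are more than one page apart and the column touches far more pages than the TLB can hold. With large pages the whole data set fits into a handful of TLB entries.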
I roughly estimate that a speedup of at least 10-30% could be possible.
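As a quick check of the bandwidth figure quoted in the list above, the arithmetic can be written down as follows. I assume 8-byte (double precision) elements, and for the upper bound I assume that each of the 2 passes reads and writes the whole data set exactly once; the variable names are only illustrative.

```python
BANDWIDTH = 6.4e9           # bytes per second, the peak rate quoted above
FFT_LEN = 1024 * 1024       # 1024k elements
BYTES_PER_ELEM = 8          # assumption: double precision values
PASSES = 2                  # minimum number of read/write passes

data_set = FFT_LEN * BYTES_PER_ELEM          # size of the data set in bytes
reads_per_sec = BANDWIDTH / data_set         # full sweeps per second at peak rate

# Each pass reads and writes the data set once, so 2 transfers per pass.
ffts_per_sec = reads_per_sec / (PASSES * 2)

print("data set size in MiB:", data_set / (1024 * 1024))
print("full reads per second at peak rate:", round(reads_per_sec))
print("upper bound on FFTs per second:", round(ffts_per_sec))
```

This gives a bit more than 750 sweeps of the data set per second, and that only if the peak rate could actually be sustained, which is exactly what large strides prevent.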
A paper analyzed the effect of larger pages (superpages) by implementing them in FreeBSD.
It can be found here: http://www.cs.rice.edu/~jnavarro/superpages/
Some results (speedup by using superpages):

SPECint benchmark     Speedup
vpr                   38.3%
mcf                   67.6%
vortex                11.2%
bzip2                 14.0%
average for SPECint   11.2%

SPECfp benchmark      Speedup
galgel                28.9%
art                   12.2%
lucas                 28.0%
apsi                  82.7%
average for SPECfp    11.0%

Non-SPEC benchmark    Speedup
FFTW                  54.9%
Matrix                654.6%
Regards, Matthias Waldhauer
PS: I sent this mail before with the wrong sender address. If someone got it, please treat this mail as an edited version ;)