Hi:

"Brian J. Beesley" wrote:
> Well, without the tunefftw program, I did manage to make MacLucasFFTW
> on my Alpha system & it does pass the self-tests. However, the
> performance is not particularly brilliant. For a complete test of
> exponent 11213 using MacLucasUNIX (FFT run length 1024) the CPU time
> is 3.45 seconds, but MacLucasFFTW takes 4.53 seconds CPU using a FFT
> run length of 640. (This is an Alpha 21164PC at 533 MHz with 2 MB L3
> cache and 256 MB SDRAM, running Red Hat Linux 5.1)
>
 
Well, I wonder what causes such a difference in performance. On Intel
machines MacLucasFFTW runs more than twice as fast as MacLucas, while on
a RISC processor MacLucasUNIX beats MacLucasFFTW. Looking at the code,
without a deep understanding of it, one can see:

i) MacLucasUNIX makes heavy use of the 'register' keyword in its local
variable definitions, so a processor with many internal registers can
allocate most of them. That is a good thing because registers can be
accessed very quickly. The bad thing is that on processors with very few
registers (like Intel's) it can slow the code down.

ii) On the other hand, FFTW does not use 'register' at all. All local
variables are stored on the stack. I don't know much about compilers,
but perhaps some good compilers can place locals in registers as a speed
optimization. Looking at the code generated by GNU gcc on Intel
processors, some local double variables are kept on the Intel FPU and
the performance is very good.

My question is: what would happen in the FFTW code if we added the
'register' keyword directly to its local temporary variable definitions?
Can this sort of thing be done with a single compiler option?

I did it. I've added register declarations to all FFTW radix routines
up to radix-16 (which need no more than 32 stack variables). For Intel
machines the code is unchanged (because I previously defined REG so
that it expands to nothing). But I don't own a RISC machine, so I have
no idea how it performs. Any volunteers?
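
For anyone who wants to try the same idea, here is a minimal sketch of
how a REG macro of this kind could be set up. It is not the actual
patch: the switch name USE_REGISTER_HINT and the routine butterfly2 are
made up for illustration, and the butterfly is a plain radix-2 step
rather than FFTW's generated code. Note that 'register' is only a hint,
so the compiler is free to ignore it.

```c
#include <assert.h>

/* Hypothetical build switch (an assumption, not from the real patch):
   compile with -DUSE_REGISTER_HINT on register-rich RISC targets and
   leave it undefined on Intel so the code stays as it was. */
#ifdef USE_REGISTER_HINT
#define REG register
#else
#define REG /* expands to nothing */
#endif

/* Illustrative radix-2 butterfly on one pair of complex values.
   The temporaries carry the REG prefix; their addresses are never
   taken, which 'register' would forbid. */
static void butterfly2(double *re, double *im)
{
    REG double r0 = re[0], r1 = re[1];
    REG double i0 = im[0], i1 = im[1];

    re[0] = r0 + r1;   /* sum, real part        */
    im[0] = i0 + i1;   /* sum, imaginary part   */
    re[1] = r0 - r1;   /* difference, real      */
    im[1] = i0 - i1;   /* difference, imaginary */
}

/* Small self-check: the numerical result must be the same whether or
   not REG expands to 'register'. */
static void butterfly2_selfcheck(void)
{
    double re[2] = {1.0, 2.0};
    double im[2] = {3.0, 5.0};

    butterfly2(re, im);
    assert(re[0] == 3.0 && im[0] == 8.0);
    assert(re[1] == -1.0 && im[1] == -2.0);
}
```

Building the same source once with and once without the switch, and
timing both, would show whether the hint makes any difference on a
given machine.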


> The other line of approach I have on improving MacLucasUNIX is to try
> Digital's native C compiler - the linux beta is currently available
> FOC, but unfortunately I will have to upgrade linux to run it as it
> requires 5.2 or later. (I think the version of libc is the critical
> factor.) The principle being that, when it comes to squeezing
> performance out of an Alpha CPU, the people who developed the Alpha
> architecture may well do a better job than the people who develop
> gcc.
> 
Any improvement to MacLucasXXXX is welcome.

I think we can squeeze a lot more out of FFTW. I like its code very
much. The performance on Intel (45% with respect to mprime) is good
enough to justify a little more work on it.

Regards

| Guillermo Ballester Valor       |  
| [EMAIL PROTECTED]                      |  
| c/ cordoba, 19                  |
| 18151-Ogijares (Spain)          |
| (Linux registered user 1171811) |
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
