Hi:

[EMAIL PROTECTED] wrote:
> 
> Guillermo Ballester Valor writes:
> 
> << Do you know the GNU-FFTW package? (The Fastest Fourier Transform in the
> West). Last week I thought it would be interesting to see if it is as
> fast as they say. It really is a fast FFT code written in C. >>
> 
....
> 
> My comment is this: If you want to create a fast FFT-based program in a
> short time, FFTW is certainly a good way to do it. On the other hand, if
> you're really striving for extreme performance (i.e. trying to write a
> code that possibly many people will use intensively), it is best to have
> a code whose details you understand well. In my case, the latter criterion
> (and the fact that I wanted to learn as much as I could about FFTs) led me
> to write my own code. I started with the Numerical Recipes FFT (slow and
> not very accurate, but easy to play with) about 3 years ago and have been
> working on it ever since - my current code looks nothing like the NR FFT,
> but you have to start somewhere.
> 
Yes, certainly: I was able to adapt lucdwt and McLucasUNIX in four
days. On the other hand, my only goal was to find out whether working
with FFTW is a good idea, and the timings I obtained make me think it
could be.

> Looking at the FFTW timings page, for a length-262144 real-vector transform
> they list (http://www.fftw.org/benchfft/results/alpha467.html)
> a performance of around  105 "MFlops" on a 467 MHz Alpha 21164. Using
> their definition of MFlops for real FFTs, this translates to a per-FFT time of
> 
> 0.5*[5*262144*log2(262144) Flop]/[105 MFlop/sec] = 0.112 sec.
> 
> My LL code does 2 FFT's plus other operations per Mersenne-mod squaring,
> so we estimate about 40% of the per-iteration time equals one FFT. At 256K
> vector length it needs .177 sec per iteration on a 400 MHz 21164, which
> leads to an estimate of .40*.177*400/467 = 0.061 sec on a 467MHz 21164
> which is significantly faster than FFTW.
> 
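To check the arithmetic quoted above, here is a minimal Python sketch. It
assumes the flop-count convention used in the quoted formula
(0.5 * 5 * N * log2(N) for a real transform), the 105 MFlops figure quoted
from the FFTW page, the 0.40 factor from the quoted estimate, and that run
time scales inversely with clock speed. The function names are mine and
are only for illustration.

```python
import math

def real_fft_time(n, mflops):
    """Seconds per real FFT of length n at the given MFlops rating,
    counting 0.5 * 5 * n * log2(n) flops as in the quoted formula."""
    flops = 0.5 * 5 * n * math.log2(n)
    return flops / (mflops * 1e6)

def scale_to_clock(seconds, mhz_from, mhz_to):
    """Rescale a timing between clock rates, assuming time ~ 1/MHz."""
    return seconds * mhz_from / mhz_to

n = 262144                                   # 256K-point real transform
fftw_time = real_fft_time(n, 105.0)          # FFTW on a 467 MHz 21164
# One FFT taken as 40% of a 0.177 sec iteration measured at 400 MHz,
# rescaled to 467 MHz.
own_fft_time = scale_to_clock(0.40 * 0.177, 400.0, 467.0)

print(f"FFTW per-FFT time:     {fftw_time:.3f} sec")
print(f"Estimated own FFT:     {own_fft_time:.3f} sec")
print(f"Ratio (FFTW / own):    {fftw_time / own_fft_time:.2f}")
```

With these inputs the two times round to 0.112 and 0.061 sec, so the ratio
is somewhat below 2.
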
If your comparison carried over to Intel machines (which I know is not
a valid assumption), your code would run nearly as fast as mprime!! You
say your code is about twice as fast as FFTW, and I am sure it is, *BUT*
on my Pentium-166 the short code I wrote does one iteration for my
current exponent, 3975659, in 0.901 seconds, while mprime takes only
0.359. That gives an RPI of 40%. Your code would then reach nearly 90%,
and without lots of assembler code!
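
For anyone who wants to redo the comparison with their own timings, the
40% figure can be reproduced with the small sketch below. It assumes that
RPI here simply means the mprime time divided by the other program's time,
expressed as a percentage; the function name is only illustrative.

```python
def relative_performance(mprime_seconds, other_seconds):
    """Speed of another program relative to mprime, in percent
    (100 means equally fast, lower means slower than mprime)."""
    return 100.0 * mprime_seconds / other_seconds

# Per-iteration timings on the Pentium-166 for exponent 3975659.
rpi_fftw = relative_performance(0.359, 0.901)
print(f"RPI of the FFTW-based code: {rpi_fftw:.1f}%")
```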

Is there a Linux or Windows executable of Mlucas 2.7 for Intel machines?
It would be nice to look at the timings.


P.S. The source I sent to E. Mayer was buggy; it was an early version
:-(. If anybody wants the source, mail me privately.


| Guillermo Ballester Valor       |  
| [EMAIL PROTECTED]                      |  
| c/ cordoba, 19                  |
| 18151-Ogijares (Spain)          |
| (Linux registered user 1171811) |
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
