Mersenne Digest       Monday, December 13 1999       Volume 01 : Number 670




----------------------------------------------------------------------

Date: Thu, 9 Dec 1999 22:40:32 -0000
From: "McMac" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Mersenne Digest V1 #669

> How quickly can M(110503) be tested now-a-days on the fastest
> machines?
> 
> --Luke

Erm, not sure about fastest but my PII-450 took under four minutes
to do it (wasn't watching it, so I missed it ending).

McMac
"There must have been a door there in the wall, when I came in"
-The Wall, Pink Floyd.

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Thu, 9 Dec 1999 17:40:11 EST
From: [EMAIL PROTECTED]
Subject: Mersenne: Re: Testing M(110503) 

Luke Welsh wrote:

>How quickly can M(110503) be tested now-a-days on the fastest machines?

Paul Novarese replied:

>All times below are with Mlucas2.7z
>
>Processor  Speed   OS      Time(sec)
>
>EV6        500 DU4.0f      128
>EV6        466 RH6.0       130
>EV5        300 DU4.0e      247
>EV4        166 DU5.0       1078
>
>How long did it take you back in 1988?

Luke replied:

>Ten, (or was it eleven?), minutes.

It's silly, since the architectures and algorithms are quite different,
but I remember how pleased I felt when I started timing out the pre-
release v2.7 Mlucas code and finally broke 15 minutes (which is what
I thought I recalled Luke saying he needed) to test M110503 on my
now-rather-humble 200 MHz ev4.

Luke (now feeling the powahh of the dahhk side :) again:

>I don't recall counting instructions and computing the MFLOP
>rate.  Instead, I relied upon the compiler and profiler
>which said I was running very near the theoretical peak
>of 2.4 GFLOP.  I just told people 2.0 GFLOP.
>
>So now it takes only 2 minutes on a 500 MHz EV6?  Is that a
>1 GFLOP machine?

In theory, the ev6 can complete one floating add (FADD) and one FMUL
per cycle (as can all generations of the Alpha), so yes, a 500MHz ev6
is potentially a 1 GFLOP machine. However, at long FFT lengths memory
traffic and cache misses impose a penalty even on a beast like the ev6,
and since (at least the Mlucas) FFT is dominated by floating add/subtracts
(on average two of these for every FMUL) the best one sees in practice
at large lengths is around 1 FLOP per cycle on average.

For M110503, on the other hand, one needs only a 6K FFT, which needs
less than 64KB for double-precision operands using an in-place FFT,
i.e. the entire dataset can fit into the ev6's 64KB L1 cache. Thus
one is probably getting something approaching the theoretical maximum
(based on the clock rate and the code's mix of instructions) of
750 MFLOP.
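
The arithmetic behind those two figures can be checked with a minimal
Python sketch. This is not Mlucas code; the function names are made up
for illustration, and it assumes 8-byte doubles, one FADD unit and one
FMUL unit each issuing at most one operation per cycle, and the 2:1
add-to-multiply mix mentioned above.

```python
def fft_footprint_bytes(fft_length, bytes_per_word=8):
    """Bytes needed to hold an in-place FFT of fft_length real doubles."""
    return fft_length * bytes_per_word


def peak_mflops(clock_mhz, adds_per_mul=2.0):
    """Upper bound on MFLOP rate with one FADD and one FMUL unit.

    For every FMUL the code issues adds_per_mul FADDs, so the busier
    unit determines how many cycles each group of operations takes.
    """
    flops_per_group = adds_per_mul + 1.0
    cycles_per_group = max(adds_per_mul, 1.0)
    return clock_mhz * flops_per_group / cycles_per_group


footprint = fft_footprint_bytes(6 * 1024)    # 6K FFT: 49152 bytes (48KB)
fits_in_l1 = footprint <= 64 * 1024          # compared with a 64KB L1 cache
bits_per_word = 110503 / (6 * 1024)          # about 18 bits of M110503 per word
rate_500mhz = peak_mflops(500)               # 750.0 with the 2:1 mix
```

With an even 1:1 mix of adds and multiplies the same function gives
1000 MFLOP at 500 MHz, which is the nominal 1 GFLOP figure.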

>The DWT is twice as fast as my poor old
>FFT, and mixed-radix yields more improvements, but I don't
>see how the EV6 (1 GFLOP?) Lucas Lehmer is 5 times faster
>than the NEC SX/2 (2 GFLOP) was.

Actually, a factor-of-five speedup at similar FLOP rates is not
as unreasonable as it may appear at first sight. Indeed, the DWT
cuts the FFT length by a factor of two, but this generally speeds
things up by more than a factor of 2, especially when the dataset
sizes are close to the cache size(s). As you say, being able to
use an FFT length of 6K rather than the 8K a power-of-2 FFT would need
gives a further speedup, so now you're probably 3-4x faster than a 16K
non-DWT-based FFT code would be. Throw in the additional speedups
in a code like Mlucas (e.g. the combined FFT-pass/pointwise-square/
IFFT-pass and IFFT-pass/carry-propagation/FFT-pass routines) and
you can expect an additional 20-40% speedup. There's your factor of 5.
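
As a rough sanity check, the estimate can be set against the timings
quoted earlier in this thread (10-11 minutes in 1988, 128 seconds on
the 500 MHz EV6). A minimal Python sketch, with illustrative names and
the speedup ranges taken straight from the paragraph above:

```python
def combined_speedup(factors):
    """Multiply independent speedup factors together."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


# Estimated range: 3-4x from the DWT plus non-power-of-2 length,
# times a further 20-40% from the combined-pass routines.
estimate_low = combined_speedup([3.0, 1.2])     # about 3.6
estimate_high = combined_speedup([4.0, 1.4])    # about 5.6

# Measured ratio from the quoted timings.
measured_low = (10 * 60) / 128                  # 10 minutes vs. 128 s
measured_high = (11 * 60) / 128                 # 11 minutes vs. 128 s
```

The measured ratio comes out between roughly 4.7 and 5.2, inside the
estimated 3.6-5.6 range.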

>Maybe the NEC profiler lied to me?  Specs-manship?  I always
>trusted it because we were much faster than Slowinski and
>a lot of prospective clients were testing the machine (oil
>companies).

Indeed, 2-2.4 GFLOP does sound unreasonable, since that would imply
Mlucas is 10-12x faster than your code was. One other possibility
(besides the NEC profiler being wrong) comes to mind - your code
(like my pre-v2.7 ones) used an out-of-place FFT strategy, in which
the data are bounced between two arrays on each pass. In my old
code I combined that with a higher-radix FFT, i.e. one would read
a block of data from one array, do a bunch of operations (perhaps
as large as a radix-16 DFT) on them, and then write the results to
a second array, which makes for very simple data access patterns
which lend themselves to vector architectures like the NEC. Doing
lots of stuff between the array loads and stores limits the penalty
imposed by the extra memory traffic of the two-array scheme to perhaps
20-30%. On the other hand, if one uses simple radix-2 FFT passes (as
I believe your code did), the loads and stores will dominate the opcount,
and the out-of-place scheme will have roughly twice as many of these as an
in-place FFT. So perhaps that accounts for the additional factor of two
that is apparently needed to reconcile the timings of the two codes.
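
To put rough numbers on this, here is a minimal Python sketch. It is a
simplified model, not a description of either code: it assumes a
power-of-2 length, that every pass loads and stores each array element
exactly once, and that the last pass may use a smaller radix. The
function names are made up for illustration, and it does not attempt
to model the in-place versus out-of-place difference itself.

```python
import math


def passes_over_data(fft_length, radix):
    """Passes needed when each pass resolves log2(radix) index bits."""
    return math.ceil(math.log2(fft_length) / math.log2(radix))


def words_moved(fft_length, radix):
    """Loads plus stores over the whole FFT, one of each per element per pass."""
    return passes_over_data(fft_length, radix) * 2 * fft_length


def implied_work_ratio(old_seconds, new_seconds, old_gflops, new_gflops):
    """How much more floating-point work the old run did, if both rates are real."""
    return (old_seconds * old_gflops) / (new_seconds * new_gflops)


radix2_passes = passes_over_data(16 * 1024, 2)     # 14 passes
radix16_passes = passes_over_data(16 * 1024, 16)   # 4 passes
traffic_ratio = words_moved(16 * 1024, 2) / words_moved(16 * 1024, 16)

# Quoted figures: 10-11 minutes at 2.0-2.4 GFLOP vs. 128 s at a nominal 1 GFLOP.
ratio_low = implied_work_ratio(600, 128, 2.0, 1.0)
ratio_high = implied_work_ratio(660, 128, 2.4, 1.0)
```

Under these assumptions a 16K radix-2 FFT sweeps the data 3.5 times as
often as a radix-16 one, and the implied work ratio lands at roughly
9 to 12, in line with the 10-12x figure above.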

So I guess the oil companies were probably looking for pipelined
architectures? (Sorry Luke, I couldn't resist.)

-Ernst


------------------------------

Date: Sat, 11 Dec 1999 16:15:27 -0500
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
Subject: Mersenne: P III took a 10 count

I just fried my 550 MHz P III.
My exponent was due Dec 26.
Sending this e-mail via my 166 MHz P I.
P III will be fixed by 12/15/1999.
Will this change my exponent run in any way?

My news reader does not have a spell checker.
I will have to use the spell checker that doubles as a fly swatter

Why is the word dictionary in the dictionary?
If you need the spelling, it's on the cover.
If you need the definition, you wouldn't know where to go to get the word
defined.


eye epologize N ad Vance fore N E spelin Miss steaks.
<grin>
(8>)>

------------------------------

Date: Mon, 13 Dec 1999 21:19:40 -0500
From: Pierre Abbat <[EMAIL PROTECTED]>
Subject: Mersenne: mprime 19.1 glibc

I'd like to upgrade mprime, and I see it's linked with glibc 2.1. I have glibc
2.0.7. Will it work? I'm currently running 18.1.

phma

------------------------------

End of Mersenne Digest V1 #670
******************************
