Mersenne Digest Sunday, January 9 2000 Volume 01 : Number 677
----------------------------------------------------------------------
Date: Thu, 6 Jan 2000 18:50:31 EST
From: [EMAIL PROTECTED]
Subject: Mersenne: Re: Mlucas on Alpha compile problem?
Michael Hanscho wrote:
>I have some fast alpha machines... but i cannot compile the Fortran
>code...I downloaded Mlucas_2.7z.f90.gz and tried to compile it with:
>
>f90 -o Mlucas -O5 -fast -assume accuracy_sensitive -pipeline -unroll 1
>-arch ev4 -tune ev4 -Olimit 200000 Mlucas.f90
>
>but got following error:
>
>f90: Severe: Mlucas.f90, line 10190: **Internal compiler error:
>segmentation violation signal raised** Please report this error along with
>the circumstances in which it occurred in a Software Problem Report.
>Note: File and line given may not be explicit cause of this error.
> end subroutine wrapper_square
Hi, Michael:
First off, note that I have Alpha binaries (TruUnix and Linux) at my
ftp site, so it should be unnecessary for you to compile the code yourself.
Said binaries were compiled from the same source code you downloaded,
so I've no idea what may be happening on your system. What kind of
hardware/OS and which compiler version are you using?
Best regards,
-Ernst
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
------------------------------
Date: Thu, 6 Jan 2000 18:50:41 EST
From: [EMAIL PROTECTED]
Subject: Mersenne: Re: cpu years/day vs GFlops/sec
Spike Jones wrote (Hi, Spike! Live long and prosper...)
>Processor gurus, please: using the equivalence that is suggested
>by the primenet status page [86.6 P90 CPU yr/day = 1042 GFlops]
>I calculate that a floating point operation must be about 3 CPU cycles.
Indeed, I calculate ~0.4 FLOP/cycle, which at first glance seems about a
factor of 2 too slow, even on the humble P90, which should, assuming no
cache misses, be able to dispatch one FADD per cycle and (I believe - x86
experts, please correct me if I'm wrong) one FMUL every other cycle, for
a peak throughput of 1.5 FLOP/cycle. As a second ballpark-style check,
I did a similar calculation for Mlucas 2.7z running on a 500MHz Alpha
21264, which can in theory dispatch 2 FLOP (1 FADD and 1 FMUL) per cycle.
A 256K (real-element) DWT-based FFT (gory details of the calculation are
appended below) gets around 0.66 FLOP/cycle on the 21264. Thus, Prime95
seems to average around 27% of the peak theoretical throughput on the
Pentium (probably somewhat better on the PII and PIII) and Mlucas gets
around 33% of the peak throughput of the 21264.
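As a rough check of the P90 arithmetic above, here is a minimal Python sketch. It assumes a 365-day year and a 90 MHz clock for the P90; the function name and parameters are illustrative and not taken from PrimeNet.

```python
def flop_per_cycle(cpu_years_per_day, gflops, clock_hz=90e6, days_per_year=365.0):
    """Convert a 'CPU years per day = GFlops' equivalence into FLOP per cycle."""
    # N CPU-years of work completed per day means N * 365 CPUs running at once.
    cpus = cpu_years_per_day * days_per_year
    cycles_per_sec = cpus * clock_hz
    return gflops * 1e9 / cycles_per_sec

# Figures quoted from the status page: 86.6 P90 CPU yr/day = 1042 GFlops.
rate = flop_per_cycle(86.6, 1042.0)
print(f"{rate:.2f} FLOP/cycle, {1.0 / rate:.1f} cycles per FLOP")
print(f"{100 * rate / 1.5:.0f}% of an assumed 1.5 FLOP/cycle peak")
```

Without rounding, this comes out slightly under 0.4 FLOP/cycle, so the percentage of peak it prints is a little lower than the figure obtained from the rounded 0.4.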
Much of the difference (theoretical vs. actual throughput) is due to
the real-world overhead of servicing cache misses. (Note that on the
21264, a 256K array of doubles plus some smaller arrays for DWT weights
and FFT sincos data nearly fit in the 4 MB L2 cache, thus explaining
much of the performance gain vs. Prime95, but one still has L1 cache
misses to service.)
Another source of the discrepancy (at least for Mlucas) is the code's
instruction mix, which tends to run around 60/40 FADD/FMUL, thus
ensuring that on the Alpha (even in the absence of cache misses)
the floating multiplier is idle at least a third of the time.
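The idle-multiplier figure follows from a simple model, sketched here in Python. The model is an assumption for illustration only: one add pipe and one multiply pipe, each issuing at most one operation per cycle, with no other stalls.

```python
def multiplier_idle_fraction(fadd_share, fmul_share):
    """Idle fraction of the multiply pipe under the simple two-pipe model."""
    # The busier pipe sets the cycle count; the other pipe waits the rest.
    cycles = max(fadd_share, fmul_share)
    return 1.0 - fmul_share / cycles

def best_case_fraction_of_peak(fadd_share, fmul_share):
    """Upper bound on throughput relative to a 2 FLOP/cycle peak."""
    cycles = max(fadd_share, fmul_share)
    return (fadd_share + fmul_share) / (2.0 * cycles)

idle = multiplier_idle_fraction(60, 40)
bound = best_case_fraction_of_peak(60, 40)
print(f"multiplier idle {idle:.0%}, at most {bound:.0%} of peak")
```

Under these assumptions a 60/40 mix leaves the multiplier idle a third of the time and caps throughput at five sixths of the 2 FLOP/cycle peak before any cache effects are counted.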
Lastly, all of these estimates neglect loads and stores in the FLOP count,
i.e. underestimate the true operation count by as much as a factor of 1.5 to 2.
That means that the CPU is really much busier than one expects simply
based on a count of the arithmetic operations.
Cheers,
-Ernst
(Notes: real vector length 256K = 2^18 means complex length 2^17, for
which the Mlucas FFT algorithm uses complex radices 8,8,8,16,16.
A Radix-8 pass, with twiddles, needs 2.75 fmul, 5.0 fadd per real input.
A Radix-16 pass, with twiddles, needs 5.00 fmul, 6.5 fadd per real input.
Assuming complex twiddle multiplies are needed for each pass (in reality
they are not needed for the first pass of the FFT) slightly overestimates
the FLOP count, so neglect the extra ops needed for the real<==>complex
wrapper/square steps needed between the forward and inverse FFT, which
cost about the same per input as a pair of twiddle multiplies. Thus,
the above combination of radices needs 3*2.75+2*5.00 = 18.25 fmul and
3*5.0+2*6.5 = 28.0 fadd per real input per FFT. We do two FFTs per
iteration, hence need 36.5 fmul and 56 fadd per input element, or about
93 FLOP per real input. Add another 15-16 FLOP per real element during
the carry propagation phase, and we get around 100 FLOP per real vector
element per iteration. Multiply by 2^18 real elements to get the total
FLOP count per iteration, divide by .079 seconds per iteration on a 500MHz
21264 and divide by 500000000 cycles per second to get ~0.66 FLOP/cycle.)
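The bookkeeping in the notes can be reproduced with a short Python sketch. The per-pass costs are the ones quoted above; the function and variable names are illustrative, and the final figure uses the rounded total of 100 FLOP per element from the notes rather than an exact sum.

```python
# Per-real-input operation counts quoted in the notes: radix -> (fmul, fadd).
RADIX_COST = {8: (2.75, 5.0), 16: (5.0, 6.5)}

def fft_ops_per_input(radices):
    """Sum fmul and fadd per real input over all passes of one FFT."""
    fmul = sum(RADIX_COST[r][0] for r in radices)
    fadd = sum(RADIX_COST[r][1] for r in radices)
    return fmul, fadd

def achieved_flop_per_cycle(flop_per_input, n_real=2**18,
                            sec_per_iter=0.079, clock_hz=500e6):
    """FLOP per cycle given the timing quoted for the 500MHz 21264."""
    return flop_per_input * n_real / sec_per_iter / clock_hz

fmul, fadd = fft_ops_per_input([8, 8, 8, 16, 16])
two_ffts = 2 * (fmul + fadd)          # forward plus inverse transform
print(fmul, fadd, two_ffts)
print(round(achieved_flop_per_cycle(100.0), 2))   # rounded total from the notes
```

Passing a different per-element total to `achieved_flop_per_cycle` shows how sensitive the final figure is to the rounding of the operation count.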
------------------------------
Date: Thu, 06 Jan 2000 18:34:09 -0800
From: Stefan Struiker <[EMAIL PROTECTED]>
Subject: Mersenne: The Second Mersennium Behind Us, How Now For Myriad The Third?
Quiet Flows The List, so here is a Y10K query:
Does the "official" Y2K retrofit cover rollover from 9999 to 10000?
Tick, tick,
Stefan S.
------------------------------
Date: Fri, 07 Jan 2000 14:15:50 +1100
From: Simon Burge <[EMAIL PROTECTED]>
Subject: Re: Mersenne: mprime and STOP signal
"Andrew L. Neporada" wrote:
> I am running mprime v 19.1 for FreeBSD, and sometimes I stop it using STOP
> signal ( especially when I want to see some films or so -- my computer is
> not fast enough ). Could this practice cause errors in LL test?
> I understand, that I maybe should just interrupt mprime and then launch
> it again, but I don't like this solution.
I have for years used STOP and CONT on mersenne1 (from the mers package)
under Ultrix without any problems at all. As long as you don't STOP it
in the middle of writing out a file and then kill it without letting it
finish, you should be ok.
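For anyone who wants to script the pause and resume, here is a minimal Python sketch for a POSIX system. The child process is a hypothetical stand-in for a long-running job such as mprime; the helper names are mine.

```python
import os
import signal
import subprocess
import sys
import time

def pause(pid):
    """Suspend a process; same effect as `kill -STOP pid`."""
    os.kill(pid, signal.SIGSTOP)

def resume(pid):
    """Let a suspended process continue; same effect as `kill -CONT pid`."""
    os.kill(pid, signal.SIGCONT)

if __name__ == "__main__":
    # Hypothetical stand-in for the real job: a child that sleeps briefly.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(2)"])
    pause(child.pid)
    time.sleep(0.5)          # the child makes no progress while stopped
    resume(child.pid)
    print("exit code:", child.wait())
```

A stopped process keeps its memory and open files, so it picks up exactly where it left off once it receives CONT.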
Simon.
------------------------------
Date: Sat, 08 Jan 2000 00:27:02 -0800
From: Kevin Sexton <[EMAIL PROTECTED]>
Subject: Mersenne: Benchmarking tips?
I am about to upgrade my memory (and bus speed, allowing
processor speed increase to full 400 MHz from underclocked
366) and I was wondering what the best way to benchmark
prime95 and mprime before and after the upgrade would be?
I figure restart with nothing else extra loading, and run a
test on prime95, redo it with linux and mprime, add the
memory, repeat, change the jumpers, and repeat again.
But how should I test it?
Also how do I run the same test on mprime?
------------------------------
Date: Sun, 09 Jan 2000 16:51:55 -0500
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Mersenne: Prime95 makes Java slow?
I've got an application that uses Java for its interface. When Prime95 is
running, the Java app is extremely slow at times. When prime95 isn't
running, it always seems to be OK. Could it be that prime95 doesn't
realize that the Java app needs some CPU time?
+--------------------------------------------------------+
| Jud McCranie |
| |
| 137*2^197783+1 is prime! (59,541 digits, 11/11/99) |
+--------------------------------------------------------+
------------------------------
End of Mersenne Digest V1 #677
******************************