Mersenne Digest V1 #616

Mersenne Digest Thu, 19 Aug 1999 11:08:27 -0700

Mersenne Digest       Thursday, August 19 1999       Volume 01 : Number 616




----------------------------------------------------------------------

Date: Tue, 17 Aug 1999 17:37:38 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Linux mprime and glibc 2.1

On Mon, Aug 16, 1999 at 07:15:02PM -0400, Tom Goulet wrote:
>Bleah.  That's probably it.  Who wants to send me a free harddrive?  :)

Having swap space for Linux is a good idea in all cases, BTW. You should
be able to spend 16 MB or so on it anyway :-) (_Not_ having swap implies
that _everything_ has to be in memory, even those programs that don't do
a thing, just sit there idly... That means less memory for cache, and
worse performance.)

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 17 Aug 1999 09:36:32 -0700
From: Will Edgington <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Simple question about P-1 factoring method

[EMAIL PROTECTED] writes:

   > Am I correct?  Or could a factor smaller than 2*k*p + 1 be missed in
   > some cases?

       In the last example a factor 16*97 + 1 could be missed.
   Otherwise all factors below 2*k*p + 1 should be found.  
   One extra squaring will achieve the 2*k*p + 1 bound.

Gack.  Yes, I should have caught that myself; it's the same situation
as for p, isn't it?

      The power of the P-1 algorithm is that it can potentially find 
   many larger factors, such as 252*p +1 using a stage 1 bound of 10.  

Of course; I realize that.  I was only looking at it this odd way
because of the trial factoring gaps I need to close.  Since I already
have the P-1 data, it's easy to do this.  If I didn't already have the
P-1 data, it would (most likely) be faster to simply do the trial
factoring.

Further, it seems to me that doing trial factoring to extend from P-1
factoring doesn't make sense.  Note that trial factoring would have to
check 2*(k + 1)*p + 1 next; P-1 only has to do k + 1 next if it's a
prime or prime power (or p).  Trial factoring could use the knowledge
of P-1 being done thru a stage one of k by "sieving" the trial factors
based on one less than the trial factors as well as the usual sieving
of the trial factors themselves, but that's exactly the set that P-1
would test with larger stage one bounds, and P-1 would, as you point
out, find more factors with at most a little extra work.  Right?
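Here is a minimal Python sketch of the idea Will describes, for small exponents only. It trial factors M(p) over candidates q = 2*k*p + 1 with the usual filters: q must be 1 or 7 mod 8, and a plain primality check stands in for a real sieve. It also adds an optional extra filter that skips any k whose prime-power factors are all <= B, since a P-1 stage one to bound B would already have found that q. The function names and the p1_bound parameter are hypothetical, chosen for illustration.

```python
def is_prime(n):
    """Plain trial-division primality test (fine for small candidates only)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_powersmooth(k, bound):
    """True if every prime power exactly dividing k is <= bound."""
    d = 2
    while d * d <= k:
        if k % d == 0:
            prime_power = 1
            while k % d == 0:
                k //= d
                prime_power *= d
            if prime_power > bound:
                return False
        d += 1
    return k <= bound  # what is left is 1 or a single prime


def trial_factor(p, k_max, p1_bound=None):
    """Prime factors q = 2*k*p + 1 of M(p) = 2**p - 1 with k <= k_max.

    If p1_bound is given, skip every k that a P-1 stage one to that bound
    would already cover (k powersmooth to p1_bound).
    """
    found = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):  # factors of M(p), p an odd prime, are +/-1 mod 8
            continue
        if p1_bound is not None and is_powersmooth(k, p1_bound):
            continue
        if not is_prime(q):  # stand-in for sieving the candidates themselves
            continue
        if pow(2, p, q) == 1:  # q divides 2**p - 1
            found.append(q)
    return found


print(trial_factor(29, 40))              # [233, 1103, 2089]
print(trial_factor(29, 40, p1_bound=9))  # [1103]; k=4 and k=36 are 9-powersmooth
```

With p1_bound=9, only 1103 (k = 19) is left for trial factoring. The other two factors have k = 4 and k = 36 = 4*9, so a P-1 stage one to 9 would catch them.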

I've heard that P-1 is "more efficient" than trial factoring; does the
proof go along these lines?  Or is it more complicated than this?

Of course, if this is correct, then I should fill the trial factoring
gaps using P-1, at least to the largest stage one the program that I
use supports.

   > Does it matter whether p is prime or not?  I don't think so, but ...

       Not if you always include an exponentiation by p, and repeat it
   if necessary as you do primes <= k.

So a composite p should effectively be treated as if it were prime in
the powering even though its prime factors are being used as well?
That certainly makes sense, given the extra power of 2 and of p used
because of the special form of Mersenne factors.

Thanks,

                                                Will

------------------------------

Date: Tue, 17 Aug 1999 13:46:45 -0400 (EDT)
From: Lucas Wiman  <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Simple question about P-1 factoring method

> If I understand P-1 factoring correctly, then using it to a stage one
> bound of k to try to factor M(p) will find all possible factors less
> than or equal to 2*k*p + 1.

Yes, of the form n*p+1 (not 2*n*p+1 :).  This is for the simple reason
that every power of a prime <=k must divide Q (due to your definition
of how Q is produced).  Then by the fundamental theorem of arithmetic
(every integer greater than 1 is a product of primes), we know that any
number <=k must be a product of prime powers, each of them <=k, and
hence it divides Q.
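A minimal Python sketch of stage one as described above follows. It assumes Q is the product, over primes r <= k, of the largest power of r not exceeding k, and it uses base 3. Both choices, and the function names, are assumptions for illustration. Real programs work with far larger exponents and also have a stage two.

```python
from math import gcd


def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def p1_stage1(p, bound, base=3):
    """P-1 stage one on M(p) = 2**p - 1 with stage one bound `bound`.

    The exponent is 2*p*Q: every factor q of M(p) has q - 1 = 2*k*p, so
    q is found whenever k divides Q.  Returns gcd(x - 1, M(p)): 1 means
    nothing was found, M(p) itself means every factor was found at once.
    """
    m = (1 << p) - 1
    x = pow(base, 2 * p, m)
    for r in primes_up_to(bound):
        r_power = r
        while r_power * r <= bound:  # largest power of r that is <= bound
            r_power *= r
        x = pow(x, r_power, m)
    return gcd(x - 1, m)


print(p1_stage1(11, 2))  # 23
print(p1_stage1(11, 4))  # 2047
```

M(11) = 2047 = 23 * 89. Here 23 - 1 = 2*1*11 is caught with any bound. Since 89 - 1 = 2*4*11, that factor needs the 4 in Q, so it only shows up once the bound reaches 4.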

This is (unfortunately) not useful for you, if your goal is to
deterministically find factors less than a certain limit.  P-1 would
be much slower than trial division if that is your goal.  P-1 is
useful for finding very large factors that would be missed by trial
division.

Oop, just got your other letter:

> I've heard that P-1 is "more efficient" than trial factoring; does the
> proof go along these lines?  Or is it more complicated than this?

It's only "sort of" more efficient than trial factoring.  It will find
(some) large factors more quickly than trial factoring will, but it
wouldn't find the factor 2^40*p+1, which (unless p is very small) would
be found very easily by trial factoring.

> Of course; I realize that.  I was only looking at it this odd way
> because of the trial factoring gaps I need to close.  Since I already
> have the P-1 data, it's easy to do this.  If I didn't already have the
> P-1 data, it would (most likely) be faster to simply do the trial
> factoring.

Why would you have trial factoring with such low bounds, but P-1 with
such high bounds?  Just asking...

-Lucas


------------------------------

Date: Tue, 17 Aug 1999 15:36:08 -0400 (EDT)
From: Lucas Wiman  <[EMAIL PROTECTED]>
Subject: Re: Mersenne: RE: Factoring more

> Numbers above   are factored to
> -------------   ---------------
> 71000000        2^72
> 57020000        2^71
> 44150000        2^70
> 35100000        2^69
> 28130000        2^68
> 21590000        2^67
> 17850000        2^66
> 13380000        2^65
> 8250000         2^64

Isn't this the "optimal" configuration if all computers in GIMPS
were identical?

There are a number of 486's that are doing only factoring, does
it take these into account?  Think about it, there are a number of
computers which should be factoring numbers faster than LL
tests can be performed.  Call this "factoring profit."  Wouldn't
it make sense to keep factoring profit as low as possible, as this
could speed up the more immediate concern of sooner-to-be-performed
LL tests?

As an example, I just received the factoring assignment of 10258511,
but I should think that we wouldn't even start these tests for some time.
Why not go back and factor some 8M exponents further so as to save a
few current LL tests?  Wouldn't this actually get us to the next prime
quicker?

Or do I need to get more sleep :)?

-Lucas

------------------------------

Date: Tue, 17 Aug 1999 23:42:45 +0100
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: RE: Factoring more

On 17 Aug 99, at 15:36, Lucas Wiman wrote:
> 
> Isn't this the "optimal" configuration if all computers in GIMPS
> were identical?
> 
> There are a number of 486's that are doing only factoring, does
> it take these into account?  Think about it, there are a number of
> computers which should be factoring numbers faster than LL
> tests can be performed.  Call this "factoring profit."  Wouldn't
> it make sense to keep factoring profit as low as possible, as this
> could speed up the more immediate concern of sooner-to-be-performed
> LL tests?

I questioned George about varying the factoring depth for different 
processor types in v19; his reply was that he was treating all 
processors as if they were P6 family. That means that P5 will spend a 
higher percentage of time factoring than is optimal; it also means 
that factoring assignments will take longer, thus reducing the lead 
of factoring over testing.

I don't think you should be too worried about 486s. There aren't that 
many left in the project, and they take about a month to run the 
current factoring assignments (compared with about 4 days on a P100).

What may well be a problem in the near future is the fast systems 
still running v17. Since double-check assignments for exponents <2^22 
are rapidly running out, they will have to be given factoring instead 
8-(
> 
> As an example, I just received the factoring assignment of 10258511,
> but I should think that we wouldn't even start these tests for some time.
> Why not go back and factor some 8M exponents further so as to save a
> few current LL tests?  Wouldn't this actually get us to the next prime
> quicker?
> 
> Or do I need to get more sleep :)?

Maybe you need more sleep. Pick a single exponent in the 8M range; 
there's a good chance that you could spend hundreds of CPU years 
trying to factor the corresponding Mersenne number by _any_ known 
method and not get anywhere, whereas somewhere around half a P90 CPU 
year will yield a solid result.

What we want to minimise, for an "average" system (probably the P6 
model is close enough), for any particular exponent is Tf + (1-p)Tl, 
where Tf is the estimated factoring time (bearing in mind that 
factoring may be truncated when a factor is found), p is the 
probability that factoring will be successful and Tl is the estimated 
time to run a LL test. Clearly we will find the next Mersenne prime 
faster if we complete testing more exponents; we will maximize the 
number of completely tested exponents with whatever resources we have 
available if we minimize the expected effort needed to completely 
test an exponent.
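Brian's expression can be turned into a tiny cost model for choosing a factoring depth. The Python below is a sketch only. The per-depth times and success probabilities in the example are made-up numbers; real limits would come from measured timings and GIMPS factoring statistics. Note also that "p" in Tf + (1-p)Tl is a probability, not the exponent, so the code calls it p_factor.

```python
def expected_cost(t_factor, p_factor, t_ll):
    """Brian's Tf + (1 - p) * Tl.

    t_factor: expected trial-factoring time for the exponent
    p_factor: probability that factoring finds a factor (not the exponent!)
    t_ll:     time for one LL test
    """
    return t_factor + (1.0 - p_factor) * t_ll


def best_depth(options, t_ll):
    """Pick the (depth_bits, t_factor, p_factor) option with the lowest cost."""
    return min(options, key=lambda o: expected_cost(o[1], o[2], t_ll))


# Made-up numbers, in units where one LL test costs 100.  Each extra bit
# doubles the factoring time but adds only a little success probability.
options = [(62, 1.0, 0.050), (63, 2.0, 0.070), (64, 4.0, 0.085), (65, 8.0, 0.100)]
print(best_depth(options, t_ll=100.0)[0])  # 63
```

With these invented figures the expected costs are 96, 95, 95.5 and 98, so stopping at 63 bits wins. Going deeper keeps doubling Tf while (1-p)Tl shrinks only slowly.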

The idea of factoring to a particular limit is that trial factoring 
is O(2^d/p) whereas LL testing is O(p^2 log p). If you go 1 bit 
deeper, you double the factoring effort but find a lot less than 
twice as many factors. If you go 3 or 4 bits deeper, trial factoring 
(with an uncertain outcome, somewhere about 1 in 4 chance at best of 
finding a factor) actually costs as much CPU time as running a LL 
test. Mind you, the current limits are set based on theory and 
statistical data which may not be entirely above suspicion; if you 
can give a sound theoretical argument or point out unambiguous trends 
in trial factoring success rates which clearly indicate that it would 
be cost effective to factor deeper, I'm sure George would implement 
the necessary changes.

Meanwhile, those people running factoring assignments (on slow 
machines, or otherwise) are saving those people who wish to 
concentrate on LL testing the 5 to 10 percent of CPU cycles which 
they'd otherwise have to devote to trial factoring 8-)

Sooner or later, we *will* get round to running LL tests on those 10 
million range exponents, so the effort isn't wasted.
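The depth table quoted in Lucas's message ("Numbers above ... are factored to ...") works as a simple lookup. The Python sketch below uses bisect; the function name is hypothetical and the thresholds are copied from that table. It applies the lookup to the 10258511 assignment Lucas mentions.

```python
from bisect import bisect_left

# (exponent threshold, trial-factoring depth in bits), from the quoted table:
# "Numbers above <threshold> are factored to 2^<depth>".
DEPTH_TABLE = [
    (8250000, 64), (13380000, 65), (17850000, 66), (21590000, 67),
    (28130000, 68), (35100000, 69), (44150000, 70), (57020000, 71),
    (71000000, 72),
]
THRESHOLDS = [t for t, _ in DEPTH_TABLE]


def factoring_depth(exponent):
    """Bits to trial factor to, or None if the exponent is below the table."""
    i = bisect_left(THRESHOLDS, exponent)  # THRESHOLDS[:i] are all < exponent
    return DEPTH_TABLE[i - 1][1] if i > 0 else None


print(factoring_depth(10258511))  # 64
```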

I wouldn't wish to encourage users with fast systems to run factoring 
instead of LL tests by choice - I don't want to see factoring get 
_too_ far ahead of LL tests, or LL tests _too_ far ahead of double-
checks, for that matter - the balance is probably about right at the 
moment (certainly far better than it was this time last year, when 
very few people were running double-checks, because Intel systems 
weren't eligible until v17). If anything I think we should encourage 
some of the people running factoring on reasonable systems to run 
double-checks instead.

Regards
Brian Beesley

------------------------------

Date: Wed, 18 Aug 1999 13:37:24 +1000
From: Simon Burge <[EMAIL PROTECTED]>
Subject: Mersenne: Alpha DS20 timings.

Folks,

Compaq have a DS20 Alpha with 2 500MHz 21264 CPUs on the internet for
people to try out.  I've modified the mers package so that it can print
out iteration timing (patches coming soon Will!).  The iteration times
are fairly constant across 10 samples, so I've only listed one per
program/exponent.  The -C means don't dump a checkpoint at any time, and
-S N means print iteration times every N iterations.  The programs were
compiled with the DEC C compiler using "cc -fast -arch host -O4".  The
exponents were chosen just to demonstrate different FFT lengths.

Before anyone says anything, I know that the "iters/sec" should be
"secs/iter" :-)

% ./fftlucas -C -S 10 900001
speed: 10 iters in  1.362 seconds, 0.136 iters/sec (fft len   64k)
% ./fftlucas -C -S 10 1400001
speed: 10 iters in  3.168 seconds, 0.317 iters/sec (fft len  128k)  
% ./fftlucas -C -S 10 2900001
speed: 10 iters in  7.124 seconds, 0.712 iters/sec (fft len  256k) 
% ./fftlucas -C -S 10 5800001
speed: 10 iters in 16.410 seconds, 1.641 iters/sec (fft len  512k)

% ./mersenne1 -C -S 10 900001
speed: 10 iters in  0.832 seconds, 0.083 iters/sec (fft len   64k)
% ./mersenne1 -C -S 10 1400001
speed: 10 iters in  1.797 seconds, 0.180 iters/sec (fft len  128k)
% ./mersenne1 -C -S 10 2900001
speed: 10 iters in  4.257 seconds, 0.426 iters/sec (fft len  256k)
% ./mersenne1 -C -S 10 5800001
speed: 10 iters in  9.055 seconds, 0.905 iters/sec (fft len  512k)

% ./MacLucasUNIX -C -S 10 1400001
speed: 10 iters in  0.152 seconds, 0.015 iters/sec (fft len   64k)
% ./MacLucasUNIX -C -S 10 2900001
speed: 10 iters in  0.362 seconds, 0.036 iters/sec (fft len  128k)
% ./MacLucasUNIX -C -S 10 5800001
speed: 10 iters in  0.782 seconds, 0.078 iters/sec (fft len  256k)
% ./MacLucasUNIX -C -S 10 11600001
speed: 10 iters in  1.777 seconds, 0.178 iters/sec (fft len  512k)
% ./MacLucasUNIX -C -S 10 23200001
speed: 10 iters in  4.606 seconds, 0.461 iters/sec (fft len 1024k)
% ./MacLucasUNIX -C -S 10 46400001
speed: 10 iters in 13.601 seconds, 1.360 iters/sec (fft len 2048k)

and for the 10^n digit fans:

% ./MacLucasUNIX -C -S 10 33219281
speed: 10 iters in  4.634 seconds, 0.463 iters/sec (fft len  1024k)
% ./MacLucasUNIX -C -S 3 332192831
speed: 3 iters in 65.950 seconds, 21.983 iters/sec (fft len 16384k)

The machine "only" had 1GB of RAM, and the 16M FFT took up about 675MB
of RAM.  I couldn't test any larger numbers :-)
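For the 10^n digit fans: M(p) has floor(p*log10(2)) + 1 decimal digits, which is where exponents like 33219281 come from. The Python sketch below computes that count and searches for the first prime exponent that reaches a given size. The function names are hypothetical, and the trial-division primality check is only meant for exponents of this size.

```python
import math

LOG10_2 = math.log10(2)


def mersenne_digits(p):
    """Decimal digits of M(p) = 2**p - 1 (same as 2**p, never a power of 10)."""
    return math.floor(p * LOG10_2) + 1


def is_prime(n):
    """Plain trial-division primality test."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def first_prime_exponent(digits):
    """Smallest prime p such that M(p) has at least `digits` decimal digits."""
    p = max(2, math.ceil((digits - 1) / LOG10_2) - 1)  # start just below the bound
    while not (mersenne_digits(p) >= digits and is_prime(p)):
        p += 1
    return p


print(mersenne_digits(33219281))    # 10000001
print(first_prime_exponent(10**7))  # 33219281
```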


For comparison, this is MacLucasUNIX on a 500MHz AlphaPC164 (21164 CPU)
compiled with "gcc -mcpu=21164a -Wa,-m21164a -O6":

speed: 10 iters in  0.331 seconds, 0.033 iters/sec (fft len   64k)
speed: 10 iters in  0.842 seconds, 0.084 iters/sec (fft len  128k)
speed: 10 iters in  1.918 seconds, 0.192 iters/sec (fft len  256k)
speed: 10 iters in  4.219 seconds, 0.422 iters/sec (fft len  512k)
speed: 10 iters in 10.531 seconds, 1.053 iters/sec (fft len 1024k)

GCC isn't the best compiler around with floating point, so these figures
might not be the best comparison between the 21164 and 21264.

Also for comparison, here's some figures for MacLucasUNIX on a 200MHz
UltraSparc with different FFT lengths:

speed: 10 iters in  0.530 seconds, 0.053 iters/sec (fft len   64k)
speed: 10 iters in  1.739 seconds, 0.174 iters/sec (fft len  128k)
speed: 10 iters in  3.737 seconds, 0.374 iters/sec (fft len  256k)
speed: 10 iters in  7.459 seconds, 0.746 iters/sec (fft len  512k)
speed: 10 iters in 16.261 seconds, 1.626 iters/sec (fft len 1024k)

Even dividing the iteration times by 2.5 (which assumes that memory
bandwidth scales equally well with the UltraSparcs), the Alpha 21264
comes up favorably.
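Since the "iters/sec" column is really seconds per iteration, the comparison is easiest to redo from the raw "speed:" lines. The Python sketch below parses them and applies the 2.5x allowance (500 MHz vs 200 MHz) to the UltraSparc figures. The regular expression assumes exactly the line format shown above. As Brian points out, some of the 21264 runs may be using an FFT that is too small for the exponent, so treat the comparison as rough.

```python
import re

# Matches the "speed:" lines printed by the modified programs above.
SPEED = re.compile(r"speed: (\d+) iters in\s+([\d.]+) seconds.*fft len\s+(\d+)k")


def secs_per_iter(line):
    """Return (fft_len_in_k, seconds_per_iteration) for one 'speed:' line."""
    m = SPEED.search(line)
    if m is None:
        raise ValueError("not a speed line: " + line)
    iters, secs, fft_k = int(m.group(1)), float(m.group(2)), int(m.group(3))
    return fft_k, secs / iters


alpha_21264 = """\
speed: 10 iters in  0.152 seconds, 0.015 iters/sec (fft len   64k)
speed: 10 iters in  0.362 seconds, 0.036 iters/sec (fft len  128k)
speed: 10 iters in  0.782 seconds, 0.078 iters/sec (fft len  256k)
speed: 10 iters in  1.777 seconds, 0.178 iters/sec (fft len  512k)
speed: 10 iters in  4.606 seconds, 0.461 iters/sec (fft len 1024k)"""

ultrasparc_200 = """\
speed: 10 iters in  0.530 seconds, 0.053 iters/sec (fft len   64k)
speed: 10 iters in  1.739 seconds, 0.174 iters/sec (fft len  128k)
speed: 10 iters in  3.737 seconds, 0.374 iters/sec (fft len  256k)
speed: 10 iters in  7.459 seconds, 0.746 iters/sec (fft len  512k)
speed: 10 iters in 16.261 seconds, 1.626 iters/sec (fft len 1024k)"""

alpha = dict(secs_per_iter(line) for line in alpha_21264.splitlines())
sparc = dict(secs_per_iter(line) for line in ultrasparc_200.splitlines())
for fft_k in sorted(alpha):
    scaled = sparc[fft_k] / 2.5  # Simon's allowance for 500 vs 200 MHz
    print(f"{fft_k:5d}k  21264 {alpha[fft_k]:.4f} s/iter  "
          f"UltraSparc/2.5 {scaled:.4f} s/iter")
```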


Ernst - since nigel is no more, where can I get the latest f90 code?
I've got 2.5b, and it's giving me some errors:

    no restart file found...looking for range file...
    no range file found...switching to interactive mode.
   Enter p,n (set n=0 for default FFT length) >5100071,262144
   Enter 'y' to run a self-test, <return> for a full LL test >y
    p is prime...proceeding with Lucas-Lehmer test...
    using an FFT length of      262144
    this gives an average    19.4552268981934      bits per digit
   M 5100071 Roundoff warning on iteration      10 maxerr =  0.499997074861
    FATAL ERROR...Halting execution.

Testing 3100079 (at about 11 bits per digit) gives the same errors, and
this happens with or without optimisation turned on.


Well, that's my benchmarking done for the day...

Simon.

------------------------------

Date: Tue, 17 Aug 1999 23:44:22 -0500
From: Ken Kriesel <[EMAIL PROTECTED]>
Subject: Mersenne: the QAtesters list is closed

To keep the group a manageable size, I'm closing it for now.

Ken

Ken Kriesel, PE <[EMAIL PROTECTED]>

------------------------------

Date: Tue, 17 Aug 1999 22:48:19 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Alpha DS20 timings.

> Compaq have a DS20 Alpha with 2 500MHz 21264 CPUs on the internet for
> people to try out.
<snip>
> and for the 10^n digit fans:
>
> % ./MacLucasUNIX -C -S 10 33219281
> speed: 10 iters in  4.634 seconds, 0.463 iters/sec (fft len  1024k)
> % ./MacLucasUNIX -C -S 3 332192831
> speed: 3 iters in 65.950 seconds, 21.983 iters/sec (fft len 16384k)
>
> The machine "only" had 1GB of RAM, and the 16M FFT took up about 675MB
> of RAM.  I couldn't test any larger numbers :-)

Just like I thought...the 21264 really is a monster.

Compaq had a few of their REALLY nice boxes at the engineer conference last
week in Toronto.  I desperately wanted to run some benchmarks, but something
told me that I shouldn't run software on these machines without asking
permission... hmmm..

Can't wait to get me one of them boxes!

Aaron

ps - Is anyone else besides me happy that Compaq is ditching that boring old
"computer beige" in favor of the "opal" (basically white) color for their
servers?  I think they look snazzy.


------------------------------

Date: Wed, 18 Aug 1999 02:03:15 -0400
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Alpha DS20 timings.

Hi,

At 01:37 PM 8/18/99 +1000, Simon Burge wrote:
>and for the 10^n digit fans:
>% ./MacLucasUNIX -C -S 10 33219281
>speed: 10 iters in  4.634 seconds, 0.463 iters/sec (fft len  1024k)

You can't test M33219281 with a 1024k FFT, you'll need a 2048k fft :(

George


------------------------------

Date: Wed, 18 Aug 1999 16:19:25 +1000
From: Simon Burge <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Alpha DS20 timings. 

George Woltman wrote:

> Hi,
> 
> At 01:37 PM 8/18/99 +1000, Simon Burge wrote:
> >and for the 10^n digit fans:
> >% ./MacLucasUNIX -C -S 10 33219281
> >speed: 10 iters in  4.634 seconds, 0.463 iters/sec (fft len  1024k)
> 
> You can't test M33219281 with a 1024k FFT, you'll need a 2048k fft :(

I guess MacLucasUNIX didn't pick up an error with a 1M FFT in the first
200 iterations.  Oh well, at 1.361 secs/iter with a 2M FFT, it'll take
roughly 523 days...
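The 523-day figure is just iterations times seconds per iteration, since an LL test of M(p) needs p - 2 squaring iterations. A quick Python check follows. It ignores checkpointing and any other overhead, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86400


def ll_test_days(p, secs_per_iter):
    """Rough wall-clock days for a full LL test of M(p): p - 2 iterations."""
    return (p - 2) * secs_per_iter / SECONDS_PER_DAY


print(round(ll_test_days(33219281, 1.361)))  # 523
```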

Do we have any Intel timings on an FFT this size?

Simon.

------------------------------

Date: Wed, 18 Aug 1999 09:45:04 +0100
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Alpha DS20 timings.

On 18 Aug 99, at 13:37, Simon Burge wrote:

> % ./MacLucasUNIX -C -S 10 1400001
> speed: 10 iters in  0.152 seconds, 0.015 iters/sec (fft len   64k)
> % ./MacLucasUNIX -C -S 10 2900001
> speed: 10 iters in  0.362 seconds, 0.036 iters/sec (fft len  128k)
> % ./MacLucasUNIX -C -S 10 5800001
> speed: 10 iters in  0.782 seconds, 0.078 iters/sec (fft len  256k)
> % ./MacLucasUNIX -C -S 10 11600001
> speed: 10 iters in  1.777 seconds, 0.178 iters/sec (fft len  512k)
> % ./MacLucasUNIX -C -S 10 23200001
> speed: 10 iters in  4.606 seconds, 0.461 iters/sec (fft len 1024k)
> % ./MacLucasUNIX -C -S 10 46400001
> speed: 10 iters in 13.601 seconds, 1.360 iters/sec (fft len 2048k)

Interesting.

Could I suggest that your figures may be a bit misleading. The point 
is that, when the remaindering operation kicks in, roundoff errors 
start to take effect & MacLucasUNIX generally restarts with the next 
higher FFT size. You should really be running at least 100 iterations 
to be sure that you have the appropriate FFT size for the exponent, 
and that the timing isn't distorted by not running any code needed to 
implement the remaindering operation.

I find, running MLU on an Alpha 21164-533, 128K FFT works up to about
exponent 2.35 million, & pro rata. MLU on a Sparc seems to be able to
run a bit higher, somewhere around 2.45 million seems to be OK for a
128K FFT. Mind you, an Ultra IIi-300 is only about 0.4x the speed of an
Alpha 21164-533, running MLU compiled using gcc 2.8.1 on both
systems.
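Brian's "pro rata" rule can be written as a bits-per-word limit: 2.35 million / 131072 is about 17.9 bits per FFT word. The Python sketch below picks the smallest power-of-two FFT length under that limit. The limit comes from his Alpha figure only, so other programs and CPUs will differ (he quotes about 2.45 million on a Sparc). Under this rough rule, M33219281 lands on a 2048K FFT, as George says, and Simon's 1400001 run would want 128K rather than 64K.

```python
# Assumption: Brian's figure for MLU on an Alpha 21164 (128K FFT good up to
# about exponent 2.35 million), scaled pro rata to other FFT lengths.
MAX_BITS_PER_WORD = 2_350_000 / 131072  # about 17.9 bits per FFT word


def fft_length(p, max_bits=MAX_BITS_PER_WORD):
    """Smallest power-of-two FFT length n with p / n <= max_bits."""
    n = 1
    while p / n > max_bits:
        n *= 2
    return n


for p in (1400001, 2350000, 5100071, 33219281):
    n = fft_length(p)
    print(p, f"{n // 1024}K", round(p / n, 2))
```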

The timings I have - from complete double tests - are 
128K FFT, 25000 iters/27 minutes = 0.065 sec/iter
256K FFT, 10000 iters/31 minutes = 0.186 sec/iter
512K FFT, 5000 iters/27 minutes = 0.324 sec/iter

(Ultra IIi-300, 256K FFT, 5000 iters/33 mins = 0.396 sec/iter)

512K FFT runs nicely in only 64MB, but 1024K wouldn't.

For short tests of 400 iterations (for QA testing) I've run lucdwt 
(from Richard Crandall's giantint package, with minor modifications 
to output) on exponents up to nearly 80 million i.e. 4096K FFT. This 
just about fitted into 256MB; I was unable to proceed to 8192K FFT 
since I have only 320 MB on my system (& can't justify buying any 
more - in any case, Prime95 v19 gives up at ~79,600,000). These tests 
were running at 90 to 100 iterations per hour for 4096K FFT. See 
ftp://lettuce.edsc.ulst.ac.uk/gimps/PrimeQA/QADATA.TXT

BTW here in the UK you can purchase a complete Alpha 21164-533 system 
with a decent hard drive & 128MB RAM, preloaded with Red Hat Linux,
for under 1500 pounds sterling. See http://www.compusys.co.uk/
(please forgive me for "advertising"; I have no connection to this 
company except as a customer).

Regards
Brian Beesley

------------------------------

Date: Wed, 18 Aug 1999 06:35:14 -0700
From: "Joth Tupper" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Alpha DS20 timings.

----- Original Message -----
From: Aaron Blosser <[EMAIL PROTECTED]>
To: Mersenne@Base. Com <[EMAIL PROTECTED]>
Sent: Tuesday, August 17, 1999 9:48 PM
Subject: RE: Mersenne: Alpha DS20 timings.


> > Compaq have a DS20 Alpha with 2 500MHz 21264 CPUs on the internet for
> > people to try out.
> <snip>
> > and for the 10^n digit fans:
> >
> > % ./MacLucasUNIX -C -S 10 33219281
> > speed: 10 iters in  4.634 seconds, 0.463 iters/sec (fft len  1024k)
> > % ./MacLucasUNIX -C -S 3 332192831
> > speed: 3 iters in 65.950 seconds, 21.983 iters/sec (fft len 16384k)
> >
> > The machine "only" had 1GB of RAM, and the 16M FFT took up about 675MB
> > of RAM.  I couldn't test any larger numbers :-)
>
> Just like I thought...the 21264 really is a monster.
>
> Compaq had a few of their REALLY nice boxes at the engineer conference last
> week in Toronto.  I desperately wanted to run some benchmarks, but something
> told me that I shouldn't run software on these machines without asking
> permission... hmmm..
>
> Can't wait to get me one of them boxes!
>
> Aaron
>
> ps - Is anyone else besides me happy that Compaq is ditching that boring old
> "computer beige" in favor of the "opal" (basically white) color for their
> servers?  I think they look snazzy.
>

But do they have fins and feet and top-knots?
(Hard drives have gotten big enough that "fatware" had to move to the
cases.)




------------------------------

Date: Thu, 19 Aug 1999 00:02:09 +1000
From: Simon Burge <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Alpha DS20 timings. 

"Brian J. Beesley" wrote:

> Could I suggest that your figures may be a bit misleading. The point 
> is that, when the remaindering operation kicks in, roundoff errors 
> start to take effect & MacLucasUNIX generally restarts with the next 
> higher FFT size. You should really be running at least 100 iterations 
> to be sure that you have the appropriate FFT size for the exponent, 
> and that the timing isn't distorted by not running any code needed to 
> implement the remaindering operation.

I actually ran all tests (except for 332192831 - the 16M FFT was just
too slow :) for about 170 iterations, but just picked a single result.
Maybe the data would have been more meaningful if I'd left the exponent
out and just reported the FFT length.

> I find, running MLU on an Alpha 21164-533, 128K FFT works up to about
> exponent 2.35 million, & pro rata. MLU on a Sparc seems to be able to
> run a bit higher, somewhere around 2.45 million seems to be OK for a
> 128K FFT. Mind you, an Ultra IIi-300 is only about 0.4x the speed of an
> Alpha 21164-533, running MLU compiled using gcc 2.8.1 on both
> systems.

I've got static MLU ev5 and ev6 binaries if Linux can run them under
some sort of Digital Unix emulation.  Might be interesting to see how
gcc 2.8.1 compares with DEC's C compiler.

Look at ftp://melanoma.cs.rmit.edu.au/pub/simonb/MLU-ALPHA.tar.gz for
the binaries.

> The timings I have - from complete double tests - are 
> 128K FFT, 25000 iters/27 minutes = 0.065 sec/iter
> 256K FFT, 10000 iters/31 minutes = 0.186 sec/iter
> 512K FFT, 5000 iters/27 minutes = 0.324 sec/iter

No matter which way you look at it, the 21264 is fast :-)

> For short tests of 400 iterations (for QA testing) I've run lucdwt 
> (from Richard Crandall's giantint package, with minor modifications 
> to output) on exponents up to nearly 80 million i.e. 4096K FFT. This 
> just about fitted into 256MB; I was unable to proceed to 8192K FFT 
> since I have only 320 MB on my system (& can't justify buying any 
> more - in any case, Prime95 v19 gives up at ~79,600,000). These tests 
> were running at 90 to 100 iterations per hour for 4096K FFT. See 
> ftp://lettuce.edsc.ulst.ac.uk/gimps/PrimeQA/QADATA.TXT

The file format seems to be

        exponent,iter-count,residue,??,??

Does the "lucdwt" mean that lucdwt was used to generate the file (and
Prime95 is tested against it)?  And what's the last field (which some
lines don't have)?

> BTW here in the UK you can purchase a complete Alpha 21164-533 system 
> with a decent hard drive & 128MB RAM, preloaded with Red Hat Linux,
> for under 1500 pounds sterling.

A few of us Aussies recently purchased some PC164-500 motherboards
(500MHz 21164) for $US250 - easily the fastest computer I own now!

Simon.

------------------------------

Date: Wed, 18 Aug 1999 10:04:43 -0500
From: "Willmore, David" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Alpha DS20 timings.

We ran the DC for M38 on a DS20/500.  Using Ernst's code we were getting .18
s/i, but I don't remember the FFT size.   I want to say 384K?  Ernst?

Cheers,
David

------------------------------

Date: Wed, 18 Aug 1999 16:41:18 +0100
From: Gordon Spence <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Manual Comm. with Primenet

>
>Date: Tue, 10 Aug 1999 12:49:47 -0700
>From: "Joth Tupper" <[EMAIL PROTECTED]>
>Subject: Re: Mersenne: Contacting PrimeNet during LL test
>
>Yeah, I contact Primenet whenever I feel like it.
>
>I like to establish a modem connection, click {Test|Stop} and then
>{Test|PrimeNet...}, click Send new completion dates and OK.  Some of my
>machines (usually 3) connect over a LAN and update PrimeNet automatically.
>Two I use this somewhat manual technique for.  I try to remember to send new
>dates at least once a month for each machine running.

There used to be a problem with doing this if you had a result waiting to
be checked in: your account did not get updated with the LL test result,
even though the .spl file disappeared from your machine.....

I believe it was down to George's comms code with the primenet server.
Perhaps George and/or Scott could update us on this?

regards

Gordon


Gordon Spence,                             Nokia IP Telephony
Applications Engineer                      Grove House, Waltham Way,
[EMAIL PROTECTED]                      White Waltham, Maidenhead,
http://www.nokiaiptel.com/                 Berkshire, SL6 3TN,  UK.
Office: +44 1628 827204                    GSM: +44 385 576623

------------------------------

Date: Wed, 18 Aug 1999 23:10:55 +0100
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Alpha DS20 timings. 

On 19 Aug 99, at 0:02, Simon Burge wrote:

> I've got static MLU ev5 and ev6 binaries if Linux can run them under
> some sort of Digital Unix emulation.  Might be interesting to see how
> gcc 2.8.1 compares with DEC's C compiler.

I guess gcc would look "ordinary". Unfortunately Linux uses the NT
PAL code, which makes running DEC binaries problematic.

If/when I ever get some spare time, I'm going to have a look at the
code generated in the critical loops & see if I can hand-optimize it 
a bit. There _should_ be a factor of at least 2 in there somewhere.

> No matter which way you look at it, the 21264 is fast :-)

Yeah, sure is - I noted that Ernst Mayer managed to verify M38 in
about 60% of the time I estimated on my system, despite a marginally 
lower clock speed on his 21264. Mind you, the current 21264 systems 
seem to be carrying heavy price tags 8-(
> 
> The file format seems to be
> 
>       exponent,iter-count,residue,??,??
> 
> Does the "lucdwt" mean that lucdwt was used to generate the file (and
> Prime95 is tested against it)?  And what's the last field (which some
> lines don't have)?

Yes, the output is from lucdwt. The code in Prime95 v19 is being 
tested against this output ... so far so good! The last field is the 
max rounding error, which is a sort of indication of the reliability 
of the data - when it gets dangerously large, jump to the next FFT 
size. It's not always present because an early version of the program 
only output max error at the end of the run.
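Based on Simon's reading of the format and Brian's answer, a QADATA.TXT line could be parsed as shown below. This Python sketch rests on several assumptions. The thread does not say what the fourth field means, the example lines are invented, and the 0.4 cut-off for "dangerously large" is a hypothetical choice. (A max error of 0.5 would make rounding to the nearest integer ambiguous.)

```python
from dataclasses import dataclass
from typing import Optional

DANGER = 0.4  # hypothetical "dangerously large" max-error threshold


@dataclass
class QARecord:
    exponent: int
    iterations: int
    residue: str
    field4: str                 # meaning not explained in this thread
    max_error: Optional[float]  # absent on lines from the early lucdwt version


def parse_qa_line(line: str) -> QARecord:
    """Parse 'exponent,iter-count,residue,field4[,max_error]'."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) not in (4, 5):
        raise ValueError("expected 4 or 5 fields: " + line)
    max_error = float(fields[4]) if len(fields) == 5 else None
    return QARecord(int(fields[0]), int(fields[1]), fields[2], fields[3], max_error)


def needs_larger_fft(rec: QARecord, danger: float = DANGER) -> bool:
    """True if the recorded max rounding error says: jump to the next FFT size."""
    return rec.max_error is not None and rec.max_error >= danger


# Made-up example lines; the real file's residue and field-4 formats may differ.
print(parse_qa_line("1000003,400,0123456789ABCDEF,x,0.21"))
print(needs_larger_fft(parse_qa_line("1000003,400,0123456789ABCDEF,x,0.45")))
```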
> 
> A few of us Aussies recently purchased some PC164-500 motherboards
> (500MHz 21164) for $US250 - easily the fastest computer I own now!

I'll have a look & see if I can turn up anything similar. $250 sounds 
crazy, the processor alone should cost more than that!

I find my 21164LX-533 runs code compiled from C source with an Alpha 
version of gcc about 4x as fast as an Intel PII-350 runs the same 
code compiled with an Intel version of the same mark of the same 
compiler.


Regards
Brian Beesley

------------------------------

Date: Thu, 19 Aug 1999 09:20:10 +1000
From: Simon Burge <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Alpha DS20 timings. 

"Brian J. Beesley" wrote:

> On 19 Aug 99, at 0:02, Simon Burge wrote:
> 
> > I've got static MLU ev5 and ev6 binaries if Linux can run them under
> > some sort of Digital Unix emulation.  Might be interesting to see how
> > gcc 2.8.1 compares with DEC's C compiler.
> 
> I guess gcc would look "ordinary". Unfortunately, Linux uses the NT 
> PALcode, which makes running DEC binaries problematic.
> 
> If/when I ever get some spare time, I'm going to have a look at the 
> code generated in the critical loops & see if I can hand-optimize it 
> a bit. There _should_ be a factor of at least 2 in there somewhere.

I've updated that tar file with the .s files produced by the DEC C
compiler.  They may be of some help.

> [[ QA test file info ]]

I might look at getting the mers package to use this test file...  Doesn't
hurt to have another test.

> > A few of us Aussies recently purchased some PC164-500 motherboards
> > (500MHz 21164) for $US250 - easily the fastest computer I own now!
> 
> I'll have a look & see if I can turn up anything similar. $250 sounds 
> crazy, the processor alone should cost more than that!

I think it was a one-off deal.  Needless to say we jumped at it!

Simon.

------------------------------

Date: Thu, 19 Aug 1999 11:23:21 +1000
From: Simon Burge <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Alpha DS20 timings. 

"Willmore, David" wrote:

> We ran the DC for M38 on a DS20/500.  Using Ernst's code we were getting .18
> s/i, but I don't remember the FFT size.   I want to say 384K?  Ernst?

I tried Ernst's Mlucas on the DS20, and it wanted to use 384k for M38.
I'm just trying MacLucasUNIX now, and it's done 27500 iterations using a
256k FFT.  At 0.077 secs/iter, it should take about 150 hours to run to
completion.  I'll see if I can get permission to run the full test.
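The 150-hour figure is easy to check: a Lucas-Lehmer test of M(p) performs p - 2 modular squarings, so total run time is roughly (p - 2) times the time per iteration. A minimal sketch, assuming M38 is M(6972593) and that the per-iteration time stays constant for the whole run (the helper name `ll_hours` is just for illustration):

```python
M38_EXPONENT = 6972593  # M38 = 2^6972593 - 1

def ll_hours(p, secs_per_iter, done=0):
    """Hours needed to finish an LL test of M(p), starting from iteration `done`.

    An LL test does p - 2 squarings; the per-iteration time is assumed constant.
    """
    return (p - 2 - done) * secs_per_iter / 3600.0

print(f"MLU, 256k FFT, 0.077 s/iter: {ll_hours(M38_EXPONENT, 0.077):.1f} h total")
print(f"  remaining after 27500 iterations: {ll_hours(M38_EXPONENT, 0.077, 27500):.1f} h")
print(f"Mlucas, 0.18 s/iter: {ll_hours(M38_EXPONENT, 0.18):.1f} h total")
```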

MLU seems to use a 256k FFT for 8388473 (no jump to 512k after 5000
iterations), and a 512k FFT for 8388539.
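Here is one way to see why these two exponents sit right at the 256k/512k switch-over. A 256k FFT has 2^18 words, and 2^23 = 32 x 2^18, so both exponents need just under 32 bits per FFT word at 256k and about 16 at 512k. The sketch below only computes that average. MLU's actual switching rule isn't stated here, so the cut-off between the two exponents is simply what was observed.

```python
FFT_256K = 256 * 1024  # 2^18 words
FFT_512K = 512 * 1024  # 2^19 words

def bits_per_word(p, fft_len):
    """Average number of bits of the p-bit residue that each FFT word must hold."""
    return p / fft_len

for p in (8388473, 8388539):
    print(p, f"{bits_per_word(p, FFT_256K):.5f}", f"{bits_per_word(p, FFT_512K):.5f}")
```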

Simon.

------------------------------

Date: Thu, 19 Aug 1999 15:03:47 +1000
From: Simon Burge <[EMAIL PROTECTED]>
Subject: Mersenne: mersenne1 vs. MacLucasUNIX on UltraSparc.

Getting off the Alpha topic for a moment, and back to the computers I
have full-time access to - UltraSparcs.

Out of laziness (and convenience to a certain extent), I've not looked
at moving my machines from using mersenne1 to MacLucasUNIX until now.
And boy, should I have looked at this earlier!  Here are the results:

        M( 2458279 )C, 0x5a1be55da1237a68, n = 131072, mersenne1 v3.10  Kline
        1912405.78 user 669.35 sys 593:18:09 real

        M( 2458279 )C, 0x5a1be55da1237a68, n = 131072, MacLucasUNIX v6.20  Sweeney
        410368.73 user 387.03 sys 129:10:19 real
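To put numbers on the difference, the sketch below converts the h:m:s real times above to seconds and takes ratios. The timings are copied from the two runs; the helper name `hms_to_seconds` is just for illustration.

```python
def hms_to_seconds(hms):
    """Convert an 'hours:minutes:seconds' string (hours may exceed 24) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# (user seconds, real seconds) for M(2458279), n = 131072, from the runs above
mersenne1 = (1912405.78, hms_to_seconds("593:18:09"))
mlu = (410368.73, hms_to_seconds("129:10:19"))

user_speedup = mersenne1[0] / mlu[0]
real_speedup = mersenne1[1] / mlu[1]
print(f"user-time speedup: {user_speedup:.2f}x, real-time speedup: {real_speedup:.2f}x")
```

So MacLucasUNIX was roughly 4.6 times faster on the same exponent and FFT length, although the two programs were built with different compiler flags.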

In defence of mersenne1, it was only compiled with "gcc -O2" as it was
also used on some Sparc20s and SS1000s.  MacLucasUNIX was compiled with
"gcc -O6 -mcpu=v9 -Wa,-xarch=v8plusa" so it'll use the new instructions
available on the UltraSparc.  Also, from what I understand, -O levels above 3
are the same as -O3 anyway on gcc...

Simon.

------------------------------

Date: Thu, 19 Aug 1999 08:01:07 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Alpha DS20 timings.

On Wed, Aug 18, 1999 at 11:10:55PM +0100, Brian J. Beesley wrote:
>I find my 21164LX-533 runs code compiled from C source with an Alpha 
>version of gcc about 4x as fast as an Intel PII-350 runs the same 
>code compiled with an Intel version of the same mark of the same 
>compiler.

Note that gcc is currently not very good for Intel chips (I think
PentiumGCC - http://www.goof.com/pcg/ fixes some problems there,
but it introduces more bugs), since it assumes that you have many
registers, which 80x86s don't have :-( (Wish there were a way to
program P6 microcode directly! Over 30 registers... Yummy...)

/* Steinar */
- -- 
Homepage: http://members.xoom.com/sneeze/

------------------------------

Date: Thu, 19 Aug 1999 07:47:32 -0600
From: "Aaron Blosser" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Re: Alpha DS20 timings.

> Note that gcc is currently not very good for Intel chips (I think
> PentiumGCC - http://www.goof.com/pcg/ fixes some problems there,
> but it introduces more bugs), since it assumes that you have many
> registers, which 80x86s don't have :-( (Wish there were a way to
> program P6 microcode directly! Over 30 registers... Yummy...)

I can't wait for Merced and direct access to all of those FPU and general-purpose
integer registers.  You think 30 is yummy, how about 128 FPU and 128 general-purpose
ones?  A virtual feast!

I imagine you could significantly speed up the code by keeping much of the
data in registers.  REG-REG operations take a lot less time than REG-MEM
operations.  Should be delicious.


------------------------------

Date: Fri, 20 Aug 1999 00:19:21 +1000
From: Simon Burge <[EMAIL PROTECTED]>
Subject: Merced (was Re: Mersenne: Re: Alpha DS20 timings.)

"Aaron Blosser" wrote:

> I can't wait for Merced and direct access to all of those FPU and general-purpose
> integer registers.  You think 30 is yummy, how about 128 FPU and 128 general-purpose
> ones?  A virtual feast!
> 
> I imagine you could significantly speed up the code by keeping much of the
> data in registers.  REG-REG operations take a lot less time than REG-MEM
> operations.  Should be delicious.

From what I understand of Merced, compiler technology is going to be the
problem.  It's probably not unreasonable to expect large performance
increases as the intelligence of compilers (especially the "free"
compilers like gcc and egcs) catches up to the theoretical performance
of the CPU.

Simon.

------------------------------

Date: Thu, 19 Aug 1999 11:21:32 -0500
From: "Willmore, David" <[EMAIL PROTECTED]>
Subject: RE: Merced (was Re: Mersenne: Re: Alpha DS20 timings.)

> From: Simon Burge [SMTP:[EMAIL PROTECTED]]
> From what I understand of Merced, compiler technology is going to be the
> problem.  It's probably not unreasonable to expect large performance
> increases as the intelligence of compilers (especially the "free"
> compilers like gcc and egcs) catches up to the theoretical performance
> of the CPU.
> 
Assembly! :)

------------------------------

End of Mersenne Digest V1 #616
******************************