Mersenne: Re: Mersenne Digest V1 #572

Steinar H. Gunderson Fri, 11 Jun 1999 12:52:17 -0700
On Fri, Jun 11, 1999 at 08:46:53AM -0700, Mersenne Digest wrote:
>This made the 
>computer run faster, I guess by increasing its conduction, and one result I 
>recall is getting a 600 MHz DEC Alpha chip to run at around 767 MHz?  Has 
>anyone bought this kind of computer, or perhaps done some kind of home 
>modification (like all the overclocking)?

I saw an article on Slashdot (http://slashdot.org -- try to pronounce that)
some weeks ago; try searching for it. Whoever wrote that article had built
a dual-CPU cooler, so he could actually _double_ the CPU speed and still
have it run. (It even had some info on _why_ cooling helped.)

>My second question, what is a good factoring program for Win98 on a PII 
>system that allows you to enter a very large number and attempt to factor it, 
>thereby proving it either composite or prime?  Thanks for any help.

As somebody pointed out, the giantint package does exactly that (well,
actually, the factors it reports aren't 100% guaranteed to be prime) for
Linux/UNIX. Either find someone who's willing to port it, or get Linux.

Example (previously shown on this list):

---
steinar:~# echo 123123123123123123123123123123 | ./factor
Sieving...
3 * 7 * 11 * 13 * 31 * 41 * 41 * 211 * 241 * 271 * 2161 * 9091 * 2906161
 
steinar:~# echo 1231231231231231231231231231231231 | time ./factor
Sieving...

Commencing Pollard rho...
...
111871
*
Commencing Pollard (p-1)...
..................................................................
Commencing ECM...
Choosing curve 1, with s = 346492192, B = 1000, C = 50000:
..

Commencing second stage, curve 1...
....
Choosing curve 2, with s = 2131939374, B = 1000, C = 50000:
..

Commencing second stage, curve 2...
....
18102915799
* 607957991560696039
1.92user 0.00system 0:02.12elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (100major+30minor)pagefaults 0swaps
---
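
The "Commencing Pollard rho..." stage in the output above is the easiest of
the three methods to show in a few lines. What follows is a minimal sketch in
Python, not giantint's actual code: the names pollard_rho and find_factor,
the polynomial x^2 + c and the starting value 2 are my own illustrative
choices (the textbook variant with Floyd cycle detection).

```python
from math import gcd


def pollard_rho(n, c=1):
    """Return a nontrivial factor of composite n, or None if this c fails."""
    if n % 2 == 0:
        return 2
    x = y = 2  # assumed starting value; any small integer would do
    d = 1
    while d == 1:
        x = (x * x + c) % n          # tortoise: one step
        y = (y * y + c) % n          # hare: two steps
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)       # a shared factor shows up as gcd > 1
    return d if d != n else None     # d == n means the cycle closed modulo n


def find_factor(n):
    """Retry with different constants c until a nontrivial factor appears."""
    for c in range(1, 20):
        d = pollard_rho(n, c)
        if d is not None:
            return d
    return None


if __name__ == "__main__":
    n = 1231231231231231231231231231231231  # second input from the transcript
    d = find_factor(n)
    print(d, n // d)
```

Note that this only splits off _a_ factor; it says nothing about whether the
factor or the cofactor is prime, which is why a real program follows up with
further stages and a primality check.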

---snip---

>You must be thinking of KryoTech: http://www.kryotech.com/

Of course, Kryotech too. The article I was referring to showed a way to get
another 10 or 20 degrees Celsius colder `on top of' Kryotech. (Built totally
from scratch.)

---snip---

>The optimization guide is packed full of tips.  It's about 150 pages
>in total, although half of it is a reference guide.

I pointed George to this some time ago, and he told me there were a
lot of errors in it. If you look in the source code, all three of these
(use of CMOVs/branch optimization, avoiding partial register stalls,
and data alignment) are already implemented. However, it says very
little about FPU optimization on the P6 family, which, I believe, is
what George is after.

Perhaps a posting on comp.lang.asm.x86 (if I remember right) would be
an idea? I posted a question there once, and got a lot of constructive
information back.

>Thanks to David Willmore (not a turkey by any means), I've had a chance to try a soon-
>to-be-released update of my Mersenne code on a 500MHz Alpha 21264s, and it's extremely
>impressive - 0.18 seconds per iteration at FFT length 384K, fully three times faster
>than on my 400MHz 21164.

Hmmm... I'm currently doing an exponent of 7398xxx on my PII/448
(bus-overclocked), which should be 384K FFT length. My best result is 0.197
secs/iteration (stable, but not across reboots) using mprime 18.1. This
means the PII is about as fast as an Alpha clock for clock (a little slower
in absolute time). Is Alpha over-hyped, is something wrong, or is George
just a terrific x86 assembly coder?
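
To put numbers on that, here is the arithmetic as a tiny Python sketch. It
assumes both machines run at exactly their nominal clocks (448 MHz and
500 MHz) and that the two timings are comparable at all; the function name
cycles_per_iteration is just mine.

```python
def cycles_per_iteration(seconds_per_iter, clock_mhz):
    """Clock cycles spent on one LL iteration at the given clock rate."""
    return seconds_per_iter * clock_mhz * 1e6


# Figures quoted in this thread, both at FFT length 384K.
pii = cycles_per_iteration(0.197, 448)    # PII/448 running mprime 18.1
alpha = cycles_per_iteration(0.18, 500)   # 500MHz Alpha 21264

print(f"PII:   {pii / 1e6:.1f} million cycles/iteration")
print(f"Alpha: {alpha / 1e6:.1f} million cycles/iteration")
print(f"ratio PII/Alpha: {pii / alpha:.3f}")
```

So both chips need roughly 90 million cycles per iteration, which is what I
mean by "about as fast" per clock.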

>The performance on the 21264 is in line with the MIPS

Which means Intel is also in line with the MIPS? (I just saw some benchmarks
that promised K7 to be 40% faster than PII at FPU work; looks like we have
good times ahead of us!)

---snip---

>1.  Why have 2nd LL tests been done in many cases when there are still
>exponents whose status is unknown?

Probably because P133s (is this number right?) or slower are now automatically
given double-checking assignments, instead of first-time LL tests.

(As a side note, an AMD K6/200 instantly began checking out double-checking
assignments instead of factoring assignments when I set it to work 24 hours
(instead of 8) a day. I consider this a bug, unless somebody has a good
reason K6s shouldn't do integer arithmetic instead of FPU work.)
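
For anyone unsure what an LL test or a double-check actually computes, here
is a minimal Python sketch of the Lucas-Lehmer test. It uses plain big-integer
arithmetic rather than the FFT multiplication Prime95 relies on, so it is only
practical for small exponents; the names ll_residue and lucas_lehmer are
illustrative, not anything from George's code.

```python
def ll_residue(p):
    """Final Lucas-Lehmer residue s_(p-2) mod 2**p - 1, for an odd prime p."""
    m = (1 << p) - 1          # the Mersenne number M_p = 2^p - 1
    s = 4                     # s_0 = 4
    for _ in range(p - 2):    # p - 2 squarings in total
        s = (s * s - 2) % m   # s_(i+1) = s_i^2 - 2 (mod M_p)
    return s


def lucas_lehmer(p):
    """M_p is prime exactly when the final residue is zero."""
    return ll_residue(p) == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        print(p, "prime" if lucas_lehmer(p) else "composite")
```

A double-check is a second, independent run of the same computation; the
exponent counts as verified when both runs end with the same residue. The
sketch above does not model that independence (different machine, ideally
different software), it only shows what is being compared.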

---snip---

>My vote for "Most Inane" would be to the guy a year or two ago who claimed
>to know for an absolute certainty that there were only, (I think it was) 37
>Mersenne primes.  Whatever the number was, it was about one more than had
>been discovered at that point.

`It is a scientific fact that your vision becomes worse if you shave off your
beard.' (Or whatever whoever said.)

I think a lot of stupid things have been said. Never say anything and claim
100% certainty :-)

---snip---

>Even with a nice 550MHz PIII, a 33M exponent could be tested in maybe around
>1/12 the time of a P90 (about 6 times faster, as well as being much more
>optimized...maybe more like 1/10).  I think 80 years is a bit of an
>overestimate though...but I could be wrong on that.

In fact, I think the 90 GHz number _was_ right. (There was a post on this
list where the poster showed that even with a CPU that fast, it would take
ages to check a single billion-digit Mersenne prime.)

---snip---

>  So we are about 7.5*10^10 P90 years away from our first billion digit prime.

Hmmm, that depends a bit on where it is, and if it is there at all! 

---snip---

>  Following conservative estimates of cpu power and number of participants
>doubling every two years, I'd guess that we will have a our first billion
>digit prime in 2021, when we have 40 million participants and Pentium XV
>1000GHz processors.

I can remember seeing some figures predicting the number of Internet users
would cross the world population in just 20 years or so.

About all this lifetime stuff: I have a better chance than most of
you, BTW. So there :-)

---snip---

>Talking about impatience, there is something I don't understand:
>are we waiting just for the doublecheck to be completed or does the
>EFF prize somehow require that M38 be kept secret until publication?
>I hope not, that would be very strange indeed...

Both, I think. It's not very strange; I guess they want it done `the
right way'. You know, that's how `grown-ups' think, and it's their
money, so they decide.

---snip---

>Some exponents take much longer than 2 months to LL test on slower machines.
>Those are probably just whichever 3M-area exponents got assigned to 286s and
>386s :-)

Well, most certainly 386s or 486s; my PII can run a 3M exponent in a day
or so. (286s can't run Prime95, of course; they would need a special version,
and I'm not sure George is keen on making one.)

---snip---

>As I said, unless there is an intervention or someone just takes it upon 
>themselves to double-check those exponents with software other than 
>George's (the very basis of doublechecking), we won't get confirmation of 
>M37 until 2003.... <sigh>

That depends on what you mean by `confirmation'. Since I've totally lost
count of the Mersenne primes being found, I guess M38 is the unconfirmed,
million-digit one, and M37 is the previous one. So I guess what you're
looking for is confirmation that M37 really is the 37th Mersenne prime,
not that it is prime. Is it really that important?

---snip---

>But if I look into my personal report I see that the "LL P90 CPU yrs"
>and all the other numbers on the same line like "Exponents LL tested"
>are still all on zerro, why??
>Did I do something wrong or what.
>Please clear this for me, I like to do it in the correct way.

`The correct way' is reporting automatically, via Prime95/mprime/whatever.
I made the same mistake, and Scott pointed me to the FAQ page... If you
re-report using the automatic method, you will be credited. If this
is impossible, ask Scott ([EMAIL PROTECTED]), and I guess he'll
find a solution for you.

Oh, you could always send the results to me, so I can get the credit :-)

>Plus all the stores are decoded in separate cycles (2 uOps)
>I'm sure someone else will correct my mistakes ;)

Since the code is called (approx.) 500,000 times _per iteration_, and the
FPU has a latency of at least 2-3 cycles per instruction (correct me),
I guess decoding stalls are only a minor issue here, even though the function
is inlined, so it has to be decoded many times.

>Also, I noticed that no attention was paid to as far as K6 optimization (ie
>tossing the fxch's) in the current code... Any effort to improve that or is
>it not worth it?

I made an optimized version of George's code, with no more than a few FXCHs,
but it's probably not working, since I never had access to MASM... If you're
brave, you could always have a look at it (mail me) and try to fix it.

/* Steinar */
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm