Re: Mersenne: Re: Mersenne Digest V1 #572

Brian J. Beesley Fri, 11 Jun 1999 14:43:25 -0700
> >My second question, what is a good factoring program for Win98 on a PII 
> >system that allows you to enter a very large number and attempt to factor it, 
> >thereby proving it either composite or prime?  Thanks for any help.
> 
> As somebody pointed out, the giantint package does exactly that (well,
> actually, the numbers aren't 100% sure to be prime) for Linux/UNIX. Either
> find someone who's willing to port it, or get Linux.

Microsoft Visual C++ compiles it quite happily under Windoze. Or it 
is possible to get gcc to work in a "DOS box".
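
The post doesn't say which test the giantint package uses. To show why a program like that reports "probably prime" rather than proving primality, here is a generic Miller-Rabin probable-prime test in Python. It is an illustration only, not the package's code. The function name and the default of 20 rounds are my own choices.

```python
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probable-prime test (illustrative sketch).

    False proves n composite. True only means n passed `rounds` random
    bases; a composite survives with probability at most 4**-rounds.
    """
    if n < 2:
        return False
    # Cheap trial division by small primes first.
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True
```

So a "composite" answer is a proof, but a "prime" answer is only overwhelmingly likely. That is the "aren't 100% sure" caveat above.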

> Hmmm... I'm currently doing an exponent of 7398xxx on my PII/448
> (bus-overclocked), which should be 384K FFT length. My best result is 0.197
> secs/iteration (stable, but not across reboots) using mprime 18.1. This
> means the PII is about as fast as an Alpha (a little slower) at the same
> clock speed. Is the Alpha over-hyped, is something wrong, or is George just
> a terrific x86 assembly coder?
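
For scale, a Lucas-Lehmer test of 2^p-1 takes p-2 squaring iterations, so the per-iteration figure converts directly into a total run time. The sketch below assumes every iteration takes the measured 0.197 s. The exact exponent is elided in the post ("7398xxx"), so 7,398,000 is only a stand-in of the right size, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400

def lucas_lehmer_days(exponent: int, sec_per_iteration: float) -> float:
    """Wall-clock days for one Lucas-Lehmer test of 2**exponent - 1.

    The test needs exponent - 2 modular squarings, each assumed to take
    the measured per-iteration time.
    """
    return (exponent - 2) * sec_per_iteration / SECONDS_PER_DAY

# Stand-in exponent: the real value is elided in the post.
days = lucas_lehmer_days(7_398_000, 0.197)
print(f"{days:.1f} days")
```

That works out to roughly 17 days of continuous running for one exponent of that size.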

I have an Alpha 21164-533. For C code compiled using gcc & run under 
Linux, it's _at least_ 4x as fast as a PII-350, and considerably 
quicker than that if the code can use 64-bit integers intelligently.

George _is_ a terrific x86 assembly coder. Ernst Mayer's program is 
written in a high-level language (Fortran-90); if anyone has the time 
& the skill to do an Alpha assembly optimization of Ernst's code 
which is half as good as George's work on Intel, it would almost 
certainly run at least twice as fast. 
> 
> >The performance on the 21264 is in line with the MIPS
> 
> Which means Intel is also in line with the MIPS? (I just saw some benchmarks
> that promised K7 to be 40% faster than PII at FPU, looks like we have good
> times ahead of us!)
> 
Actually the K7 has pinched some ideas (like the 200 MHz 128-bit data 
bus) developed for the Alpha.

I think Ernst was comparing the performance of the Alpha 21264 with 
the MIPS R12000 CPU running code from the same (Fortran-90) source.

> (As a side note, an AMD K6/200 immediately began getting double-checking
> assignments instead of factoring assignments when I set it to work 24 hours
> (instead of 8) a day. I consider this a bug, unless somebody has a good
> reason why K6s shouldn't do integer arithmetic instead of FPU work.)

No. The effective CPU speed of a K6 at 200 MHz is _100_ MHz, because 
the K6 FPU is less efficient than the Pentium FPU. Divide that figure 
by 3 if you're running 8 hrs/day and you end up with less than 50 MHz, 
so you get factoring assignments by default. Tell it you're running 
24 hrs/day and you will get double-checking assignments.
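
The arithmetic above can be written out as a small sketch. The 0.5 FPU factor is inferred from "200 MHz is effectively 100 MHz", and the 50 MHz cutoff comes from this message. The two-way rule and the function names are hypothetical: the real Prime95/PrimeNet assignment logic is not described here and very likely has more tiers.

```python
def effective_mhz(clock_mhz: float, fpu_factor: float, hours_per_day: float) -> float:
    """Pentium-equivalent MHz, scaled by the fraction of the day the machine runs."""
    return clock_mhz * fpu_factor * hours_per_day / 24

def default_assignment(eff_mhz: float, cutoff_mhz: float = 50.0) -> str:
    # Hypothetical two-way rule using only the figures quoted in this message.
    return "factoring" if eff_mhz < cutoff_mhz else "double-checking"

K6_FPU_FACTOR = 0.5  # a K6 at 200 MHz counts as roughly 100 Pentium MHz

for hours in (8, 24):
    eff = effective_mhz(200, K6_FPU_FACTOR, hours)
    print(hours, round(eff, 1), default_assignment(eff))
```

At 8 hrs/day the K6/200 comes out at about 33 MHz and falls below the cutoff. At 24 hrs/day it counts as 100 MHz, which is why the assignment type changed.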

> Well, most certainly 386s or 486s, my PII can run a 3M-exponent in a day
> or so. (286s can't run Prime95, of course, they would need a special version,
> and I'm not sure if George is keen on making one.)

Waste of time. All the remaining operating 286s in the world put 
together probably don't amount to more than a dozen or so P90s.
> 
> That depends on what you mean by `confirmation'. Since I've lost track
> of how many Mersenne primes have been found, I guess M38 is the unconfirmed,
> million-digit one, and M37 is the previous one. So I guess what you're
> looking for is confirmation that M37 really is the 37th Mersenne prime,
> not that it is prime. Is it really that important?
> 
It is, if you're going to denote them that way. If you're going to 
give them serial numbers in terms of discovery date rather than in 
terms of size, we're going to mightily confuse the inhabitants of the 
planet Zog when they get our list. In any case, the precedent has 
already been set.

We can talk unambiguously about "the 37th Mersenne prime to be 
discovered" meaning "Clarkson's Number" or 2^3021377-1, but we should 
not talk about the "37th Mersenne prime" (unqualified) until we have 
double-checked all the exponents up to 3021377.
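
To illustrate why numbering by size needs every smaller exponent checked, here is a toy Lucas-Lehmer loop in plain Python. It can rank the Mersenne primes it finds only because it tests every prime exponent below its limit. The 1300 limit and the function names are my own choices. Real GIMPS software such as mprime uses FFT-based multiplication rather than Python's built-in integers.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer: for an odd prime p, 2**p - 1 is prime iff s_(p-2) == 0."""
    if p == 2:
        return True  # M2 = 3; the recurrence below needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Ranking by size is only safe because *every* prime exponent below the
# limit is tested; skipping one could silently shift all later ranks.
exponents = [p for p in range(2, 1300) if is_prime(p) and is_mersenne_prime(p)]
for rank, p in enumerate(exponents, start=1):
    print(f"M{rank}: 2^{p}-1")
```

A single missed exponent in the range would renumber every prime above it, which is the objection to calling Clarkson's Number "the 37th Mersenne prime" before the double-checks are complete.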

I personally find it hard to think in terms of a 38th Mersenne prime 
discovery until it's verified. I know it's most unlikely (to put it 
mildly) that a "false positive result" would arise by chance, but, as 
they say, "If it looks like a duck and sounds like a duck, it 
probably _is_ a duck. But I won't be sure until I see it has webbed 
feet!"

Sorry for being pedantic.

> Since the code is called (approx.) 500,000 times _per iteration_, and the
> FPU has a latency of at least 2-3 cycles per instruction (correct me if I'm
> wrong), I guess decoding stalls are only a minor issue here, even though the
> function is inlined, so it has to be decoded many times.

Latency is the time between the instruction entering the pipeline and 
the operation being complete. However, the throughput is 1 floating-point 
add or multiply per cycle, so long as the code can keep the 
pipeline filled.

If the processor stalls, the execution units empty and you _do_ lose 
(at least) the latency period before you start to get results out 
again. This is why avoiding stalls is important.
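
A simplified model makes the latency/throughput point concrete. This is my own sketch, not a description of any real CPU. It assumes an in-order unit that accepts one operation per cycle, with a fixed latency (3 cycles here, an assumed figure). Each stall drains the pipeline, so the work splits into bursts, and each burst pays the fill latency again. Real P6-class cores reorder instructions and are considerably more complicated.

```python
def pipeline_cycles(n_ops: int, latency: int, stalls: int = 0) -> int:
    """Cycles to finish n_ops on a unit that accepts one op per cycle.

    Simplified model: each stall drains the pipeline, splitting the work
    into stalls + 1 bursts. A burst of b ops finishes after
    b + latency - 1 cycles, so every burst re-pays latency - 1 cycles.
    """
    bursts = stalls + 1
    return n_ops + bursts * (latency - 1)

LATENCY = 3  # assumed FPU add/multiply latency, in cycles
N = 1000

print(pipeline_cycles(N, LATENCY))                # pipeline kept full: 1002
print(pipeline_cycles(N, LATENCY, stalls=100))    # 100 stalls: 1202
print(pipeline_cycles(N, LATENCY, stalls=N - 1))  # every op waits on the last: 3000
```

With the pipeline kept full, the cost is close to one cycle per operation. If every operation has to wait for the previous one, the cost becomes the full latency per operation, which is why keeping the pipeline filled matters.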

Regards
Brian Beesley
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm