Mersenne Digest V1 #794

Mersenne Digest Sun, 26 Nov 2000 13:57:01 -0800

Mersenne Digest       Sunday, November 26 2000       Volume 01 : Number 794




----------------------------------------------------------------------

Date: Fri, 17 Nov 2000 14:21:23 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Shortage on double-check exponents

Hi all,

5 PM 11/17/00 +0100, Canart, Jean-Yves wrote:
>For the past few days, there have been no more exponents available for double-checking.

Sorry 'bout that.  I've made 9600 new exponents available.

Those doing first-time tests on exponents between 6,100,000 and
6,500,000 are *still* doing first-time tests, even though the PrimeNet
server will now report them as double-checking assignments.

Regards,
George

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.exu.ilstu.edu/mersenne/faq-mers.txt

------------------------------

Date: Fri, 17 Nov 2000 16:34:16 -0500
From: Matt Gauthier <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Mersenne Digest V1 #793 

>distributed.net is simple brute force; for the mathematics behind mersenne.org,
>see: http://www.mersenne.org/math.htm
>
>OK, OK, there is no use for such primes, but mathematics is, like life itself,
>l'art pour l'art anyway.

Apart from RC5 and the other inane crypto contests, distributed.net
is also searching for the optimal 24- and 25-mark Golomb rulers, so
there is a purely mathematical project there too.
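For readers who haven't met them: a Golomb ruler is a set of integer marks in which no two pairs of marks are the same distance apart, and an optimal ruler is the shortest one for a given number of marks. The short Python sketch that follows only checks that defining property (the function name is made up for illustration). It is not distributed.net's search code, which has to find the shortest such ruler, a far harder job.

```python
from itertools import combinations

def is_golomb_ruler(marks):
    """Return True if every pairwise distance between the marks is distinct."""
    diffs = [abs(b - a) for a, b in combinations(marks, 2)]
    return len(diffs) == len(set(diffs))

print(is_golomb_ruler([0, 1, 4, 6]))   # distances 1, 4, 6, 3, 5, 2: all distinct
print(is_golomb_ruler([0, 1, 2, 4]))   # distance 1 occurs twice (0-1 and 1-2)
```

[0, 1, 4, 6] is the optimal 4-mark ruler (length 6); [0, 1, 2, 4] fails the check because two pairs of marks are 1 apart.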

--
Matt Gauthier <[EMAIL PROTECTED]>
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.exu.ilstu.edu/mersenne/faq-mers.txt

------------------------------

Date: Sun, 26 Nov 2000 16:32:45 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Mersenne: P4

Hi all,

        The mailing list has been quiet.  I hope everyone enjoyed
a happy Thanksgiving (or at least a good weekend for non-U.S. readers).

        I've received 2 queries about the recently released Pentium 4
and prime95.  I have no timings at this point, but I figured some folks
would like to know how the architecture helps or hurts our cause.  I've
downloaded the manuals and have the following observations:

1)   The FPU is slower than the P-III's, clock for clock.  An FADD has a
latency of 5 clocks (vs. 3 on the P-III and 4 on the Athlon).  An FMUL has
a latency of 7 clocks (vs. 5 on the P-III and 4 on the Athlon).  An FADD
can be issued every clock cycle.  An FMUL can be issued every other clock
cycle (same as the P-III; the Athlon can issue one every clock cycle).  An
FADD and an FMUL cannot be started on the same clock cycle (same as the
P-III; the Athlon has no such restriction).

Summary:  On paper, the Athlon still has the best FPU, the P-III is next,
and the P4 is last (but remember that the P4 has a faster clock).
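To see why latency and issue rate both matter, here is a toy Python model (an illustration only, not how prime95 or the chips actually schedule work). It uses the FMUL figures from the post as (latency, issue interval) pairs and compares a chain of dependent multiplies, where each must wait for the previous result, against independent multiplies that can overlap in the pipeline. It deliberately ignores the FADD/FMUL pairing restriction, caches, and clock rate.

```python
# Toy pipeline model: clocks to finish n FMULs (illustration only).
FMUL = {                 # chip: (latency, issue interval) in clocks, from the post
    "P4": (7, 2),
    "P-III": (5, 2),
    "Athlon": (4, 1),
}

def cycles_dependent(n, latency):
    """Each multiply needs the previous result, so the latencies add up."""
    return n * latency

def cycles_independent(n, latency, issue_interval):
    """Independent multiplies start every issue_interval clocks; the last
    one still needs its full latency before its result is ready."""
    return (n - 1) * issue_interval + latency

N = 100
for chip, (lat, interval) in FMUL.items():
    dep = cycles_dependent(N, lat)
    ind = cycles_independent(N, lat, interval)
    print(f"{chip:7s} dependent: {dep:4d} clocks   independent: {ind:4d} clocks")
```

With N = 100 the model gives 700 vs. 205 clocks on the P4, 500 vs. 203 on the P-III, and 400 vs. 103 on the Athlon. Well-scheduled code tries to keep many independent operations in flight, so the longer P4 latencies cost less than the raw numbers suggest, while the issue rate sets the ceiling.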

2)  The P4 introduces the SSE2 instructions.  Intel hopes that new programs
will stop using the old FPU instructions and start using these instead.
The SSE2 instructions work on 2 double-precision floating point values at
the same time!  An ADD has a latency of 4 clocks, but can only be issued
every other clock cycle.  A MUL has a latency of 6 clocks and also can only
be issued every other clock cycle.

Since each instruction handles two values, the theoretical maximum
throughput for SSE2 is one ADD *AND* one MUL every clock cycle.  Averaged
over the two values, the latency is 2 clocks for an ADD and 3 for a MUL.

Summary:  If a program can be effectively recoded to use SSE2,
then it can have greater throughput than even the Athlon.  Of course,
months ago I had hoped that the P4 would be able to get a throughput
of 2 ADDs and 2 MULs per clock cycle.  Maybe in a few years, a
future P4 or AMD chip will do this.
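The per-value arithmetic behind those SSE2 figures can be written out in a few lines of Python. This is a sketch using only the numbers in the post; `per_value` is a hypothetical helper, and exact fractions are used so that the half-a-value-per-clock x87 FMUL rate shows up cleanly.

```python
from fractions import Fraction

def per_value(latency, issue_interval, values_per_instr):
    """Return (values completed per clock, latency averaged per value)."""
    throughput = Fraction(values_per_instr, issue_interval)
    avg_latency = Fraction(latency, values_per_instr)
    return throughput, avg_latency

# P4 figures from the post: (latency, issue interval, values per instruction).
x87_add = per_value(5, 1, 1)    # 1 value per clock, 5-clock latency
x87_mul = per_value(7, 2, 1)    # 1/2 value per clock, 7-clock latency
sse2_add = per_value(4, 2, 2)   # 1 value per clock, 2 clocks per value
sse2_mul = per_value(6, 2, 2)   # 1 value per clock, 3 clocks per value

# x87: an FADD and an FMUL cannot start on the same clock, so at most one
# floating point instruction (one value) starts per clock.
x87_peak = Fraction(1)
# SSE2: per the post, one add and one multiply (counted per value) every clock.
sse2_peak = sse2_add[0] + sse2_mul[0]

print("x87 peak values/clock: ", x87_peak)
print("SSE2 peak values/clock:", sse2_peak)
```

On the same chip, the SSE2 peak is twice the x87 peak, which is the gain available only if the code can be recoded to keep both the add and multiply units busy.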

3)  The memory layout in the P4 has changed greatly.  This will require
careful recoding in prime95.

a)  The L1 data cache has been reduced to 8K, the same size as in the
original Pentium.  The L1 cache is faster than on previous chips (2 clocks
vs. 3 clocks).

b)  The L1 cache has 64-byte cache lines.  This means fewer cache-line
loads from the L2 cache.

c)  The L2 cache is very fast:  7 clocks.  I don't know the number for
previous chips, but I'm sure it was higher.

d)  The L2 cache has 128-byte cache lines, meaning fewer cache-line
loads from main memory.

e)  There are instructions to prefetch cache lines from main memory
into the L2 cache.  The P-III had similar instructions, but prime95
never used them.

f)  Initial P4 machines will use RDRAM.  Without getting into the Rambus
holy war, P4 machines will generally have higher memory bandwidth, but
the access time to the first byte of a memory request (the latency) is
longer.
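A little arithmetic makes items a), b) and d) concrete. The Python sketch counts the cache-line fills needed to stream a contiguous, line-aligned buffer. The 1 MB buffer size is hypothetical, and the 32-byte P-III line size is an assumption added for comparison; it is not stated in the post.

```python
import math

def line_loads(nbytes, line_size):
    """Cache-line fills needed to stream nbytes of contiguous, aligned data."""
    return math.ceil(nbytes / line_size)

L1_SIZE = 8 * 1024   # P4 L1 data cache size, from the post
DOUBLE = 8           # bytes per double-precision value

print(L1_SIZE // DOUBLE)   # doubles that fit in the P4 L1: 1024
print(L1_SIZE // 64)       # 64-byte lines in the P4 L1: 128

buf = 1 << 20              # hypothetical 1 MB working set
for level, line in [("L1 fills, 32-byte lines (assumed P-III)", 32),
                    ("L1 fills, 64-byte lines (P4)", 64),
                    ("L2 fills, 128-byte lines (P4)", 128)]:
    print(f"{level}: {line_loads(buf, line)}")
```

Larger lines do not make memory faster by themselves; they help mainly when data is read sequentially, since each fill then brings in more bytes that will actually be used.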

Summary:  With proper recoding, prime95 may make significant gains.
Proper use of the prefetch instructions can hide most of the time the
current version spends waiting on main memory.  Furthermore, there are
several places in prime95 where I chose to trade off extra floating point
operations in order to reduce memory accesses.  On the P4, using more
memory (and fewer floating point operations) will probably make more
sense.
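The point about hiding memory latency can be illustrated with a toy Python model. All clock counts in it are hypothetical, chosen only to show the shape of the effect, not measured P4 or RDRAM figures. The model also pretends that, without prefetching, every line is a full stall, which overstates the penalty on a real out-of-order chip. `prefetch_distance` is the usual rule of thumb: request data about latency / work-per-line lines ahead.

```python
import math

def prefetch_distance(latency, work_per_line):
    """Lines to prefetch ahead so each line arrives before the loop needs it."""
    return math.ceil(latency / work_per_line)

def loop_clocks(n_lines, latency, work, transfer, prefetch):
    """Toy estimate of clocks to process n_lines cache lines from main memory.
    latency:  clocks until the first byte of a request arrives
    work:     clocks of floating point work per line
    transfer: clocks of bus time per line (line size / bandwidth)
    """
    if not prefetch:
        # Each line is requested only when needed: wait for it, then compute.
        return n_lines * (latency + transfer + work)
    # With enough lines in flight, the latency is paid once; after that the
    # loop runs at the slower of compute speed and memory bandwidth.
    return latency + transfer + n_lines * max(work, transfer)

# Hypothetical numbers, for illustration only.
LATENCY, WORK, TRANSFER, N = 300, 40, 20, 1000
print("prefetch distance:", prefetch_distance(LATENCY, WORK), "lines")
print("no prefetch:  ", loop_clocks(N, LATENCY, WORK, TRANSFER, prefetch=False))
print("with prefetch:", loop_clocks(N, LATENCY, WORK, TRANSFER, prefetch=True))
```

With these made-up numbers, prefetching 8 lines ahead cuts the loop from 360,000 to 40,320 clocks. Once latency is hidden, the loop is bound by the larger of compute and transfer time per line, which is where spending more memory to save floating point work can start to pay off.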


Finally, I've ordered a P4, which should arrive in mid-December.  I'll
post benchmarks then.  I expect a new P4-optimized version of prime95
will take several months to develop.

Regards,
George

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.exu.ilstu.edu/mersenne/faq-mers.txt

------------------------------

End of Mersenne Digest V1 #794
******************************