Mersenne Digest        Sunday, November 26 2000        Volume 01 : Number 794

----------------------------------------------------------------------

Date: Fri, 17 Nov 2000 14:21:23 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Shortage on double-check exponents

Hi all,

5 PM 11/17/00 +0100, Canart, Jean-Yves wrote:
>Since a few days, there are no more available exponents for double-checking.

Sorry 'bout that. I've made 9600 new exponents available. Those doing
first-time tests on exponents between 6,100,000 and 6,500,000 are *still*
doing first-time tests even though the primenet server will now report them
as double-checking assignments.

Regards,
George
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.exu.ilstu.edu/mersenne/faq-mers.txt

------------------------------

Date: Fri, 17 Nov 2000 16:34:16 -0500
From: Matt Gauthier <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Mersenne Digest V1 #793

>distributed.net is a simpel brute force, for the mathematics behind
>mersenne.org see: http://www.mersenne.org/math.htm
>
>ok,ok, there is no use for such primes, but mathematics is - like life
>itself - l'art pour l'art anyway.

Apart from RC5, and the other inane crypto contests, distributed.net is
also searching for the optimal 24 and 25 mark Golomb rulers. So, there is a
purely mathematical project there too.

--
Matt Gauthier <[EMAIL PROTECTED]>

------------------------------

Date: Sun, 26 Nov 2000 16:32:45 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Mersenne: P4

Hi all,

The mailing list has been quiet. I hope everyone enjoyed a happy
Thanksgiving (or at least a good weekend for non-U.S. readers).

I've received 2 queries about the recently released Pentium 4 and prime95.
I have no timings at this point, but I figured some folks would like to
know how the architecture helps or hurts our cause. I've downloaded the
manuals and have the following observations:

1) The FPU is slower than the P-III.
   An FADD is 5 clocks (vs. 3 on P-III and 4 on Athlon).
   An FMUL is 7 clocks (vs. 5 on P-III and 4 on Athlon).
   An FADD can be issued every clock cycle.
   An FMUL can be issued every other clock cycle (same as P-III; Athlon
   can issue every clock cycle).
   An FADD and FMUL cannot be started on the same clock cycle (same as
   the P-III; no such restriction on Athlon).

   Summary: On paper, Athlon still has the best FPU, P-III next, P4 last
   (but remember the P4 has the faster clock).

2) The P4 introduces SSE2 instructions. Intel hopes new programs stop
   using the old FPU instructions and start using these new instructions.
   The SSE2 instructions work on 2 floating point values at the same time!
   An ADD takes 4 clocks, but can only issue every other clock cycle.
   A MUL takes 6 clocks and also can be issued every other clock cycle.
   The theoretical maximum throughput for SSE2 is one ADD *AND* one MUL
   every clock cycle. Since each instruction handles 2 values, the average
   latency per value is 2 clocks for an ADD and 3 for a MUL.

   Summary: If a program can be effectively recoded to use SSE2, then it
   can have greater throughput than even the Athlon. Of course, months ago
   I had hoped that the P4 would be able to get a throughput of 2 ADDs and
   2 MULs per clock cycle. Maybe in a few years, a future P4 or AMD chip
   will do this.

3) The memory layout in the P4 has changed greatly. This will require
   careful recoding in prime95.
   a) The L1 data cache has been reduced to 8K - the same size as the
      original Pentium. The L1 cache is faster than previous chips
      (2 clocks vs. 3 clocks).
   b) The L1 cache has 64-byte cache lines. This means fewer cache line
      loads from the L2 cache.
   c) The L2 cache is very fast: 7 clocks. I don't know the number for
      previous chips, but I'm sure it was higher.
   d) The L2 cache has 128-byte cache lines, meaning fewer cache line
      loads from main memory.
   e) There are instructions to prefetch cache lines from main memory
      into the L2 cache. The P-III had similar instructions but prime95
      never used them.
   f) Initial P4 machines will use RDRAM. Without getting into the Rambus
      holy war, P4 machines will generally have higher memory bandwidth,
      but access time to the first byte of a memory request is slower.

   Summary: With proper recoding, prime95 may make significant gains.
   Proper use of the prefetch instructions can hide most of the waiting on
   main memory that you see with the current version. Furthermore, there
   are several places in prime95 where I chose to trade off extra floating
   point operations in order to reduce memory accesses. Using more memory
   on the P4 (and fewer floating point operations) will probably make more
   sense.

Finally, I've ordered a P4 which should arrive in mid-December. I'll post
benchmarks then. I expect a new P4-optimized version of prime95 will take
several months to develop.

Regards,
George

------------------------------

End of Mersenne Digest V1 #794
******************************
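
Point 3e of the P4 message says that prefetch instructions can hide much of
the wait on main memory. The following is only a minimal sketch of that
idea in C, not code from prime95 (which is hand-written assembly). It
assumes a GCC- or Clang-compatible compiler, since __builtin_prefetch is a
compiler builtin rather than standard C. The function name
sum_with_prefetch and the PREFETCH_DISTANCE value are hypothetical; the
distance of 16 doubles was chosen only because it equals the 128-byte L2
line size quoted above, and a real program would tune it by measurement.

```c
#include <stddef.h>

/* Hypothetical tuning value: 16 doubles = 128 bytes, i.e. one L2 cache
   line on the P4 as described in the message. Not a measured optimum. */
#define PREFETCH_DISTANCE 16

/* Hypothetical helper (not from prime95): sums an array while asking the
   CPU to start loading data that will be needed a little later, so the
   memory access overlaps with the floating point work. */
double sum_with_prefetch(const double *data, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* GCC/Clang builtin. Arguments: address, rw (0 = read),
               temporal locality hint (0 = none .. 3 = high). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 1);
        }
        total += data[i];
    }
    return total;
}
```

A prefetch is only a hint: it never changes the computed result, so the
function returns the same sum with or without it. Whether it helps depends
on the access pattern and on how well the hardware already predicts it.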
