Mersenne Digest       Wednesday, January 3 2001       Volume 01 : Number 806




----------------------------------------------------------------------

Date: Sun, 31 Dec 2000 18:18:37 -0600
From: Nathan Russell <[EMAIL PROTECTED]>
Subject: Mersenne: Shutdown script for mprime

Hi,

I was wondering if anyone knows of a script to shut down mprime and make
sure the save files are finished saving before SIGKILL is sent in the
shutdown process.  

In terms of starting it up, I now have a line in my inittab file to
start it whenever the system enters runlevel 5, and run it with runtime
options to use the Prime95 directory of the Windows drive as its working
directory.  It is set up to run setuid a user, and I've checked and it
is indeed running as that user.  

Thanks!  

Nathan
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Sun, 31 Dec 2000 18:20:42 -0500
From: [EMAIL PROTECTED]
Subject: Mersenne: Automatic reply

I have moved.  My new address is:

[EMAIL PROTECTED]

Please use this address in the future.

Thanks


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 1 Jan 2001 00:46:48 +0100 (MET)
From: [EMAIL PROTECTED]
Subject: Re:  Mersenne: 2000 - first calender year without a Mersenne

    Nathan Russell observes:
> Well, unless there is an announcement within the next few hours, 2000
> will be the first calender year in the history of GIMPS without a
> Mersenne prime.  
> 
> Is the number of searchers, and the power of their hardware, increasing
> enough to keep up with the increased runtimes and (slightly) decreased
> chance of a given exponent being prime?  Or can we expect only one prime
> every several years, as was the case before the start of the GIMPS
> effort?  
 
      The chance that a given Mp is prime is O(1/log(Mp)) = O(1/p).
The LL test for Mp needs O(p) iterations.  
The time per iteration (using an FFT) is about O(p*log(p)).  
So the estimated time per LL test is O(p^2 * log(p)) 
whereas the chance of success on one LL test is O(1/p).

      We need a few adjustments to the formula since (for example)
the time to trial divide by primes < N is approximately 
O(log(p) /p) for fixed N.  [We test O(N/p) divisors below N, 
with a cost of O(log(p)) per divisor.]  This trial division
time decreases with p, whereas LL time increases, which
affects our strategy.

      Nonetheless, to test all O(2^b / b) primes with b bits,
we estimate O(2^b / b) * O(2^(2b) * b) = O(8^b) operations
with O(1) expected successes.  Doing all 24-bit p's 
takes about eight times as long as doing all 23-bit p's.
Even with more contributors and faster hardware, 
we soon get diminishing returns.

       Peter Montgomery


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Sun, 31 Dec 2000 23:10:35 -0500
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: 2000 - first calender year without a Mersenne

- ----- Original Message -----
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
To: "Nathan Russell" <[EMAIL PROTECTED]>
Sent: Sunday, December 31, 2000 7:53 PM
Subject: Re: Mersenne: 2000 - first calender year without a Mersenne


> Even with hardware twice as fast, any current two year exponent would
still
> take a year at that time.
> I don't think the rate of double the speed every two years can hold out
for
> ever.
> Even if the next hardware can get through these exponents faster, the
longer
> exponents would come down the pipe sooner, requiring the next hardware
jump
> that much sooner.
>
> Regards Frank
> ----- Original Message -----
> From: "Nathan Russell" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Sunday, December 31, 2000 7:01 PM
> Subject: Mersenne: 2000 - first calender year without a Mersenne
>
>
> > Well, unless there is an announcement within the next few hours, 2000
> > will be the first calender year in the history of GIMPS without a
> > Mersenne prime.
> >
> > Is the number of searchers, and the power of their hardware, increasing
> > enough to keep up with the increased runtimes and (slightly) decreased
> > chance of a given exponent being prime?  Or can we expect only one prime
> > every several years, as was the case before the start of the GIMPS
> > effort?
> >
> > This might be a good discussion, since the list is fairly quiet now.
> >
> > Nathan
> >
_________________________________________________________________________
> > Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
> > Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
>


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Sun, 31 Dec 2000 23:10:23 -0500
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: 2000 - first calender year without a Mersenne

- ----- Original Message -----
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, December 31, 2000 7:59 PM
Subject: Re: Mersenne: 2000 - first calender year without a Mersenne


> Sorry, I replied to Mr Russell before reading Mr Montgomery's reply.
> Mr M said it much better than I.
>
> Reagards Frank
> ----- Original Message -----
> From: <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Sunday, December 31, 2000 6:46 PM
> Subject: Re: Mersenne: 2000 - first calender year without a Mersenne
>
>
> >
> >     Nathan Russell observes:
> > > Well, unless there is an announcement within the next few hours, 2000
> > > will be the first calender year in the history of GIMPS without a
> > > Mersenne prime.
> > >
> > > Is the number of searchers, and the power of their hardware,
increasing
> > > enough to keep up with the increased runtimes and (slightly) decreased
> > > chance of a given exponent being prime?  Or can we expect only one
prime
> > > every several years, as was the case before the start of the GIMPS
> > > effort?
> >
> >       The chance that a given Mp is prime is O(1/log(Mp)) = O(1/p).
> > The LL test for Mp needs O(p) iterations.
> > The time per iteration (using an FFT) is about O(p*log(p)).
> > So the estimated time per LL test is O(p^2 * log(p))
> > whereas the chance of success on one LL test is O(1/p).
> >
> >       We need a few adjustments to the formula since (for example)
> > the time to trial divide by primes < N is approximately
> > O(log(p) /p) for fixed N.  [We test O(N/p) divisors below N,
> > with a cost of O(log(p)) per divisor.]  This trial division
> > time decreases with p, whereas LL time increases, which
> > affects our strategy.
> >
> >       Nonetheless, to test all O(2^b / b) primes with b bits,
> > we estimate O(2^b / b) * O(2^(2b) * b) = O(8^b) operations
> > with O(1) expected successes.  Doing all 24-bit p's
> > takes about eight times as long as doing all 23-bit p's.
> > Even with more contributors and faster hardware,
> > we soon get diminishing returns.
> >
> >        Peter Montgomery
> >
> >
> >
_________________________________________________________________________
> > Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
> > Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
>


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 01 Jan 2001 12:14:14 +0100
From: Martijn <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Shutdown script for mprime

Nathan Russell wrote:

> Hi,
>
> I was wondering if anyone knows of a script to shut down mprime and make
> sure the save files are finished saving before SIGKILL is sent in the
> shutdown process.
>

Hi I use an sysv type init and start and stop the mersenne
from the rc.d directory.
The script is as follows:

(/etc/rc.d/init.d/mersenne, added to the respective rc.n directories with
chkconfig)

#!/bin/sh
#
# mersenne
#
# chkconfig: 2345 10 50
# description: mersenne
#

# Source function library.
. /etc/init.d/functions


# See how we were called.
case "$1" in
  start)
 action "Starting mersenne: " su - -c /home/mersenne/startmprime mersenne
touch /var/lock/subsys/mersenne
 ;;
  stop)
 action "Stopping Mersenne" killall mprime
 rm -f /var/lock/subsys/mersenne
 ;;
  restart|reload)
 $0 stop;
 $0 start;
 ;;
  *)
 # do not advertise unreasonable commands that there is no reason
 # to use with this device
 echo "Usage: mersenne {start|stop|restart|reload}"
 exit 1
esac

exit 0

the /home/mersenne/startmprime script is as follows:
#!/bin/bash
cd /home/mersenne
./mprime -B -A0
./mprime -B -A1


By killing it relatively early in the shutdown sequence I know for sure that

mprime has enough time to shut down before the generic kills are sent.
(kill and killall sends SIGTERM by default)

Kind Regards, Martijn

>
> In terms of starting it up, I now have a line in my inittab file to
> start it whenever the system enters runlevel 5, and run it with runtime
> options to use the Prime95 directory of the Windows drive as its working
> directory.  It is set up to run setuid a user, and I've checked and it
> is indeed running as that user.
>
> Thanks!
>
> Nathan
> _________________________________________________________________________
> Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
> Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 1 Jan 2001 16:52:34 -0500 (EST)
From: Jason Stratos Papadopoulos <[EMAIL PROTECTED]>
Subject: Mersenne: Integer Convolution Library v1.1 released

Hello again. Version 1.1 of ICL is a service release; there are minor
speedups, a single major bug fix for Mersenne code and improved test
programs. The build process is now much more generic so that you don't
need the absolute latest GNU tools to compile it. You will still need an
Alpha, however; preferably an ev6.

The code now includes a hacked x86 assembler, whose preprocessor is used
to build the library. Thanks are due to Simon Tatham for NASM, even though
I've cut it to pieces for use here.

www.glue.umd.edu/~jasonp/icl11.tar.gz

Let me know what you think,
jasonp


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 01 Jan 2001 21:01:28 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Mersenne: What I've learned about the P4 thusfar (long)

Happy new year to all,

Current timings on a 512K FFT (exponents up to 10.32 million):

1.4GHz P4:      0.126 sec
1.1GHz T-bird:  0.103 sec
1.0GHz P3:      0.145 sec

As pointed out in an earlier email, using the new SSE2 instructions and
coding for the changed cache line size should allow much better timings
on the P4.

I coded the most common FFT building block, four_complex_fft, using the
SSE2 instructions.  This macro takes 8 floating point values (4 complex
numbers) and does 2 "levels" of the FFT.  The SSE2 implementation will
operate on two different sets of 8 floating point values and also do
2 levels of the FFT (i.e. it does exactly twice as much work).

Next I timed this macro using consecutive memory locations.  The P3 version
of this macro takes 52 clocks.  The P4 new implementation takes 48 clocks!
More than twice the throughput.

Now the bad news, when I use the prime95 memory layout where the 8 input
values come from 8 different cache lines and the modified values are
output to the same cache lines (an in-place FFT), the P4 code now takes 112
clocks.

The cause is the new 64-byte cache line.  The L1 cache is write-back.
Any changes to the L1 cache are written back to the L2 cache.  On the P3
the L1 and L2 cache line size is 32 bytes.  A write to the L2 cache takes
7 clocks (I think).  Thus, the P3 macro will run in 8*7=56 clocks.  The
P4 has a 64 byte L1 cache.  The update of the L2 cache is done with two
writes of 32 bytes each taking 7 clocks.  Thus, the P4 macro takes 8*14=112
clocks.

After much thought (coming up with a scheme that does not cause thrashing
of the 64 TLBs is not easy), I think I've worked out a new memory layout
for prime95.  This is a mostly-in-place FFT where the outputs of the macros
are written to consecutive memory locations.  I've now begun the process
of recoding all of the prime95 building blocks, then I'll code up the
new memory layout for the P4.  Obviously this will take a good deal of time.

I thought you might be interested in a rough estimate of how fast a 512K FFT
might run in the finished version.  The same macro above when operating
out of the L2 cache is a little slower - call it 56 clocks.  The macro
processes 16 doubles, or 56/16=3.5 clocks/double.  The 512K FFT has 19
forward FFT levels and 19 inverse FFT levels.  The first 3 levels are special
and are as expensive as 2 levels later on.  Remember, the macro does 2 levels
meaning we will run it 9 times in the forward FFT and 9 times in the inverse
FFT.  Total cost is 512K doubles * 3.5 clocks/double * 18 / 1.4G clocks/sec.
This is .024 sec.  I've estimated the carry propagation step at .004 sec.
The FFT will also access main memory 3 times, much of this cost can be
hidden with judicious use of the PREFETCH instruction.  This estimate also
does not include the cost of accessing the sin/cos data.  All in all, I
think a time of 0.040 seconds should be achievable.

Having fun,
George

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 1 Jan 2001 22:02:42 -0500 (EST)
From: Jason Stratos Papadopoulos <[EMAIL PROTECTED]>
Subject: Re: Mersenne: What I've learned about the P4 thusfar (long)

On Mon, 1 Jan 2001, George Woltman wrote:

> Now the bad news, when I use the prime95 memory layout where the 8 input
> values come from 8 different cache lines and the modified values are
> output to the same cache lines (an in-place FFT), the P4 code now takes 112
> clocks.
> 
> The cause is the new 64-byte cache line.  The L1 cache is write-back.
> Any changes to the L1 cache are written back to the L2 cache.  On the P3

Two things here; first, doesn't the P4 have 128 byte cache lines? Also,
do you mean write *back* or write *through*? Write back to computer
architects usually means cache lines don't automatically update the cache
level below them.

> P4 has a 64 byte L1 cache.  The update of the L2 cache is done with two
> writes of 32 bytes each taking 7 clocks.  Thus, the P4 macro takes 8*14=112
> clocks.

Are you sure of this timing? The memory bus to L2 is supposed to be 256
bits wide, and is supposed to deliver data once per clock (or maybe every
other clock, the Coppermine P3 does that). This would mean a cache line 
write back would take at least four processor clocks. Or is that
super-speed delivery only supposed to be for reads, with writes being
buffered?

> After much thought (coming up with a scheme that does not cause thrashing
> of the 64 TLBs is not easy), I think I've worked out a new memory layout
> for prime95.  This is a mostly-in-place FFT where the outputs of the macros
> are written to consecutive memory locations.  I've now begun the process
> of recoding all of the prime95 building blocks, then I'll code up the
> new memory layout for the P4.  Obviously this will take a good deal of time.

Since the new L1 cache is tiny and the L2 is very large and almost as
fast, would your life become any easier if you coded for L2 cache latency
instead of L1? Most of the work gets done in a single pass through main
memory, then the "rows" of the transform get processed in a second pass
that can barely tolerate L2 latency. Or is the latency of SSE2
instructions so high that you run out of registers and rely on renaming
already?

jasonp

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 01 Jan 2001 22:44:22 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: What I've learned about the P4 thusfar (long)

Hi Jason,

At 10:02 PM 1/1/2001 -0500, Jason Stratos Papadopoulos wrote:
> > The cause is the new 64-byte cache line.  The L1 cache is write-back.
> > Any changes to the L1 cache are written back to the L2 cache.  On the P3
>
>Two things here; first, doesn't the P4 have 128 byte cache lines? Also,
>do you mean write *back* or write *through*?

The L2 cache line is 128 bytes, but the L1 cache line is 64 bytes.

I get back/through confused all the time.  When you write to the L1 cache it is
also written to the L2 cache, but not main memory (until the line is
thrown out of the L2 cache).

> > P4 has a 64 byte L1 cache.  The update of the L2 cache is done with two
> > writes of 32 bytes each taking 7 clocks.  Thus, the P4 macro takes 8*14=112
> > clocks.
>
>Are you sure of this timing?

Yes.

>  The memory bus to L2 is supposed to be 256
>bits wide,

256 bits is 32 bytes which is why it takes 2 writes to copy an
L1 cache line to the L2 cache.

>  and is supposed to deliver data once per clock (or maybe every
>other clock, the Coppermine P3 does that). This would mean a cache line
>write back would take at least four processor clocks.

The 7 clock figure came from Tom's Hardware guide which I'm sure is
in the Intel manual somewhere.  It so beautifully described the 112 clock
result I was seeing that I've not looked for the Intel reference.

>  Or is that
>super-speed delivery only supposed to be for reads, with writes being
>buffered?

I'm not sure what the latency is for a read from the L2 cache.  Your
theory could be correct.  I'm sure that Intel engineers want to optimize
the read path more than the write path.

>Since the new L1 cache is tiny and the L2 is very large and almost as
>fast, would your life become any easier if you coded for L2 cache latency
>instead of L1? Most of the work gets done in a single pass through main
>memory, then the "rows" of the transform get processed in a second pass
>that can barely tolerate L2 latency.

My plan is to work through main memory in 128KB chunks.  Within
these 128KB chunks I'll make several passes working on 2KB sub-chunks
to maximize L1 cache hits.

>  Or is the latency of SSE2
>instructions so high that you run out of registers and rely on renaming
>already?

The latency is high, but so far I've not had too much trouble with register
pressure.  The four_complex_unfft macro used 3 temporary memory
locations due to register pressure.  Fortunately, store-forwarding seems
to be keeping the cost at a minimum.

Regards,
George

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 2 Jan 2001 10:44:20 -0000 
From: "Brindle, Gordon" <[EMAIL PROTECTED]>
Subject: Mersenne: Understanding the Propagate Carry Step in the LL Test

Hi Everyone and a Happy New Year to you all,

I wondered if anyone can help me understand (perhaps by way of an example)
how the Propagate Carry step works in the Crandall/Fagin LL test.  I have
browsed the archives of this list but can't find anything that fully answers
my question. I would like to implement a LL test in Mathematica along the
lines of the partial example given by Crandall in Chapter 3 of "Topics in
Advanced Scientific Computation" p94 but I am having difficulty
understanding this step (I've read the Crandall/Fagin 1994 paper, too). I
believe this step is what Crandall refers to as "add-with-carry, immediately
after the weighted convolution" on p96 of his book. My interest is in trying
to gain a mathematical understanding of all the steps for a practical
implementation of the LL test using DWT (so that I can extend Crandall's
example into a fully-fledged LL test, albeit to work with small exponents).

If you can help me to understand this step I should be most grateful,

Regards,

Gordon.

Gordon Brindle
E [EMAIL PROTECTED]

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 03 Jan 2001 17:58:14 EST
From: [EMAIL PROTECTED]
Subject: Mersenne: Microsoft patents zeroes and ones

Check out

http://www.theonion.com/onion3311/microsoftpatents.html

At $0.10 per binary digit, the royalty fee GIMPS would
owe Microsoft for the last Mersenne prime is nearly
$700000. Should we divide this equally amongst GIMPS
participants, or bill people proportionally to their
CPU-year contributions? I hope the fellow whose run
actually discovered the prime hasn't spent all his
EFF prize money yet...

On the plus side, such royalty fees are legitimately tax
deductible as business expenses. As usual, we can count
on Bill Gates to look out for the little guy. Thanks,
Bill!

- -Ernst


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

End of Mersenne Digest V1 #806
******************************

Reply via email to