Mersenne Digest      Tuesday, September 28 1999      Volume 01 : Number 634




----------------------------------------------------------------------

Date: Sun, 26 Sep 1999 20:13:37 +0200
From: Harald Tveit Alvestrand <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Front-end design

What would make sense to me is to have a single (X-based?) frontend that is 
able to monitor/control the action of multiple Prime95-style "services".

Today, I look at my personal Primenet report to see if any of my machines 
are "misbehaving" - but finding a turned-off computer using this takes days.

A screen with a few nice colors per machine would be *so* much nicer .-)

How I would implement it:
Have two processes:
- 1 like the "proxy", except different: all the mprime programs send it a
packet every time they start up (and occasionally when running). Location
configurable (can't see how to make it auto-conf, except perhaps if the
primenet server would store its location in my account info and tell the
clients when they call in for an exponent or update). Always runs.
- 1 display program, that either interrogates the "proxy" or the running
testers directly through an RPC mechanism.
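
A minimal sketch of the bookkeeping the always-running "proxy" would need, in Python. The class name, the one-hour threshold and the machine names are invented for illustration; the packet format and transport are deliberately left out.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HeartbeatTable:
    """Remembers when each machine last reported in (hypothetical helper)."""
    max_silence: float = 3600.0  # seconds of silence before a machine is flagged
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, machine: str, now: Optional[float] = None) -> None:
        # Called whenever a tester sends its start-up or periodic packet.
        self.last_seen[machine] = time.time() if now is None else now

    def status(self, now: Optional[float] = None) -> Dict[str, str]:
        # "ok" if the machine reported recently enough, otherwise "silent".
        now = time.time() if now is None else now
        return {
            machine: "ok" if now - seen <= self.max_silence else "silent"
            for machine, seen in self.last_seen.items()
        }
```

A display program could poll status() and paint one coloured cell per machine, which would show a turned-off computer within the chosen threshold rather than after days.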

Much fun. Lots of code to break.

                 Harald A



-- 
Harald Tveit Alvestrand, Maxware, Norway
[EMAIL PROTECTED]

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Sun, 26 Sep 1999 14:42:26 -0500
From: Ken Kriesel <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Factors Everywhere

At 11:46 AM 1999/09/26 +0100, you wrote:
>6) PrimeNet trial factoring has moved well into the 10 millions, 
>however George's factor.zip file contains data only up to 10 million. 
>I know there is a problem with uploading the large files concerned; 
>hopefully, the suggestions above will help to reduce the size of the 
>file, sufficiently to make it feasible to expand the exponent range 
>to cover (at least) the active Primenet assignments for long enough 
>for cheap, high-speed Internet connectivity to become widespread.
>
>7) I have several hundred megabytes of disk space available on my ftp 
>server, which has reasonable Internet access - at least 8 Mbps 
>connectivity to the Internet core - and would be happy to provide 
>means for anyone interested to upload factoring data (or anything 
>else, strictly relevant to Mersenne or closely-related numbers) for 
>the purpose of making it publicly available.

I encourage you to implement this central repository.

Something that would have accelerated parts of the QA process is the 
ability to query such an online resource for information such as the 
following:

What is the smallest exponent larger than n that has no factor found
when factored to 2^m (possibly but not necessarily gimps default depth)?
What is the largest exponent smaller than n that has no factor found
when factored to 2^m?
What are the exponents with double checked residues between a and b?
What are the exponents with double checked residues between a and b
whose residues are from separate chip architectures and software
(Eg, Intel & prime95 for one, Alpha & MacLucasUnix for the other)?
What are the exponents with singly checked residues between a and b?

(I suppose one could consider doing very short factoring tests to fill
gaps when queries ask for an exponent's factor status; something like
up to 2^40, and add it to the database at that time; not a requirement.)

Something that might aid development of new prime search software 
would be:

What are all the known factors larger than 2^m?
What are all the known factors between 2^m and 2^n for exponents between
a and b?

It is not the raw data, so much as the query response, that needs
human-readability.
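
As a rough illustration of the first two queries, here is a Python sketch over an in-memory table. The table layout and function names are assumptions, and the factoring depths are invented; only the three listed factors (23, 47, 233) are real.

```python
from typing import Dict, List, Optional, Tuple

# exponent -> (trial-factoring depth in bits, smallest known factor or None).
# Depths are made-up illustration values, not real GIMPS data.
Records = Dict[int, Tuple[int, Optional[int]]]

RECORDS: Records = {
    11: (10, 23),
    13: (20, None),
    17: (12, None),
    19: (20, None),
    23: (10, 47),
    29: (10, 233),
    31: (20, None),
}


def _unfactored(records: Records, m: int) -> List[int]:
    # Exponents with no known factor that were factored to at least 2^m.
    return [p for p, (bits, factor) in records.items()
            if factor is None and bits >= m]


def next_unfactored(records: Records, n: int, m: int) -> Optional[int]:
    """Smallest exponent larger than n with no factor found to 2^m."""
    return min((p for p in _unfactored(records, m) if p > n), default=None)


def prev_unfactored(records: Records, n: int, m: int) -> Optional[int]:
    """Largest exponent smaller than n with no factor found to 2^m."""
    return max((p for p in _unfactored(records, m) if p < n), default=None)
```

With this toy table, next_unfactored(RECORDS, 13, 15) skips 17 (only factored to 2^12) and returns 19.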


Ken


------------------------------

Date: Sun, 26 Sep 1999 14:30:38 -0700
From: Will Edgington <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Factors Everywhere

Will Edgington writes:

[Yes, I'm following up to my own message.:)]

   n
   p,pk1
   ,pk2
   ,pk3

   Note that M(n) has no known factors.

Trying this out just now, the 111 MB of data that I have for prime
exponent Mersennes in the mersfmt reduces to a bit under 20 MB if this
format is used for only the prime exponents with known factors.  Gzip
(using max compression) gets that down to 7.1 MB.

Producing & gzip'ing it from the mersfmt of the data takes only a
couple of minutes on my machine, so adding it to the automatic update
won't slow things down enough to notice.

A quick line count of the ungzip'd file (about 2 seconds, from cache:)
produces a count of 1,677,377 known factors of prime exponent Mersenne
numbers (that aren't completely factored).
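
The format quoted above is cut short in this digest, so the Python sketch below rests on one plausible reading that the message does not confirm: each entry stores a multiplier k, and the factor is recovered as q = 2kp + 1, the form every factor of M(p) takes for an odd prime p. The function names are hypothetical.

```python
def factor_from_k(p: int, k: int) -> int:
    """Rebuild a candidate factor q = 2*k*p + 1 from a stored multiplier k (assumed encoding)."""
    return 2 * k * p + 1


def divides_mersenne(p: int, q: int) -> bool:
    """True if q divides M(p) = 2^p - 1, checked without forming 2^p - 1."""
    return pow(2, p, q) == 1
```

For example, M(11) = 2047 = 23 * 89, and the two factors correspond to k = 1 and k = 4. A check like divides_mersenne is cheap enough to run over every entry when a file is regenerated.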

                                                        Will

------------------------------

Date: Sun, 26 Sep 1999 17:52:19 EDT
From: [EMAIL PROTECTED]
Subject: Mersenne: Re: Mlucas 2.7 for x86?

[EMAIL PROTECTED] (Guillermo Ballester Valor) writes:

>The first timing is no good as I expected. These are the result:
>
>F90 compiler : Microsoft Fortran Power station 4.0 
>   Optimizations: all optimizations enabled.
>
>                           mprime      lfftw      Mlucas
>                           (sec/iter)
>Exponent test M(3975659)    0.359      0.901       1.604

I'm not too surprised at this. Since my code appears to be faster than
FFTW on most high-end CPUs, that tells me that FFTW is probably optimized
more for the x86 (very few FP registers) than mine, which is geared toward
hardware with at least 32 FPR's. Normally this means that one uses smaller
complex FFT radices (say, 4 or 8) on machines with 8-16 FPR's, but Jason
Papadopoulos tells me that FFTW uses radices as high as 32. Perhaps they
use conditional compilation to decide what radices to use depending on the
underlying hardware, and use smaller radices on x86. (Any insights, Jason?)

If they do use radices >= 8 on x86, they are probably arranging the code
to minimize register pressure - this could be worth looking at.

>THE CLOCK TIMINGS WRITTEN BY MLUCAS ARE INCORRECT!. I had to timing with
>my hand-clock. It writes a lot more time than real.

Note that Mlucas uses elapsed time rather than CPU time, so if other stuff
is running, the printed time would be larger than the CPU time. But if your
own elapsed-time measure disagrees, that implies there is a bug with the f90
date_and_time intrinsic in the MS compiler - try to code a super-simple
program (hacked from mine, if you like - look in the source to see how
the character function char_time gets used in conjunction with the above
intrinsic). If you can reproduce the wrong-timing effect, send e-mail and
your sample code to MS compiler support.
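
To illustrate the elapsed-versus-CPU distinction (in Python, not Mlucas's Fortran), the sketch below times the same call both ways. The sleep is only a stand-in for time lost to other processes, and the helper names are invented.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed seconds, CPU seconds)."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) timer
    cpu_start = time.process_time()    # CPU time used by this process only
    result = fn()
    elapsed = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, elapsed, cpu


def sleepy_work():
    time.sleep(0.2)  # counts toward elapsed time but uses almost no CPU
    return sum(i * i for i in range(100_000))
```

Calling timed(sleepy_work) reports an elapsed time of at least the sleep duration, while the CPU time covers only the summation.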

I've had no problems with the time stuff on various Unix systems.

>If the rest of the lfftw code have a similar performance, you perhaps
>will reach a RPI>100%.

Oh, I already do, just not on every platform - but I'm working on it. :)

>I can sent you the executable Mlucas and my tested, no buggy, c-code.
>lfftw.c.

Sure, go ahead and send them, preferably Win-or-PK-zipped. You should
make at least the C code ftp'able as well, that way others can check it
out. If it's close to MacLucasUNIX on some systems, we would at least 
have a generic C code which allows non-power-of-2 runlengths (MLU doesn't).

>Well, sorry to mail you on Sunday.

I don't mind - I only check my e-mail if I have time (and on weekends,
the inclination) to do so.

Best regards,
Ernst


------------------------------

Date: Sun, 26 Sep 1999 18:36:09 -0400 (EDT)
From: Jason Stratos Papadopoulos <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Mlucas 2.7 for x86?

On Sun, 26 Sep 1999 [EMAIL PROTECTED] wrote:

> I'm not too surprised at this. Since my code appears to be faster than
> FFTW on most high-end CPUs, that tells me that FFTW is probably optimized
> more for the x86 (very few FP registers) than mine, which is geared toward
> hardware with at least 32 FPR's. Normally this means that one uses smaller
> complex FFT radices (say, 4 or 8) on machines with 8-16 FPR's, but Jason
> Papadopoulos tells me that FFTW uses radices as high as 32. Perhaps they
> use conditional compilation to decide what radices to use depending on the
> underlying hardware, and use smaller radices on x86. (Any insights, Jason?)
> 
> If they do use radices >= 8 on x86, they are probably arranging the code
> to minimize register pressure - this could be worth looking at.

FFTW uses a recursive FFT, and includes code that performs a single
radix-n pass for lots of n (all powers of two up to 64, and other
small radices like 3,5,7,9,10,...). You tell it to build you an FFT
of a certain size, and it picks the combination of radices that solves
your problem in minimum time (it uses dynamic programming, finding the
fastest combination for small sizes and then using them as building 
blocks for larger sizes). The combination is encoded in a "plan",
and the FFTW executor reads this when you want to do the FFT for real.

On x86, for medium-size transforms FFTW likes radix-8 a lot. It overflows
the pathetic FPU stack quickly but it also cuts the problem size very
quickly, and so turns out to be a net win over smaller radices.
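
The dynamic-programming idea can be sketched in a few lines of Python. This is not FFTW's planner: the per-radix costs below are invented (chosen so that radix 8 looks cheap), whereas a real planner would time actual code, and the function name is hypothetical.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical cost per data point of one radix-r pass (invented numbers).
PASS_COST = {2: 1.0, 4: 0.8, 8: 0.7, 16: 0.9, 32: 1.2, 64: 1.6}


@lru_cache(maxsize=None)
def plan(n: int) -> Tuple[float, Tuple[int, ...]]:
    """Cheapest (cost, radices) for a power-of-two size n, built from smaller plans."""
    if n < 1:
        raise ValueError("size must be a positive power of two")
    if n == 1:
        return 0.0, ()
    best = None
    for radix, unit_cost in PASS_COST.items():
        if n % radix == 0:
            sub_cost, sub_radices = plan(n // radix)  # cached best plan for the smaller size
            # One radix pass over all n points, then `radix` sub-transforms of size n/radix.
            cost = n * unit_cost + radix * sub_cost
            if best is None or cost < best[0]:
                best = (cost, (radix,) + sub_radices)
    if best is None:
        raise ValueError("size must be a positive power of two")
    return best
```

Under this toy cost table, plan(64) settles on two radix-8 passes. The memoised results for small sizes play the role of the building blocks described above.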

jasonp


------------------------------

Date: Sun, 26 Sep 1999 16:50:18 -0700
From: Bob Margulies <[EMAIL PROTECTED]>
Subject: Mersenne: File missing?

I am running Prime95 Version 18.1.1, dated 04-01-99. When I press Esc,
in order to suspend execution, I get a message stating that Prime95 is
unable to find a file called Prime95.hlp. This file is not in the
Prime95.zip which I downloaded. What is wrong?

------------------------------

Date: Mon, 27 Sep 1999 01:13:25 +0100
From: "Ian L McLoughlin" <[EMAIL PROTECTED]>
Subject: Mersenne: Error message in Ver.18

Hi,
Still a bit green on all this ...
I have seen an error message I am not familiar with (i.e. not illegal
sumout)
But:
SUM(INPUTS) != SUM(OUTPUTS),
669721378183728.3 = 1.130101692988369e+300
possible hardware failure... etc. I see no reference to this in the Readme
file (as opposed to illegal sumout above).
Any enlightenment on this would be most appreciated!!??

All The Best,

Ian McLoughlin, Chematek U.K.

Tel/Fax : +44(0)1904 679906
Mobile   : +44(0)7801 823421
Website: www.chematekuk.co.uk


------------------------------

Date: Sun, 26 Sep 1999 21:44:33 EDT
From: [EMAIL PROTECTED]
Subject: Mersenne: free f90/C/ for Alpha

Dear all:

For those of you wanting to try Mlucas 2.7x (or any other Fortran code) on
Alpha under Linux, Compaq is offering free betas of their f90 compiler for
Linux. The same site lists free betas of the Compaq Linux C compiler, the
Compaq Portable Math Library, and says there will soon be a beta of a C++
compiler for Linux:

www.unix.digital.com/linux/software.htm

I don't run Linux on either of my Alphas yet (If anyone has advice on
relatively painfree and hopefully $-free ways of installing Linux on an
Alpha, without blowing away an existing Unix configuration - I'm all ears),
but the Linux executable of Mlucas v2.6 my SPEC98 contact at Compaq provided
me earlier this year (compiled using the aforementioned compiler) was as
fast as the f90-for-Unix-compiled one.

-Ernst

------------------------------

Date: Mon, 27 Sep 1999 04:22:53 -0700
From: Paul Leyland <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Re: FFTW for GIMPS?

> From: Olivier Langlois [mailto:[EMAIL PROTECTED]]
> I've played a little bit FFTW and I've remarked that its performance can
> vary a lot depending on how good your compiler is at optimization.
> 
> For instance, compiled FFTW code is far from optimal with MSVC++ 6. This
> compiler doesn't fully use the FPU stack like an ASM programmer could do and
> I don't know why since I'm sure that writing a compiler that would make a
> good usage of the FPU registers is far from impossible.
> 
> So, the compiler you use to compile FFTW is a major factor for the
> performance you'll get from it.
> 
> I don't know if someone have done similar experiences and if there is a
> better compiler than MSVC for intensive FP code.

Actually, we at Microsoft Research in Cambridge have seen similar effects
when compiling and running FFTW code.  Our discovery is that the alignment
of FP data values is critical.  Get it wrong, and performance can plummet.
Unless you set the alignment explicitly, it will be wrong approximately half
the time.

Jonathan Hardwick investigated this effect as part of his research into
high-performance computing.  He gave an internal seminar (which is where I
learned about it) and wrote it up in detail.  The full details are at
http://www.research.microsoft.com/users/jch/fftw-performance.html


Paul


------------------------------

Date: Mon, 27 Sep 1999 16:02:38 +0100
From: Robin Stevens <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Linux error 2250 (was Front-end design)

On Thu, Sep 23, 1999 at 05:45:39PM -0400, George Woltman wrote:
> >Incidentally, can anyone explain why under v19.0.2 I'm getting "ERROR 2250:
> >Server unavailable" messages? 
> Someone told me that glibc-2.1 (as compared to v18's libc5) uses different
> files or network setup or something.  I am a Linux know-nothing, so perhaps
> a list member can enlighten all of us.

I'm no expert I'm afraid, but I've done a little more investigating.
I grabbed the source and recompiled primenet.c with the _DEBUG option.
Of course it now works fine (except it insisted on sending off a load of
old results going back as far as May, which according to the logs had
definitely been reported before).  So I've got the results updated, but
without finding the cause of the problem...  Oh well, it now works, so I've
put v19 on all machines of mine (pity I had to kill off a process with over
100 days' CPU time to its credit (-: ).

One point - there seem to be inconsistencies in the case of some of the
source file names, which required either manual changing or use of
a FAT partition in order to get mprime to compile :-)
-- 
-------------------- Robin Stevens <[EMAIL PROTECTED]> -------------------- 
Merton College, Oxford OX1 4JD, UK   http://www-astro.physics.ox.ac.uk/~rejs/ 

------------------------------

Date: Mon, 27 Sep 1999 18:01:59 +0200
From: Laurent Desnogues <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Mlucas 2.7x on SPARC

[EMAIL PROTECTED] wrote:
> 
> << I tried all sorts of compiler flags - unfortunately, the optimization
> flags are not linear, especially -O5 tends to produce much slower code
> than -O4 when combined with other flags. >>
> 
> I see similar weird slowdowns using the -O5 compile option on some (not
> all) Alpha CPUs (generally the older ones.) I wonder if both compilers
> are doing similar "optimizations" at -O5.

   The Sun C compiler -O5 flag should only be used when using
a profile to direct subsequent compilations...  The way to use
it is to compile with -xprofile=collect then run then recompile
with -xprofile=use...

   This might be something similar for Alpha.

   However some optimizations done at higher levels of
optimization might produce slower code.  An example is too much
loop unrolling producing code that does not fit well in L1
I-Cache.

> << I'm using -fast -libmil -xlibmopt -fnsyes now, which seems to give
> close to optimal performance. >>
[...]
> << I dont know whether this is also optimal on other types of UltraSparc, I
> only have Ultra60s for testing. >>

   This won't be optimal if you run under Solaris < 7!  Under
such OSes the -fast flag must be followed by a -xarch=v8plus in
order to use all 32 double FP regs of an UltraSPARC chip.  This
is not the case for a Solaris 7 system where -fast will use
-xarch=v9.

   Flags to also test are:

        -xdepend
        -xinline=all
        -xsafe=mem (to be used with -xO5)

Good luck ;)


                Laurent

------------------------------

Date: Mon, 27 Sep 1999 18:13:25 +0200
From: Laurent Desnogues <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Re: Mlucas 2.7x on SPARC

[EMAIL PROTECTED] wrote:
> 
> The SPARC binary Alex Kruppa sent me of Mlucas 2.7x is at my ftp site:
> 
> ftp://209.133.33.182/pub/mayer/README
> ftp://209.133.33.182/pub/mayer/bin/SPARC/Mlucas_2.7x.exe.gz
[...] 
> I don't even know if the above runs on a machine that doesn't have an f90
> compiler installed (i.e. whether the code needs any f90-specific RTL files)-
> I don't think it does, but anyone with an f90-less SPARC can easily find out.

   No, it indeed does not run on such a machine:

% ldd Mlucas2.7x.exe
        libfui.so.1 =>   (file not found)
        libfai.so.1 =>   (file not found)
        libfai2.so.1 =>  (file not found)
        libfsumai.so.1 =>        (file not found)
        libfprodai.so.1 =>       (file not found)
        libfminlai.so.1 =>       (file not found)
        libfmaxlai.so.1 =>       (file not found)
        libfminvai.so.1 =>       (file not found)
        libfmaxvai.so.1 =>       (file not found)
        libfsu.so.1 =>   (file not found)
        libsunmath.so.1 =>       /opt/SUNWspro/lib/libsunmath.so.1
        libm.so.1 =>     /usr/lib/libm.so.1
        libc.so.1 =>     /usr/lib/libc.so.1
        libc.so.1  =>       (version not found)
        libdl.so.1 =>    /usr/lib/libdl.so.1

Please also note the SYSVABI_1.3:  I guess it means the executable
can only be run on a SPARC station running Solaris 7!


                Laurent

------------------------------

Date: Mon, 27 Sep 1999 18:19:57 +0200
From: Laurent Desnogues <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Re: Mlucas 2.7x on SPARC

Laurent Desnogues wrote:
> 
[wrong list...]
>
> Please also note the SYSVABI_1.3:  I guess it means the executable
> can only be run on a SPARC station running Solaris 7!

   I hate cut & paste :(

% ldd Mlucas2.7x.exe
        libfui.so.1 =>   (file not found)
        libfai.so.1 =>   (file not found)
        libfai2.so.1 =>  (file not found)
        libfsumai.so.1 =>        (file not found)
        libfprodai.so.1 =>       (file not found)
        libfminlai.so.1 =>       (file not found)
        libfmaxlai.so.1 =>       (file not found)
        libfminvai.so.1 =>       (file not found)
        libfmaxvai.so.1 =>       (file not found)
        libfsu.so.1 =>   (file not found)
        libsunmath.so.1 =>       /opt/SUNWspro/lib/libsunmath.so.1
        libm.so.1 =>     /usr/lib/libm.so.1
        libc.so.1 =>     /usr/lib/libc.so.1
        libc.so.1 (SYSVABI_1.3) =>       (version not found)
        libdl.so.1 =>    /usr/lib/libdl.so.1
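
For anyone checking several machines, a small Python sketch that pulls the unresolved entries out of ldd output like the listing above. The function name is made up, and it assumes the "name => target" layout shown here.

```python
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the left-hand names of ldd lines whose target says 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.partition("=>")
        if arrow and "not found" in target:
            missing.append(name.strip())
    return missing
```

Run on the listing above, it would report the ten libf*.so.1 libraries plus the libc.so.1 (SYSVABI_1.3) version entry.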


                Laurent

------------------------------

Date: Mon, 27 Sep 1999 00:30:55 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Factors Everywhere

On Sun, Sep 26, 1999 at 10:30:44AM -0700, Will Edgington wrote:
>I, personally, have no way of producing executables except for Intel
>CPUs, and presently only for Linux (I just yesterday got a old P100
>machine up running Win98 (and Prime95, of course:)).

Cross-compilation is a nice thing. Get a cross-version of gcc and
binutils, and you should be set :-)

If you have the source (get it at gcc.gnu.org), you can even build
one yourself. It's not very hard either (I think...).

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/

------------------------------

Date: Mon, 27 Sep 1999 18:56:42 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: File missing?

On Sun, Sep 26, 1999 at 04:50:18PM -0700, Bob Margulies wrote:
>I am running Prime95 Version 18.1.1, dated 04-01-99. When I press Esc,
>in order to suspend execution, I get a message stating that Prime95 is
>unable to find a file called Prime95.hlp. This file is not in the
>Prime95.zip which I downloaded. What is wrong?

You're sure you didn't hit F1 instead of Esc?

There is no help file for Prime95 -- I guess George intended to make
one once, but he never did.

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/

------------------------------

Date: Mon, 27 Sep 1999 18:58:24 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Linux error 2250 (was Front-end design)

On Mon, Sep 27, 1999 at 04:02:38PM +0100, Robin Stevens wrote:
>I'm no expert I'm afraid, but I've done a little more investigating.
>I grabbed the source and recompiled primenet.c with the _DEBUG option.
>Of course it now works fine (except it insisted on sending off a load of
>old results going back as far as May, which according to the logs had
>definitely been reported before).

Note that if you compile mprime/Prime95 yourself, your security code will
be set to all zeroes, and you will not receive credit for your work.

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/

------------------------------

Date: Mon, 27 Sep 1999 19:02:50 +0200
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: FFTW for GIMPS?

On Mon, Sep 27, 1999 at 04:22:53AM -0700, Paul Leyland wrote:
>Actually, we at Microsoft Research in Cambridge have seen similar effects
>when compiling and running FFTW code.  Our discovery is that the alignment
>of FP data values is critical.

This is generally true for _all_ FP code. Unfortunately, the ABI for x86
`gets it wrong', and uses too little alignment.

For gcc/egcs, try -malign-double if you don't need ABI compatibility (i.e.
you compile all libraries containing structures with floats with
-malign-double). pgcc (a gcc/egcs derivative) has a way of aligning the
stack, so that the floats will be aligned by 8 most of the time, without
breaking the ABI. Check http://www.goof.com/pcg/

Note that Prime95 already has correct alignment; would you expect anything
else from George? ;-)

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/

------------------------------

Date: Mon, 27 Sep 1999 21:44:25 +0200
From: Guillermo Ballester Valor <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Re: FFTW for GIMPS?

Hi:

Paul Leyland wrote:
> Actually, we at Microsoft Research in Cambridge have seen similar effects
> when compiling and running FFTW code.  Our discovery is that the alignment
> of FP data values is critical.  Get it wrong, and performance can plummet.
> Unless you set the alignment explicitly, it will be wrong approximately half
> the time.

You're right, I gained 35% in performance just by doing a simple trick to
make sure the data had 8-byte alignment. On the other hand, I built the
FFTW library using the long double float type (an 'awful' 10 bytes long)
and the performance was near 65% of the double float type
performance.


| Guillermo Ballester Valor       |  
| [EMAIL PROTECTED]                      |  
| c/ cordoba, 19                  |
| 18151-Ogijares (Spain)          |
| (Linux registered user 1171811) |

------------------------------

Date: Mon, 27 Sep 1999 15:56:38 -0400 (EDT)
From: "St. Dee" <[EMAIL PROTECTED]>
Subject: Mersenne: mprime V19--correct behavior or glitch?

Hi,

I just updated several Linux machines to mprime V19.  I have all of my
machines set to get 45 days worth of work.  Two of the machines, which
were nearly down to having only 45 days worth of work remaining,
immediately contacted PrimeNet, got an additional exponent each, and
factored that exponent to 64.  After factoring, each machine returned to
performing an LL test on the exponent it was working on prior to the
upgrade.  So far, so good--this behavior is consistent with prior versions
of mprime.  This morning I awoke to find that each machine contacted
PrimeNet overnight and unreserved the newly factored exponent, leaving
each machine with about 40 days of work remaining.  Why is this?
Shouldn't each machine have kept the recently factored exponent in order
to perform LL tests on that exponent in turn?  Why would they return
exponents when doing so reduces the days of work remaining below the limit
I set?

Has anyone else noticed such behavior?

Just curious,
Kel


------------------------------

Date: Tue, 28 Sep 1999 19:43:17 +0100
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: Re: FFTW for GIMPS?

On 27 Sep 99, at 4:22, Paul Leyland wrote:

> Actually, we at Microsoft Research in Cambridge have seen similar effects
> when compiling and running FFTW code.  Our discovery is that the alignment
> of FP data values is critical.  Get it wrong, and performance can plummet.
> Unless you set the alignment explicitly, it will be wrong approximately half
> the time.

So, is a future release of MSVC++ going to include an option to
optimize alignment of FP data values, at the expense of minimizing
storage by packing data values as tightly as possible?

I think this mainly applies to quadword operands (doubles in C) which
should be aligned on an 8-byte boundary, so that one memory bus cycle
is sufficient. This strategy also avoids operands spanning cache line
boundaries, which would likely have a serious effect on performance
by effectively halving the associativity of the L1 data cache.

Alignment on 4-byte boundaries is quite sufficient for C floats.
Ten-byte reals (direct copies from FPU registers) are a problem: you are
always going to need two memory bus cycles since you can't fit an
80-bit operand on a 64-bit bus. However, whether you pack a ten-byte
real array contiguously (with no wasted space), or align elements on
16-byte boundaries (with lots of wasted space, but no cache line
conflicts) could have a significant effect on performance - which
might work in different directions in different applications.
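
The cache-line point can be checked with plain address arithmetic. The Python sketch below counts how many array elements straddle a line boundary; the 32-byte line size is an assumption (typical of Pentium-class L1 caches), and the function names are invented.

```python
def spans_line(addr: int, size: int, line: int = 32) -> bool:
    """True if an operand of `size` bytes starting at `addr` crosses a cache-line boundary."""
    return addr // line != (addr + size - 1) // line


def count_spanning(base: int, size: int, stride: int, count: int, line: int = 32) -> int:
    """Number of elements in an array (given base address and stride) that cross a line."""
    return sum(spans_line(base + i * stride, size, line) for i in range(count))


# 64-element arrays under the assumed 32-byte line size:
aligned_doubles = count_spanning(base=0, size=8, stride=8, count=64)      # 8-byte aligned
misaligned_doubles = count_spanning(base=4, size=8, stride=8, count=64)   # only 4-byte aligned
packed_ten_byte = count_spanning(base=0, size=10, stride=10, count=64)    # contiguous
padded_ten_byte = count_spanning(base=0, size=10, stride=16, count=64)    # 16-byte slots
```

Under these assumptions, one element in four straddles a line for both the misaligned doubles and the packed ten-byte reals, while the 8-byte-aligned doubles and the 16-byte-padded reals never do.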




Regards
Brian Beesley

------------------------------

End of Mersenne Digest V1 #634
******************************
