Mersenne Digest      Thursday, November 30 2000      Volume 01 : Number 796




----------------------------------------------------------------------

Date: Tue, 28 Nov 2000 15:48:11 -0500
From: "Willmore, David (VS Central)" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: P4 - a correction

George wrote:
>          One correction to my previous post.  I said that the latency to
> access the L1 data cache was 2 clocks.  This is correct for integer
> instructions only.  For floating point and SSE2 instructions the latency
> is 6 clocks!  Interestingly, the L2 cache latency is 7 clocks for both
> integer and floating point instructions.
> 
Look at the coupling that the FPU has to the cache for one reason.  I would
expect
that the FPU(s) have more ports on the L1 than that integer units do.  Also,
if you look
at the sensitivity of different types of code to load latency, integer code,
by far, is
more sensitive than floating point.  Think about the length of the floating
point
pipeline, it's pretty long to start with, so you're gonig to *have* to
unroll your code
to take advantage of the pipeline, so you might as well cover the additional
load to
use latency the same way.  With enough rename registers, it's all good. :)

Cheers,
David
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 16:48:13 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Mersenne: Re: P4 - a correction

Hi,

At 07:45 AM 11/28/00 +0000, Brian J. Beesley wrote:
>Surely the latency is much less relevant than the throughput,
>provided that there are sufficient registers and pipeline entries.

True.  The manuals are unclear as to how many XMM registers are
available.  There are 8 user visible ones and an unknown number
of internal ones that you use via the P4's register renaming feature.

>The idea is to keep the execution units busy ...

I just finshed designing the new code for one of the easier problems -
the rounding and carry propagation step after the inverse FFT.
It looks like the P4 can do this in 6.5 clocks per double vs. about
19 clocks per double on the P3.  This doesn't count any benefits
from prefetching data.

It also seems that the multiply units are idle enough in the new
code so that I can continue to use the compact two-to-fractional-power
arrays, rather than two full 5MB arrays.  Thus, yesterday's question of
33MB max is probably 23MB max for a 640K FFT.  That's good, because
I'd feel guilty using up 33MB.  Heck, I feel guilty using 23MB.

I don't expect the above savings (3 times fewer clocks) to hold for
the FFT code.  The old FFT code was much better pipelined than the old
carry propagation code.

>Bear in mind that there may be a severe penalty for switching between
>SSE2 and x86 FPU formats.

There is no penalty on the P4.  The XMM registers do not overlap the
FPU registers in the same way that MMX registers do.

Having fun,
George

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 14:06:41 -0800
From: "Stephan T. Lavavej" <[EMAIL PROTECTED]>
Subject: Mersenne: More on Compression

> This 'Wonderful' compression technology maybe "Awesome"; however, MY main
> objection ....or perhaps philosophy towards all of this is that Prime95 is
> not a large
> piece of code.  It takes a relatively small amount of time to download
over
> a modem
> compared to other software items that we modem users may download in a
weeks
> period.
> Maybe if Prime95 was ..say .. a 20MB download.  Then perhaps shaking
things
> up to save
> some download time would be a good idea, but as things stand now we are
only
> talking about saving 20 or so seconds.  Perhaps that is the reason people
> may find it 'objectionable'..
> Maybe its just not worth the hassle at the moment for this particular
> application....

Actually, the download time wouldn't change all that much.  (The package
would go from 400K to 300K.)  But the executable itself, once unzipped,
would only be some 200K, as compared to over 1MB before.  It's just more
efficient.  The question is, if compression involves a one-time, five-minute
cost on the part of the developer and saves everyone a few seconds of
download time and a few K of HD space, then why not?  Why have bloated code?
I sure like looking at 200K executables instead of megabyte and larger
things.  Makes me think of my old 486 where everything fit in 120MB.

> P.S. The website for the compression program never resolved for me; I'd
> like to take a look at it, if someone would send me the IP.

http://upx.tsx.org is a redirection URL for
http://wildsau.idv.uni-linz.ac.at/mfx/upx.html.

> The P4 at 1.5 GHz is just the beginning of a sequence of ever faster
> processors which I think will dominate the market for a long time
> to come, and, yes, with RDRAM giving much improved memory
> performance, which should give Prime95 a real kick.

RDRAM isn't good: http://www.tomshardware.com/.

> Sorry, I misunderstood your message. (Looks like I wasn't the only
> one...)

Indeed.

> Isn't the problem here that a page of code or data loaded from the
> executable has to be expanded each time it is loaded from the
> executable? This isn't a problem when there is loads of system memory
> available, but standard practise is to simply discard unmodified
> pages which are paged out, as they can be reloaded from the binary
> executable as easily as they can be reloaded after being written to
> the page/swap file. So running a compressed executable involves
> either more usage of memory or extra CPU cycles on each unmodified
> page fault.

No, the decompression cost is not paid again and again while the executable
runs: there is no (significant) memory cost or compute cost.  I don't claim
to understand how UPX works, although I intend to learn some day, but the
UPX homepage notes "Your executables suffer no memory overhead or other
drawbacks".  Also, "very fast decompression: more than 10 MB/sec even on my
old Pentium 133", "no memory overhead for your compressed executables",
"safe: you can list, test and unpack your executables. Also, a checksum of
both the compressed and uncompressed file is maintained internally."  The
two things that don't work with UPX are executables that want to read data
from themselves and things with "overlays".  (And, of course, executables
that want to _write_ to themselves), but for everything else it almost
always works and is transparent.

UPX works for Linux too - isn't there a Linux port of Prime95?  The entry
for UPX's Linux capabilities in its readme does detail some of its inner
workings.  It is lengthy.

Stephan

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 14:48:33 -0800
From: "xqrpa" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

- ----- Original Message -----
From: Stephan T. Lavavej <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 28, 2000 2:06 PM
Subject: Mersenne: More on Compression


> > The P4 at 1.5 GHz is just the beginning of a sequence of ever faster
> > processors which I think will dominate the market for a long time
> > to come, and, yes, with RDRAM giving much improved memory
> > performance, which should give Prime95 a real kick.
>
> RDRAM isn't good: http://www.tomshardware.com/.
>

Not with a PIII it isn't, point well taken, but with the P4 things are
different.

See:

http://www.gamepc.com/reviews/hardware_review.asp?review=pentium4&page=12&ms

for the SiSoft Sandra Memory benchmark for this marvel, delivering 3X
the performance of anything else tested.

I'm no Intel booster, preferring Atthlon hands down, but a 400MHz FSB
is no slouch, methinks, when it delivers numbers like the above.  Will this
benefit Prime95?  Anybody?

Best Wishes,
Stefanovic.



_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 17:16:08 -0600
From: "Jeramy Ross" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

*SNIP*
>The question is, if compression involves a one-time, five-minute
> cost on the part of the developer and saves everyone a few seconds of
> download time and a few K of HD space, then why not?  Why have bloated
code?
> I sure like looking at 200K executables instead of megabyte and larger
> things.  Makes me think of my old 486 where everything fit in 120MB.

Everything I have seen of it, and heard of it, leaves more questions than
answers.


*SNIP*

> No, the decompression cost is not paid again and again while the
executable
> runs: there is no (significant) memory cost or compute
*SNIP*
>"Your executables suffer no memory overhead or other
>drawbacks".  Also, "very fast decompression: more than 10 MB/sec even on my
>old Pentium 133", "no memory overhead for your compressed executables",


So basically this is like the magical compression fairy that touches the
code once and
we never have to worry about it again?  Not to be harsh or blunt, but these
statements
should be like huge red flags.  There is no memory overhead.. i.e. nothing
but the compressed
executable; however this magically touched executable contains the code that
decompresses it?

hmmm...

Even if everything was as stated... there are just too many questions left
unanswered for me to
feel like risking it.  Besides, I feel better with a "bloated" megabyte on
my HD than a lean mean
200KB from the compression fairy.

Sorry... I am being harsh....  but I am having a hard time seeing this in
any other way.

- - Jeramy

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 00:20:12 +0100
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: More on Compression

On Tue, Nov 28, 2000 at 02:06:41PM -0800, Stephan T. Lavavej wrote:
>The question is, if compression involves a one-time, five-minute
>cost on the part of the developer and saves everyone a few seconds of
>download time and a few K of HD space, then why not?

Perhaps since UPX requires a writable /tmp?

>No, the decompression cost is not paid again and again while the executable
>runs: there is no (significant) memory cost or compute cost.  I don't claim
>to understand how UPX works, although I intend to learn some day, but the
>UPX homepage notes "Your executables suffer no memory overhead or other
>drawbacks".

UPX decompresses the program to /tmp, then runs it from there. Another
side effect is that the program (at least with my old version of UPX)
shows up as `3' instead of `mprime' in `top' and `ps', among others...

>The
>two things that don't work with UPX are executables that want to read data
>from themselves and things with "overlays".

This is for DOS only, and doesn't apply to Linux, since the Linux
version doesn't feature in-place decompression.

>UPX works for Linux too - isn't there a Linux port of Prime95?  The entry
>for UPX's Linux capabilities in its readme does detail some of its inner
>workings.  It is lengthy.

Oh, you're talking about the Windows version, not the Linux version. Oh
well, I'm not too sure it's a good idea to use in-place decompression,
when we already have enough problems with fragmented memory maps etc. --
as some seem to have noticed, there can be quite a hefty speed penalty
if the memory mapping has problems (cf. the ReCache discussion) :-)

/* Steinar */
- -- 
Homepage: http://members.xoom.com/sneeze/
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 19:33:22 -0500
From: Nathan Russell <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

I can see something like this being worth it for a very large executable
- - this summer, I was forced to download the Sun JDK over a 56k modem. 
It wasn't fun.  

However, the question we must consider is whether saving space in a
1-meg executable is worth the extra trouble for all concerned, as well
as using a non-standard compression format.

Nathan
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 17:05:19 -0800
From: "Stephan T. Lavavej" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

> However, the question we must consider is whether saving space in a
> 1-meg executable is worth the extra trouble for all concerned, as well
> as using a non-standard compression format.

No, no, no!

Okay, UPX packing is not an archival format like .ZIP is.  It is applied to
a .EXE file and creates a .EXE file.  The resulting .EXE file is executable,
just like the first, and behaves just like the original.  It's
_transparent_, like I said.

- -*---*-------
Stephan T. Lavavej

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 20:01:44 -0500
From: Pierre Abbat <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

On Tue, 28 Nov 2000, Nathan Russell wrote:
>However, the question we must consider is whether saving space in a
>1-meg executable is worth the extra trouble for all concerned, as well
>as using a non-standard compression format.

If you want to keep stuff compressed on your disk, use the e2compr patch. This
will compress any regular file and all programs that read or write the file
will see it just as if it were uncompressed. It does occasionally crash the
computer, though.

phma
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 23:17:12 EST
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: More on Compression

- --part1_9.d8d81b2.2755dd48_boundary
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable

I vote no, if anyone's counting. =A0

Stephan, if compressing the executable is that important to you, why not jus=
t=20
compress your own copy? =A0It really sounds as though many of the resident l=
ist=20
experts feel this is a bad idea.

- --part1_9.d8d81b2.2755dd48_boundary
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable

<HTML><FONT FACE=3Darial,helvetica><FONT  SIZE=3D2>I vote no, if anyone's co=
unting. =A0
<BR>
<BR>Stephan, if compressing the executable is that important to you, why not=
 just <BR>compress your own copy? =A0It really sounds as though many of the=20=
resident list <BR>experts feel this is a bad idea.</FONT></HTML>

- --part1_9.d8d81b2.2755dd48_boundary--
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 21:21:03 -0800
From: "Stephan T. Lavavej" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

> I vote no, if anyone's counting.

(A), It's not a vote,

> Stephan, if compressing it is that important to you, why not just compress
> your own copy?

(B), I have, but everyone should enjoy the benefits of having a smaller
Prime95 executable, as minor as they may be.

>  It really sounds as though many of the resident list experts
> feel this is a bad idea.

(C), Some people were (and some still are) confused: packing an executable
does _not_ have to be undone by the end user; in fact, the end user need
never ever EVER know about it.  There _are_no_drawbacks_.  No compute cost,
no memory cost, only a space savings.  Which isn't bad.  Few things in life
come with no drawbacks.  Take advantage of the ones that do.

Packed executables also have an advantage on networked systems: they can be
transmitted to other computers faster, thereby creating the illusion of
increased startup speed.  Although I don't think Prime95 can be run this way
(it needs local copies, no?).

Stephan T. Lavavej

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 21:52:28 -0800
From: Greg Hewgill <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

On Tue, Nov 28, 2000 at 09:21:03PM -0800, Stephan T. Lavavej wrote:
> Packed executables also have an advantage on networked systems: they can be
> transmitted to other computers faster, thereby creating the illusion of
> increased startup speed.

Be careful; this idea may end up backfiring on you. On Windows NT (and other
sensible operating systems), executable files are not completely loaded into
memory at once on program startup. A memory "section" is set up which is an
area of virtual memory backed by the actual executable image on disk. As the
CPU executes the startup code of the executable, 4kB pages are demand-paged in
as they are touched. There is usually some prefetching and such going on too,
but that doesn't affect the point here. The point is that parts of the
executable that are not touched on startup are never actually loaded from disk
until they are used.

When running a packed executable, the first thing the unpacker stub must do is
read the whole (packed) executable file and write the unpacked version
somewhere. This unpacked version might end up in a file on disk, or it might be
in memory, which is backed by the system paging file. So, on startup you end up
touching the entire packed executable, plus you need to write the whole
unpacked version to either memory or disk.

These executable packing programs have been around for a while, I remember
PKWare had a DOS program called PKLite at least 10 years ago that did much the
same thing as UPX. In the long run, download bandwidth doesn't matter (how
often do you download new binaries, anyway?), and disk space is too cheap to
worry about (executables aren't a major consumer of disk space). You're not
really saving anything. Let the operating system do what it's best at.

Greg Hewgill
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 15:16:09 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

On 28 Nov 00, at 14:48, xqrpa wrote:

> for the SiSoft Sandra Memory benchmark for this marvel, delivering 3X the
> performance of anything else tested.
> 
> I'm no Intel booster, preferring Atthlon hands down, but a 400MHz FSB is
> no slouch, methinks, when it delivers numbers like the above.  Will this
> benefit Prime95?  Anybody?

Memory bandwidth _should_ be the dominant factor, especially when 
running LL tests on largish exponents, where the processor cache size 
is only a small fraction of the working set. However the fact remains 
that Athlon systems running with PC100 memory seem to significantly 
outperform PIII systems running with the same memory at the same 
clock speed. One can infer that the performance of the Athlon FPU is 
superior _for our particular application_, though the increase in the 
FSB (bandwidth between CPU and chipset) certainly does no harm!

I've also been disappointed by the performance of RDRAM with PIII 
systems; though my PIII 700 (i820 chipset) system did speed up about 
10% when SDRAM was exchanged for RDRAM, it's _still_ 33% slower than 
my (old type) Athlon 650 system. 

Someone told me that RDRAM works better in pairs, but the single 
modules are still so expensive that I haven't been tempted to try the 
experiment.

> two things that don't work with UPX are executables that want to   
> read data
> from themselves and things with "overlays".  (And, of course,      
> executables
> that want to _write_ to themselves), but for everything else it    
> almost
> always works and is transparent.

Unfortunately for UPX, Prime95's assembler routines write to the code 
segment. This is used to tune routines for performance between 
processor types without "bloat" caused by duplicating large sections 
of code or incurring inefficiencies caused by executing test/branch 
instructions repeatedly.


Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 10:24:07 EST
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: More on Compression

Stephan wrote:

> (C), Some people were (and some still are) confused:
> packing an executable does _not_ have to be undone 
> by the end user; in fact, the end user need never
> ever EVER know about it.  There _are_no_drawbacks_.
> No compute cost, no memory cost, only a space 
> savings.  Which isn't bad.  Few things in life
> come with no drawbacks.  Take advantage of the ones 
> that do.

Fewer people are confused than you think.  

I would not take *anyone's* word that "there _are_no_drawbacks_" without completely 
convincing test data showing that under a variety of conditions, performance is 
*identical* with and without executable compression.

And personally, for a savings of a few hundred K, IT'S NOT WORTH IT to do that testing.

Rest assured that if the executable does end up being compressed in the general 
release, many users will either refuse to upgrade or will specifically uncompress it 
to restore the status quo.

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 11:50:55 -0600
From: Herb Savage <[EMAIL PROTECTED]>
Subject: Re: Mersenne: More on Compression

"Stephan T. Lavavej" wrote:
> (C), Some people were (and some still are) confused: packing an executable
> does _not_ have to be undone by the end user; in fact, the end user need
> never ever EVER know about it.  There _are_no_drawbacks_.  No compute cost,
> no memory cost, only a space savings.  Which isn't bad.  Few things in life
> come with no drawbacks.  Take advantage of the ones that do.

The big disadvantage of this for me is that it makes the EXE totally
opaque 
to virus scanners.  So unless the virus scanner knows how to unpack the
EXE 
(which they do for most common EXE packers) any viruses sent in this
form 
will not be detected.  The last big virus attack got by our virus
checker 
because it was an old virus encased in a new EXE pack program.

Regards,

Herb Savage
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 18:24:33 +0100
From: "Steinar H. Gunderson" <[EMAIL PROTECTED]>
Subject: Mersenne: OT: The appearance of a graph when its third derviated equals zero

Hi,

Since it's just a couple of days left of a mini-`contest' our maths
teacher has announced, and since she said it was 100% OK to get help
from other people (and because I'm lazy and don't want to search
mathematical literature everywhere :-P ), here comes a question:

When f(x) = 0, the graph crosses the X axis.
When f'(x) = 0, the graph is at its top or bottom.
When f''(x) = 0, the graph is at what we in Norwegian call a `turning
point' -- I don't know the proper English term :-)

How does a graph look like when f'''(x) = 0? I mean, I can plot a few
graphs and see that it is somewhere between f'(x) and f''(x), but I
don't see anything striking... Our teacher also mentioned a (female)
mathematician finding these points and subsequently gotten them named
after herself -- I'm sure this would be worth some bonus points ;-)

Does anybody know, and more importantly -- can you _explain_ it? (One of
the hardest things here would be that graphs don't go well in ASCII, I'd
guess ;-) )

/* Steinar */
- -- 
Homepage: http://members.xoom.com/sneeze/
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 19:56:24 +0000
From: Gareth Randall <[EMAIL PROTECTED]>
Subject: Mersenne: Accessibility of compressed archives

Dear All,

I'd like to mention two aspects of self-extracting archives which no-one has touched 
on. These are, security and accessibility.

I like "inanimate" zip (or gz/bz2 in my case) for two reasons which I consider to be 
important:

1. No executable code: I can open the most untrusted zip I care to choose and know 
that I am not going to get any viruses from the self-extraction code. It is not ideal 
to force people to actually execute untrusted code before they get to see what is 
really inside, as most self-extracting programs require.

2. No platform dependence: I don't want to have a fully set-up machine of the 
appropriate architecture and with a working copy of the target OS before I can extract 
the archive. I want to be able to peruse the latest win32 prime95 download on, say, an 
Alpha running BSD, or whatever.


Now as it happens both of these issues are of reduced significance in the case of 
distributing a program (prime95) which people are intending to execute anyway, and a 
program that is architecture dependent. However, documentation and data files 
distributed with the main executable may need to be accessed separately, and as a 
general rule self-extracting archives are more of a gimmick than a valuable tool 
(although I don't dispute that they help in some cases by reducing the amount of user 
interaction required to extract them.)

[Note that I am not saying anything about whether there is any performance improvement 
in the final execution etc. I am only commenting on the file format for downloads.]


Finally, I'm curious about the compression being better than zip. I've been wondering 
for some time about when the Burrows-Wheeler compression in bzip2 would make its way 
onto Windows. Is this a first example of it?


Yours,

======= Gareth Randall =======


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 23:20:19 +0100
From: Guillermo Ballester Valor <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4

John R Pierce wrote:

> > I know this is a little off-topic, but how good is the P4 at integer
> operations?
> 
> not that off topic at all.  Integer multiplies can be more efficient than FP
> for this sort of thing, IF they are as fast.  If a processor could pipeline
> 64 bit integer multiplies at one per clock with parallel adds at the same
> time it would be as fast or faster than using the 80 bit FP format...  It
> will be interesting seeing what IA64 brings to the table when it finally
> gets up to speed...
> 
> -jrp
> 
Today, I've read in the manuals that a simple integer add with carry
(addc) has 8 clocks of latency and 3 clocks of throughput for a P4.
Humm, too much slowdown for single Ia32 instructions, Intel engineers
will know the reasons.

Guillermo.
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 15:55:04 -0800
From: "Osher Doctorow" <[EMAIL PROTECTED]>
Subject: Mersenne: Second order LBP analysis of Fermat's Last Theorem - Doctorow

From: Osher Doctorow [EMAIL PROTECTED], Wed. Nov. 28, 2000 3:10PM

First-order LBP analysis of Fermat's Last Theorem (FLT for short in what
follows - see my contribution from the last few days) asserts that
first-order LBP maximum entropy equations/functions are the only admissible
or acceptable kinds, which means that functions must be linear (in either
independent or dependent variables), quadratic, cross-product (like kxy),
simple exponential (like a exp(bx) with b real or complex), or have
vanishing or "neglect-able" 3rd and higher partial derivatives (the latter
mostly applies to mathematical physics as in general relativity).  It turns
out that only x + y = z and x^^2 y^^2 = z^^2 (where ^^ denotes
exponentiation) are admissible under 1st-order LBP and not the higher degree
equations, which "proves" FLT under lst-order conditions.   Notice that in
general elements x, z, y can be arbitrary objects (tensors, vector,
pseudovectors, bivectors, etc.), but in FLT they're usually taken as
(positive) integers, and in LBP they are taken as continuous (e.g., all
nonnegative reals) on the nonnegative real line unless some other connected
domain is specified.

Second-order LBP analysis of FLT is instructive for its applicability to
other Mersenne/prime equations.  Here we use second-order LBP maximum
entropy, which permits multiplication, addition, or subtraction of elements
of 1st-order LBP, but not division, not inversion except subtraction, not
limits unless they enter into the previous types, and not composition of
functions. It turns out that x^^n + y^^n = z^^n needs to have two of the
quantities x, y, or z "independent", and without loss of generality I will
choose x and y to be "independent" and z = f(x, y).   However, if z is a
function of x and y, then z must be a two-variable polynomial (with terms
like kxy, kx^^3y^^5, etc.) since multiplication of x and y times themselves
only yield these (they do not yield fractional or rational powers of x
and/or y without inverse and/or limiting operations).   Therefore, z^^n must
be a two-variable polynomial unless it is a constant.  However, there is no
known equation in mathematics for which x^^n + y^^n = k^^n (constant) for n
> 2 integer (for n = 2 it is a circle, for n = 1 it is a line).   Therefore,
we have an identity x^^n + y^^n = x^^n + y^^n by setting equal powers of x
and y equal in pairs.  An identity (in this sense) is a trivial equation and
is excluded from LBP since trivial equations make no statements about
functions and hence none about their LBP entropy.  This gives a second-order
LBP proof of FLT.  Alternatively, notice that if we took partial derivatives
with respect to x, symbol Dx, of both sides of x^^n + y^^n = z^^n, we would
obtain n! = Dx...x z^^n  where Dx...x is the nth partial, and since an nth
partial is also a function of x and y, we would either end up with n! = n!
(trivial) or n! = two-variable polynomial in x and y, which by equating
terms with equal powers on the left and right sides yields n! = n!.  This
second method does not involve the fact that there is no known function of
form x^^n + y^^n = k^^n for k constant and x, y independent and n > 2
integer.  We can also exclude n! = x^^m + y^^m and similar types from it by
going back to Leibniz' rule and using induction on z = f(x,y).

Osher Doctorow

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Wed, 29 Nov 2000 20:31:28 -0500 (EST)
From: Jason Stratos Papadopoulos <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4

On Wed, 29 Nov 2000, Guillermo Ballester Valor wrote:

> Today, I've read in the manuals that a simple integer add with carry
> (addc) has 8 clocks of latency and 3 clocks of throughput for a P4.
> Humm, too much slowdown for single Ia32 instructions, Intel engineers
> will know the reasons.

This and the 14-18 clock multiply are extremely depressing. For all the
marketing hype about Intel catering to the internet and the next
generation of "active media", they're making it awfully difficult to
implement the cryptography that the internet needs.

The alpha was already at least 5x faster than a PIII for multiprecision
arithmetic at the same clock speed; with the P4 it will only get worse.

jasonp

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Thu, 30 Nov 2000 09:01:21 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4

On 29 Nov 00, at 20:31, Jason Stratos Papadopoulos wrote:

> > Humm, too much slowdown for single Ia32 instructions, Intel engineers
> > will know the reasons.

Presumably something to do with not being parallel architecture; 
there's only one set of flags, so you can't start an ADC until the 
previous instruction has completed and the flags register has been 
retired. In IA32, there's nothing to tell you _which_ carry flag to 
use in an instruction which uses carry as an input, or output.
> 
> This and the 14-18 clock multiply are extremely depressing. For all the
> marketing hype about Intel catering to the internet and the next
> generation of "active media", they're making it awfully difficult to
> implement the cryptography that the internet needs.

Well, you're not _forced_ to buy Intel - even if you stick to 
Windoze! I get the impression that, with the Athlon architecture, AMD 
are making serious inroads into the market - especially at the lower 
end. Now it just so happens that the Athlon architecture runs Prime95
rather well.
> 
> The alpha was already at least 5x faster than a PIII for multiprecision
> arithmetic at the same clock speed; with the P4 it will only get worse.

Are you sure about this? I think, with Alpha, you have to execute the 
instruction twice to get a double-precision multiplication - you can 
store either the low half or the high half of a product in one 
instruction, but not both.

Of course, the Alpha's single precision integer arithmetic is 64 
bits, not 32. This does help somewhat :)

It's also easy to get confused between the latency (time between 
loading the opcode and finishing execution) and the throughput (the 
inverse of the longest time an instruction spends in any particular 
unit, times the number of the critical unit involved in the design).

The fact remains that the P4, like all other commercial processors, 
is a compromise. This is what makes it so darned complicated, and 
also indicates why the designers have to make performance tradeoffs, 
some of which don't suit our particular application. It would be 
possible to optimize the design so that it executed the instructions 
_we_ find useful much more quickly, but whether such a processor 
would be capable of running standard commercial benchmarking programs 
at a reasonable speed - or indeed at all - is open to question ... 
we'd probably want to reduce the inherent complexity by junking the 
16-bit x86 legacy, for a start...


Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Thu, 30 Nov 2000 20:45:59 -0500
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
Subject: Mersenne: Factoring

This is a multi-part message in MIME format.

- ------=_NextPart_000_000F_01C05B0E.8B7A6D20
Content-Type: text/plain;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

We could let prime95 decide the next election <grin>.

Give everybody a different prime number. Multiply the primes for =
candidate A together, likewise for B.

If the factorization of the composite is not square free then we know =
that=20

1) Someone voted twice for the same candidate, and=20

2) We know who that is.=20

If the 2 composites contain the same factor, then we know that=20

1) Someone voted for both candidates and=20

2) Again we know who.=20

No lawsuits, no recounts, no chads.=20

Frank


- ------=_NextPart_000_000F_01C05B0E.8B7A6D20
Content-Type: text/html;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Diso-8859-1">
<META content=3D"MSHTML 5.50.4522.1800" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2>
<P>We could let prime95 decide the next election &lt;grin&gt;.</P>
<P></FONT><FONT face=3DArial size=3D2>Give everybody a different prime=20
number.</FONT><FONT size=3D2> </FONT><FONT face=3DArial =
size=3D2>Multiply the primes=20
for candidate A together, likewise for B.</FONT></P>
<P><FONT face=3DArial size=3D2>If the factorization of the composite is =
not square=20
free then we know that</FONT><FONT size=3D2> </P></FONT><FONT =
face=3DArial size=3D2>
<P>1) Someone voted twice for the same candidate, and</FONT><FONT =
size=3D2>=20
</P></FONT><FONT face=3DArial size=3D2>
<P>2) We know who that is.</FONT><FONT size=3D2> </P></FONT><FONT =
face=3DArial=20
size=3D2>
<P>If the 2 composites contain the same factor, then we know =
that</FONT><FONT=20
size=3D2> </P></FONT><FONT face=3DArial size=3D2>
<P>1) Someone voted for both candidates and</FONT><FONT size=3D2> =
</P></FONT><FONT=20
face=3DArial size=3D2>
<P>2) Again we know who.</FONT><FONT size=3D2> </P></FONT><FONT =
face=3DArial size=3D2>
<P>No lawsuits, no recounts, no chads.</FONT><FONT size=3D2> =
</P></FONT><FONT=20
face=3DArial size=3D2>
<P>Frank</P></FONT></DIV></BODY></HTML>

- ------=_NextPart_000_000F_01C05B0E.8B7A6D20--


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

End of Mersenne Digest V1 #796
******************************

Reply via email to