Mersenne Digest Thursday, November 30 2000 Volume 01 : Number 796 ---------------------------------------------------------------------- Date: Tue, 28 Nov 2000 15:48:11 -0500 From: "Willmore, David (VS Central)" <[EMAIL PROTECTED]> Subject: RE: Mersenne: P4 - a correction George wrote: > One correction to my previous post. I said that the latency to > access the L1 data cache was 2 clocks. This is correct for integer > instructions only. For floating point and SSE2 instructions the latency > is 6 clocks! Interestingly, the L2 cache latency is 7 clocks for both > integer and floating point instructions. > Look at the coupling that the FPU has to the cache for one reason. I would expect that the FPU(s) have more ports on the L1 than that integer units do. Also, if you look at the sensitivity of different types of code to load latency, integer code, by far, is more sensitive than floating point. Think about the length of the floating point pipeline, it's pretty long to start with, so you're gonig to *have* to unroll your code to take advantage of the pipeline, so you might as well cover the additional load to use latency the same way. With enough rename registers, it's all good. :) Cheers, David _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 16:48:13 -0500 From: George Woltman <[EMAIL PROTECTED]> Subject: Mersenne: Re: P4 - a correction Hi, At 07:45 AM 11/28/00 +0000, Brian J. Beesley wrote: >Surely the latency is much less relevant than the throughput, >provided that there are sufficient registers and pipeline entries. True. The manuals are unclear as to how many XMM registers are available. There are 8 user visible ones and an unknown number of internal ones that you use via the P4's register renaming feature. >The idea is to keep the execution units busy ... I just finshed designing the new code for one of the easier problems - the rounding and carry propagation step after the inverse FFT. It looks like the P4 can do this in 6.5 clocks per double vs. about 19 clocks per double on the P3. This doesn't count any benefits from prefetching data. It also seems that the multiply units are idle enough in the new code so that I can continue to use the compact two-to-fractional-power arrays, rather than two full 5MB arrays. Thus, yesterday's question of 33MB max is probably 23MB max for a 640K FFT. That's good, because I'd feel guilty using up 33MB. Heck, I feel guilty using 23MB. I don't expect the above savings (3 times fewer clocks) to hold for the FFT code. The old FFT code was much better pipelined than the old carry propagation code. >Bear in mind that there may be a severe penalty for switching between >SSE2 and x86 FPU formats. There is no penalty on the P4. The XMM registers do not overlap the FPU registers in the same way that MMX registers do. Having fun, George _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 14:06:41 -0800 From: "Stephan T. Lavavej" <[EMAIL PROTECTED]> Subject: Mersenne: More on Compression > This 'Wonderful' compression technology maybe "Awesome"; however, MY main > objection ....or perhaps philosophy towards all of this is that Prime95 is > not a large > piece of code. It takes a relatively small amount of time to download over > a modem > compared to other software items that we modem users may download in a weeks > period. > Maybe if Prime95 was ..say .. a 20MB download. Then perhaps shaking things > up to save > some download time would be a good idea, but as things stand now we are only > talking about saving 20 or so seconds. Perhaps that is the reason people > may find it 'objectionable'.. > Maybe its just not worth the hassle at the moment for this particular > application.... Actually, the download time wouldn't change all that much. (The package would go from 400K to 300K.) But the executable itself, once unzipped, would only be some 200K, as compared to over 1MB before. It's just more efficient. The question is, if compression involves a one-time, five-minute cost on the part of the developer and saves everyone a few seconds of download time and a few K of HD space, then why not? Why have bloated code? I sure like looking at 200K executables instead of megabyte and larger things. Makes me think of my old 486 where everything fit in 120MB. > P.S. The website for the compression program never resolved for me; I'd > like to take a look at it, if someone would send me the IP. http://upx.tsx.org is a redirection URL for http://wildsau.idv.uni-linz.ac.at/mfx/upx.html. > The P4 at 1.5 GHz is just the beginning of a sequence of ever faster > processors which I think will dominate the market for a long time > to come, and, yes, with RDRAM giving much improved memory > performance, which should give Prime95 a real kick. RDRAM isn't good: http://www.tomshardware.com/. > Sorry, I misunderstood your message. (Looks like I wasn't the only > one...) Indeed. > Isn't the problem here that a page of code or data loaded from the > executable has to be expanded each time it is loaded from the > executable? This isn't a problem when there is loads of system memory > available, but standard practise is to simply discard unmodified > pages which are paged out, as they can be reloaded from the binary > executable as easily as they can be reloaded after being written to > the page/swap file. So running a compressed executable involves > either more usage of memory or extra CPU cycles on each unmodified > page fault. No, the decompression cost is not paid again and again while the executable runs: there is no (significant) memory cost or compute cost. I don't claim to understand how UPX works, although I intend to learn some day, but the UPX homepage notes "Your executables suffer no memory overhead or other drawbacks". Also, "very fast decompression: more than 10 MB/sec even on my old Pentium 133", "no memory overhead for your compressed executables", "safe: you can list, test and unpack your executables. Also, a checksum of both the compressed and uncompressed file is maintained internally." The two things that don't work with UPX are executables that want to read data from themselves and things with "overlays". (And, of course, executables that want to _write_ to themselves), but for everything else it almost always works and is transparent. UPX works for Linux too - isn't there a Linux port of Prime95? The entry for UPX's Linux capabilities in its readme does detail some of its inner workings. It is lengthy. Stephan _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 14:48:33 -0800 From: "xqrpa" <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression - ----- Original Message ----- From: Stephan T. Lavavej <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Tuesday, November 28, 2000 2:06 PM Subject: Mersenne: More on Compression > > The P4 at 1.5 GHz is just the beginning of a sequence of ever faster > > processors which I think will dominate the market for a long time > > to come, and, yes, with RDRAM giving much improved memory > > performance, which should give Prime95 a real kick. > > RDRAM isn't good: http://www.tomshardware.com/. > Not with a PIII it isn't, point well taken, but with the P4 things are different. See: http://www.gamepc.com/reviews/hardware_review.asp?review=pentium4&page=12&ms for the SiSoft Sandra Memory benchmark for this marvel, delivering 3X the performance of anything else tested. I'm no Intel booster, preferring Atthlon hands down, but a 400MHz FSB is no slouch, methinks, when it delivers numbers like the above. Will this benefit Prime95? Anybody? Best Wishes, Stefanovic. _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 17:16:08 -0600 From: "Jeramy Ross" <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression *SNIP* >The question is, if compression involves a one-time, five-minute > cost on the part of the developer and saves everyone a few seconds of > download time and a few K of HD space, then why not? Why have bloated code? > I sure like looking at 200K executables instead of megabyte and larger > things. Makes me think of my old 486 where everything fit in 120MB. Everything I have seen of it, and heard of it, leaves more questions than answers. *SNIP* > No, the decompression cost is not paid again and again while the executable > runs: there is no (significant) memory cost or compute *SNIP* >"Your executables suffer no memory overhead or other >drawbacks". Also, "very fast decompression: more than 10 MB/sec even on my >old Pentium 133", "no memory overhead for your compressed executables", So basically this is like the magical compression fairy that touches the code once and we never have to worry about it again? Not to be harsh or blunt, but these statements should be like huge red flags. There is no memory overhead.. i.e. nothing but the compressed executable; however this magically touched executable contains the code that decompresses it? hmmm... Even if everything was as stated... there are just too many questions left unanswered for me to feel like risking it. Besides, I feel better with a "bloated" megabyte on my HD than a lean mean 200KB from the compression fairy. Sorry... I am being harsh.... but I am having a hard time seeing this in any other way. - - Jeramy _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 00:20:12 +0100 From: "Steinar H. Gunderson" <[EMAIL PROTECTED]> Subject: Mersenne: Re: More on Compression On Tue, Nov 28, 2000 at 02:06:41PM -0800, Stephan T. Lavavej wrote: >The question is, if compression involves a one-time, five-minute >cost on the part of the developer and saves everyone a few seconds of >download time and a few K of HD space, then why not? Perhaps since UPX requires a writable /tmp? >No, the decompression cost is not paid again and again while the executable >runs: there is no (significant) memory cost or compute cost. I don't claim >to understand how UPX works, although I intend to learn some day, but the >UPX homepage notes "Your executables suffer no memory overhead or other >drawbacks". UPX decompresses the program to /tmp, then runs it from there. Another side effect is that the program (at least with my old version of UPX) shows up as `3' instead of `mprime' in `top' and `ps', among others... >The >two things that don't work with UPX are executables that want to read data >from themselves and things with "overlays". This is for DOS only, and doesn't apply to Linux, since the Linux version doesn't feature in-place decompression. >UPX works for Linux too - isn't there a Linux port of Prime95? The entry >for UPX's Linux capabilities in its readme does detail some of its inner >workings. It is lengthy. Oh, you're talking about the Windows version, not the Linux version. Oh well, I'm not too sure it's a good idea to use in-place decompression, when we already have enough problems with fragmented memory maps etc. -- as some seem to have noticed, there can be quite a hefty speed penalty if the memory mapping has problems (cf. the ReCache discussion) :-) /* Steinar */ - -- Homepage: http://members.xoom.com/sneeze/ _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 19:33:22 -0500 From: Nathan Russell <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression I can see something like this being worth it for a very large executable - - this summer, I was forced to download the Sun JDK over a 56k modem. It wasn't fun. However, the question we must consider is whether saving space in a 1-meg executable is worth the extra trouble for all concerned, as well as using a non-standard compression format. Nathan _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 17:05:19 -0800 From: "Stephan T. Lavavej" <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression > However, the question we must consider is whether saving space in a > 1-meg executable is worth the extra trouble for all concerned, as well > as using a non-standard compression format. No, no, no! Okay, UPX packing is not an archival format like .ZIP is. It is applied to a .EXE file and creates a .EXE file. The resulting .EXE file is executable, just like the first, and behaves just like the original. It's _transparent_, like I said. - -*---*------- Stephan T. Lavavej _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 20:01:44 -0500 From: Pierre Abbat <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression On Tue, 28 Nov 2000, Nathan Russell wrote: >However, the question we must consider is whether saving space in a >1-meg executable is worth the extra trouble for all concerned, as well >as using a non-standard compression format. If you want to keep stuff compressed on your disk, use the e2compr patch. This will compress any regular file and all programs that read or write the file will see it just as if it were uncompressed. It does occasionally crash the computer, though. phma _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 23:17:12 EST From: [EMAIL PROTECTED] Subject: Re: Mersenne: More on Compression - --part1_9.d8d81b2.2755dd48_boundary Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: quoted-printable I vote no, if anyone's counting. =A0 Stephan, if compressing the executable is that important to you, why not jus= t=20 compress your own copy? =A0It really sounds as though many of the resident l= ist=20 experts feel this is a bad idea. - --part1_9.d8d81b2.2755dd48_boundary Content-Type: text/html; charset="ISO-8859-1" Content-Transfer-Encoding: quoted-printable <HTML><FONT FACE=3Darial,helvetica><FONT SIZE=3D2>I vote no, if anyone's co= unting. =A0 <BR> <BR>Stephan, if compressing the executable is that important to you, why not= just <BR>compress your own copy? =A0It really sounds as though many of the=20= resident list <BR>experts feel this is a bad idea.</FONT></HTML> - --part1_9.d8d81b2.2755dd48_boundary-- _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 21:21:03 -0800 From: "Stephan T. Lavavej" <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression > I vote no, if anyone's counting. (A), It's not a vote, > Stephan, if compressing it is that important to you, why not just compress > your own copy? (B), I have, but everyone should enjoy the benefits of having a smaller Prime95 executable, as minor as they may be. > It really sounds as though many of the resident list experts > feel this is a bad idea. (C), Some people were (and some still are) confused: packing an executable does _not_ have to be undone by the end user; in fact, the end user need never ever EVER know about it. There _are_no_drawbacks_. No compute cost, no memory cost, only a space savings. Which isn't bad. Few things in life come with no drawbacks. Take advantage of the ones that do. Packed executables also have an advantage on networked systems: they can be transmitted to other computers faster, thereby creating the illusion of increased startup speed. Although I don't think Prime95 can be run this way (it needs local copies, no?). Stephan T. Lavavej _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 28 Nov 2000 21:52:28 -0800 From: Greg Hewgill <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression On Tue, Nov 28, 2000 at 09:21:03PM -0800, Stephan T. Lavavej wrote: > Packed executables also have an advantage on networked systems: they can be > transmitted to other computers faster, thereby creating the illusion of > increased startup speed. Be careful; this idea may end up backfiring on you. On Windows NT (and other sensible operating systems), executable files are not completely loaded into memory at once on program startup. A memory "section" is set up which is an area of virtual memory backed by the actual executable image on disk. As the CPU executes the startup code of the executable, 4kB pages are demand-paged in as they are touched. There is usually some prefetching and such going on too, but that doesn't affect the point here. The point is that parts of the executable that are not touched on startup are never actually loaded from disk until they are used. When running a packed executable, the first thing the unpacker stub must do is read the whole (packed) executable file and write the unpacked version somewhere. This unpacked version might end up in a file on disk, or it might be in memory, which is backed by the system paging file. So, on startup you end up touching the entire packed executable, plus you need to write the whole unpacked version to either memory or disk. These executable packing programs have been around for a while, I remember PKWare had a DOS program called PKLite at least 10 years ago that did much the same thing as UPX. In the long run, download bandwidth doesn't matter (how often do you download new binaries, anyway?), and disk space is too cheap to worry about (executables aren't a major consumer of disk space). You're not really saving anything. Let the operating system do what it's best at. Greg Hewgill _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 15:16:09 -0000 From: "Brian J. Beesley" <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression On 28 Nov 00, at 14:48, xqrpa wrote: > for the SiSoft Sandra Memory benchmark for this marvel, delivering 3X the > performance of anything else tested. > > I'm no Intel booster, preferring Atthlon hands down, but a 400MHz FSB is > no slouch, methinks, when it delivers numbers like the above. Will this > benefit Prime95? Anybody? Memory bandwidth _should_ be the dominant factor, especially when running LL tests on largish exponents, where the processor cache size is only a small fraction of the working set. However the fact remains that Athlon systems running with PC100 memory seem to significantly outperform PIII systems running with the same memory at the same clock speed. One can infer that the performance of the Athlon FPU is superior _for our particular application_, though the increase in the FSB (bandwidth between CPU and chipset) certainly does no harm! I've also been disappointed by the performance of RDRAM with PIII systems; though my PIII 700 (i820 chipset) system did speed up about 10% when SDRAM was exchanged for RDRAM, it's _still_ 33% slower than my (old type) Athlon 650 system. Someone told me that RDRAM works better in pairs, but the single modules are still so expensive that I haven't been tempted to try the experiment. > two things that don't work with UPX are executables that want to > read data > from themselves and things with "overlays". (And, of course, > executables > that want to _write_ to themselves), but for everything else it > almost > always works and is transparent. Unfortunately for UPX, Prime95's assembler routines write to the code segment. This is used to tune routines for performance between processor types without "bloat" caused by duplicating large sections of code or incurring inefficiencies caused by executing test/branch instructions repeatedly. Regards Brian Beesley _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 10:24:07 EST From: [EMAIL PROTECTED] Subject: Re: Mersenne: More on Compression Stephan wrote: > (C), Some people were (and some still are) confused: > packing an executable does _not_ have to be undone > by the end user; in fact, the end user need never > ever EVER know about it. There _are_no_drawbacks_. > No compute cost, no memory cost, only a space > savings. Which isn't bad. Few things in life > come with no drawbacks. Take advantage of the ones > that do. Fewer people are confused than you think. I would not take *anyone's* word that "there _are_no_drawbacks_" without completely convincing test data showing that under a variety of conditions, performance is *identical* with and without executable compression. And personally, for a savings of a few hundred K, IT'S NOT WORTH IT to do that testing. Rest assured that if the executable does end up being compressed in the general release, many users will either refuse to upgrade or will specifically uncompress it to restore the status quo. _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 11:50:55 -0600 From: Herb Savage <[EMAIL PROTECTED]> Subject: Re: Mersenne: More on Compression "Stephan T. Lavavej" wrote: > (C), Some people were (and some still are) confused: packing an executable > does _not_ have to be undone by the end user; in fact, the end user need > never ever EVER know about it. There _are_no_drawbacks_. No compute cost, > no memory cost, only a space savings. Which isn't bad. Few things in life > come with no drawbacks. Take advantage of the ones that do. The big disadvantage of this for me is that it makes the EXE totally opaque to virus scanners. So unless the virus scanner knows how to unpack the EXE (which they do for most common EXE packers) any viruses sent in this form will not be detected. The last big virus attack got by our virus checker because it was an old virus encased in a new EXE pack program. Regards, Herb Savage _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 18:24:33 +0100 From: "Steinar H. Gunderson" <[EMAIL PROTECTED]> Subject: Mersenne: OT: The appearance of a graph when its third derviated equals zero Hi, Since it's just a couple of days left of a mini-`contest' our maths teacher has announced, and since she said it was 100% OK to get help from other people (and because I'm lazy and don't want to search mathematical literature everywhere :-P ), here comes a question: When f(x) = 0, the graph crosses the X axis. When f'(x) = 0, the graph is at its top or bottom. When f''(x) = 0, the graph is at what we in Norwegian call a `turning point' -- I don't know the proper English term :-) How does a graph look like when f'''(x) = 0? I mean, I can plot a few graphs and see that it is somewhere between f'(x) and f''(x), but I don't see anything striking... Our teacher also mentioned a (female) mathematician finding these points and subsequently gotten them named after herself -- I'm sure this would be worth some bonus points ;-) Does anybody know, and more importantly -- can you _explain_ it? (One of the hardest things here would be that graphs don't go well in ASCII, I'd guess ;-) ) /* Steinar */ - -- Homepage: http://members.xoom.com/sneeze/ _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 19:56:24 +0000 From: Gareth Randall <[EMAIL PROTECTED]> Subject: Mersenne: Accessibility of compressed archives Dear All, I'd like to mention two aspects of self-extracting archives which no-one has touched on. These are, security and accessibility. I like "inanimate" zip (or gz/bz2 in my case) for two reasons which I consider to be important: 1. No executable code: I can open the most untrusted zip I care to choose and know that I am not going to get any viruses from the self-extraction code. It is not ideal to force people to actually execute untrusted code before they get to see what is really inside, as most self-extracting programs require. 2. No platform dependence: I don't want to have a fully set-up machine of the appropriate architecture and with a working copy of the target OS before I can extract the archive. I want to be able to peruse the latest win32 prime95 download on, say, an Alpha running BSD, or whatever. Now as it happens both of these issues are of reduced significance in the case of distributing a program (prime95) which people are intending to execute anyway, and a program that is architecture dependent. However, documentation and data files distributed with the main executable may need to be accessed separately, and as a general rule self-extracting archives are more of a gimmick than a valuable tool (although I don't dispute that they help in some cases by reducing the amount of user interaction required to extract them.) [Note that I am not saying anything about whether there is any performance improvement in the final execution etc. I am only commenting on the file format for downloads.] Finally, I'm curious about the compression being better than zip. I've been wondering for some time about when the Burrows-Wheeler compression in bzip2 would make its way onto Windows. Is this a first example of it? Yours, ======= Gareth Randall ======= _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 23:20:19 +0100 From: Guillermo Ballester Valor <[EMAIL PROTECTED]> Subject: Re: Mersenne: P4 John R Pierce wrote: > > I know this is a little off-topic, but how good is the P4 at integer > operations? > > not that off topic at all. Integer multiplies can be more efficient than FP > for this sort of thing, IF they are as fast. If a processor could pipeline > 64 bit integer multiplies at one per clock with parallel adds at the same > time it would be as fast or faster than using the 80 bit FP format... It > will be interesting seeing what IA64 brings to the table when it finally > gets up to speed... > > -jrp > Today, I've read in the manuals that a simple integer add with carry (addc) has 8 clocks of latency and 3 clocks of throughput for a P4. Humm, too much slowdown for single Ia32 instructions, Intel engineers will know the reasons. Guillermo. _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 15:55:04 -0800 From: "Osher Doctorow" <[EMAIL PROTECTED]> Subject: Mersenne: Second order LBP analysis of Fermat's Last Theorem - Doctorow From: Osher Doctorow [EMAIL PROTECTED], Wed. Nov. 28, 2000 3:10PM First-order LBP analysis of Fermat's Last Theorem (FLT for short in what follows - see my contribution from the last few days) asserts that first-order LBP maximum entropy equations/functions are the only admissible or acceptable kinds, which means that functions must be linear (in either independent or dependent variables), quadratic, cross-product (like kxy), simple exponential (like a exp(bx) with b real or complex), or have vanishing or "neglect-able" 3rd and higher partial derivatives (the latter mostly applies to mathematical physics as in general relativity). It turns out that only x + y = z and x^^2 y^^2 = z^^2 (where ^^ denotes exponentiation) are admissible under 1st-order LBP and not the higher degree equations, which "proves" FLT under lst-order conditions. Notice that in general elements x, z, y can be arbitrary objects (tensors, vector, pseudovectors, bivectors, etc.), but in FLT they're usually taken as (positive) integers, and in LBP they are taken as continuous (e.g., all nonnegative reals) on the nonnegative real line unless some other connected domain is specified. Second-order LBP analysis of FLT is instructive for its applicability to other Mersenne/prime equations. Here we use second-order LBP maximum entropy, which permits multiplication, addition, or subtraction of elements of 1st-order LBP, but not division, not inversion except subtraction, not limits unless they enter into the previous types, and not composition of functions. It turns out that x^^n + y^^n = z^^n needs to have two of the quantities x, y, or z "independent", and without loss of generality I will choose x and y to be "independent" and z = f(x, y). However, if z is a function of x and y, then z must be a two-variable polynomial (with terms like kxy, kx^^3y^^5, etc.) since multiplication of x and y times themselves only yield these (they do not yield fractional or rational powers of x and/or y without inverse and/or limiting operations). Therefore, z^^n must be a two-variable polynomial unless it is a constant. However, there is no known equation in mathematics for which x^^n + y^^n = k^^n (constant) for n > 2 integer (for n = 2 it is a circle, for n = 1 it is a line). Therefore, we have an identity x^^n + y^^n = x^^n + y^^n by setting equal powers of x and y equal in pairs. An identity (in this sense) is a trivial equation and is excluded from LBP since trivial equations make no statements about functions and hence none about their LBP entropy. This gives a second-order LBP proof of FLT. Alternatively, notice that if we took partial derivatives with respect to x, symbol Dx, of both sides of x^^n + y^^n = z^^n, we would obtain n! = Dx...x z^^n where Dx...x is the nth partial, and since an nth partial is also a function of x and y, we would either end up with n! = n! (trivial) or n! = two-variable polynomial in x and y, which by equating terms with equal powers on the left and right sides yields n! = n!. This second method does not involve the fact that there is no known function of form x^^n + y^^n = k^^n for k constant and x, y independent and n > 2 integer. We can also exclude n! = x^^m + y^^m and similar types from it by going back to Leibniz' rule and using induction on z = f(x,y). Osher Doctorow _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Wed, 29 Nov 2000 20:31:28 -0500 (EST) From: Jason Stratos Papadopoulos <[EMAIL PROTECTED]> Subject: Re: Mersenne: P4 On Wed, 29 Nov 2000, Guillermo Ballester Valor wrote: > Today, I've read in the manuals that a simple integer add with carry > (addc) has 8 clocks of latency and 3 clocks of throughput for a P4. > Humm, too much slowdown for single Ia32 instructions, Intel engineers > will know the reasons. This and the 14-18 clock multiply are extremely depressing. For all the marketing hype about Intel catering to the internet and the next generation of "active media", they're making it awfully difficult to implement the cryptography that the internet needs. The alpha was already at least 5x faster than a PIII for multiprecision arithmetic at the same clock speed; with the P4 it will only get worse. jasonp _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Thu, 30 Nov 2000 09:01:21 -0000 From: "Brian J. Beesley" <[EMAIL PROTECTED]> Subject: Re: Mersenne: P4 On 29 Nov 00, at 20:31, Jason Stratos Papadopoulos wrote: > > Humm, too much slowdown for single Ia32 instructions, Intel engineers > > will know the reasons. Presumably something to do with not being parallel architecture; there's only one set of flags, so you can't start an ADC until the previous instruction has completed and the flags register has been retired. In IA32, there's nothing to tell you _which_ carry flag to use in an instruction which uses carry as an input, or output. > > This and the 14-18 clock multiply are extremely depressing. For all the > marketing hype about Intel catering to the internet and the next > generation of "active media", they're making it awfully difficult to > implement the cryptography that the internet needs. Well, you're not _forced_ to buy Intel - even if you stick to Windoze! I get the impression that, with the Athlon architecture, AMD are making serious inroads into the market - especially at the lower end. Now it just so happens that the Athlon architecture runs Prime95 rather well. > > The alpha was already at least 5x faster than a PIII for multiprecision > arithmetic at the same clock speed; with the P4 it will only get worse. Are you sure about this? I think, with Alpha, you have to execute the instruction twice to get a double-precision multiplication - you can store either the low half or the high half of a product in one instruction, but not both. Of course, the Alpha's single precision integer arithmetic is 64 bits, not 32. This does help somewhat :) It's also easy to get confused between the latency (time between loading the opcode and finishing execution) and the throughput (the inverse of the longest time an instruction spends in any particular unit, times the number of the critical unit involved in the design). The fact remains that the P4, like all other commercial processors, is a compromise. This is what makes it so darned complicated, and also indicates why the designers have to make performance tradeoffs, some of which don't suit our particular application. It would be possible to optimize the design so that it executed the instructions _we_ find useful much more quickly, but whether such a processor would be capable of running standard commercial benchmarking programs at a reasonable speed - or indeed at all - is open to question ... we'd probably want to reduce the inherent complexity by junking the 16-bit x86 legacy, for a start... Regards Brian Beesley _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Thu, 30 Nov 2000 20:45:59 -0500 From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]> Subject: Mersenne: Factoring This is a multi-part message in MIME format. - ------=_NextPart_000_000F_01C05B0E.8B7A6D20 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable We could let prime95 decide the next election <grin>. Give everybody a different prime number. Multiply the primes for = candidate A together, likewise for B. If the factorization of the composite is not square free then we know = that=20 1) Someone voted twice for the same candidate, and=20 2) We know who that is.=20 If the 2 composites contain the same factor, then we know that=20 1) Someone voted for both candidates and=20 2) Again we know who.=20 No lawsuits, no recounts, no chads.=20 Frank - ------=_NextPart_000_000F_01C05B0E.8B7A6D20 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <META http-equiv=3DContent-Type content=3D"text/html; = charset=3Diso-8859-1"> <META content=3D"MSHTML 5.50.4522.1800" name=3DGENERATOR> <STYLE></STYLE> </HEAD> <BODY bgColor=3D#ffffff> <DIV><FONT face=3DArial size=3D2> <P>We could let prime95 decide the next election <grin>.</P> <P></FONT><FONT face=3DArial size=3D2>Give everybody a different prime=20 number.</FONT><FONT size=3D2> </FONT><FONT face=3DArial = size=3D2>Multiply the primes=20 for candidate A together, likewise for B.</FONT></P> <P><FONT face=3DArial size=3D2>If the factorization of the composite is = not square=20 free then we know that</FONT><FONT size=3D2> </P></FONT><FONT = face=3DArial size=3D2> <P>1) Someone voted twice for the same candidate, and</FONT><FONT = size=3D2>=20 </P></FONT><FONT face=3DArial size=3D2> <P>2) We know who that is.</FONT><FONT size=3D2> </P></FONT><FONT = face=3DArial=20 size=3D2> <P>If the 2 composites contain the same factor, then we know = that</FONT><FONT=20 size=3D2> </P></FONT><FONT face=3DArial size=3D2> <P>1) Someone voted for both candidates and</FONT><FONT size=3D2> = </P></FONT><FONT=20 face=3DArial size=3D2> <P>2) Again we know who.</FONT><FONT size=3D2> </P></FONT><FONT = face=3DArial size=3D2> <P>No lawsuits, no recounts, no chads.</FONT><FONT size=3D2> = </P></FONT><FONT=20 face=3DArial size=3D2> <P>Frank</P></FONT></DIV></BODY></HTML> - ------=_NextPart_000_000F_01C05B0E.8B7A6D20-- _________________________________________________________________________ Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ End of Mersenne Digest V1 #796 ******************************
