Mersenne Digest Thursday, June 19 2003 Volume 01 : Number 1076
---------------------------------------------------------------------- Date: Mon, 16 Jun 2003 17:08:57 -0400 From: Bill Daly <[EMAIL PROTECTED]> Subject: Re: Mersenne: M#40 - what went wrong? Occam's Razor suggests that the original report was faked. Until this possibility can be absolutely eliminated, it seems to me a waste of time to try to construct elaborate and complicated scenarios which might lead to a false positive. - -- ********************************************************************** This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This footnote also confirms that this email message has been swept by Anti-virus detection software for the presence of computer viruses. ********************************************************************** _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Mon, 16 Jun 2003 19:58:06 -0400 From: George Woltman <[EMAIL PROTECTED]> Subject: Re: Mersenne: M#40 - what went wrong? At 05:08 PM 6/16/2003 -0400, Bill Daly wrote: >Occam's Razor suggests that the original report was faked. Until this >possibility can be absolutely eliminated, It can't be absolutely eliminated. However, there are many indications that this was not a hacking attempt. The user has been at this 5 years. His emails we're genuine. He made no attempt at gaining publicity. He's more my age (old) rather than the typical hacker age (young). If this was a hacking attempt, why create a results.txt file showing 5 errors during the run? In the end, it doesn't really matter. Even if this we're a hacking attempt, tightening up the code to reduce the chance of a false positive is a good thing. _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 17 Jun 2003 00:44:23 +0000 From: "Elias Daher" <[EMAIL PROTECTED]> Subject: Re: Mersenne: M#40 - what went wrong? >I'm also adding code to 23.5 to check EVERY iteration for an impossible >result >such as -2, -1, 0, 1, 2. This test will be very, very quick. But no matter how quick this additional test will be, it will slow down all machines globally, while a false positive report would only "slow down" two machines to double check for a limited time!!! If we consider a double check will take 15 days on an average machine, and 15 days (in the same time) on another machine, this makes 30 days x 24 x 3,600 = 2,592,000 seconds. While if we consider that this additional test takes only 3 minutes per exponent overall in average, there are over 40,000 machines working in the project, I don't know the exact number but if we consider half that number work on LL tests, 20,000 x 180 = 3,600,000 seconds. (There's a lot more machines assigned LL tests!) So this makes a difference of 1,008,000 seconds and this of course just in the case a false positive was reported!!! So I think the additional test is not needed, it would only slow down the project... _________________________________________________________________ Help STOP SPAM with the new MSN 8 and get 2 months FREE* http://join.msn.com/?page=features/junkmail _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Mon, 16 Jun 2003 19:18:05 -0700 From: Luke Welsh <[EMAIL PROTECTED]> Subject: Re: Mersenne: M#40 - what went wrong? At 12:44 AM 6/17/03 +0000, Elias Daher wrote: >(quoting George) >>I'm also adding code to 23.5 to check EVERY iteration for an impossible >>result >>such as -2, -1, 0, 1, 2. This test will be very, very quick. >But no matter how quick this additional test will be, it will slow down all >machines globally [...] >While if we consider that this additional test takes only 3 minutes per >exponent overall in average It will probably take less than 0.3 seconds per exponent overall because it will very seldom require checking any value other than the low order member. After each iteration check: if ( x(0) >= -2.0 AND x(0) <= 2.0 ) then // this code is rarely encountered and can be inefficient [check the rest of x(1:n-1)] end if - --Luke _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Mon, 16 Jun 2003 19:25:51 -0700 From: Luke Welsh <[EMAIL PROTECTED]> Subject: Re: Mersenne: M#40 - what went wrong? At 04:16 PM 6/16/03 -0400, George Woltman wrote: >At 07:52 PM 6/15/2003 -0700, Luke Welsh wrote: >>I'm fishing.... I'm still fishing. Is it possible to iterate improperly? If so, could you detect that the code is executing too quickly? >FYI, six times a result of 0000000000000002 has been reported to the >server. So, somehow or another it is possible for a hardware error to >zero the FFT data without triggering an ILLEGAL SUMOUT. A new Pentium bug? (Athlon bug?) Any commonality in CPU types or chip sets? - --Luke _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Mon, 16 Jun 2003 22:25:15 -0400 From: George Woltman <[EMAIL PROTECTED]> Subject: Re: Mersenne: M#40 - what went wrong? At 12:44 AM 6/17/2003 +0000, Elias Daher wrote: >no matter how quick this additional test will be, it will slow down all >machines globally, >if we consider that this additional test takes only 3 minutes per exponent >overall in average, ... The test will take at most 300 clock cycles (the time to read a double from main memory). 20,000,000 iterations * 300 clocks / 1 GHz cpu speed will add only 6 seconds to the average LL test. This is 30 times better than your 3 minutes estimate. I think adding 6 seconds or even 3 minutes per LL test will have negligible impact on GIMPS throughput. Your analysis is correct, but maybe there are other small, unquantifiable factors at work. How about the possible cost of a second false alarm causing 3 people to drop out of GIMPS saying "these guys are bozos that don't learn from their mistakes"? _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 17 Jun 2003 06:45:09 +0000 From: "Brian J. Beesley" <[EMAIL PROTECTED]> Subject: Re: Mersenne: M#40 - what went wrong? On Monday 16 June 2003 20:16, George Woltman wrote: > > I'm also adding code to 23.5 to check EVERY iteration for an impossible > result such as -2, -1, 0, 1, 2. This test will be very, very quick. Sounds sensible to me ... but, does it not make sense to run this test during those iterations when testing for excess roundoff error occurs, i.e. normally only 1 in 128 iterations? The point here is that, once the residual gets into a loop like this, it won't get out again. > > FYI, six times a result of 0000000000000002 has been reported to the > server. So, somehow or another it is possible for a hardware error to > zero the FFT data without triggering an ILLEGAL SUMOUT. > Any instances of FFFFFFFFFFFFFFFE? Should be about as common, if this is a hardware related problem. There are lots of reasons why memory corruption may occur but, in most cases, it is hard to see how a whole block of data should be filled with zero (or one) bits without corrupting the program code in such a way that the program would crash or have the operating system crash from under it. I think the most likely scenario would be that a pointer to the FFT work space could be corrupted & point to virtual memory which has "zero on demand" attribute. This might be detectable by memory leak even on systems without proper memory protection (Win 9x) but could be fixed easily enough by keeping _two_ pointers to critical work space (the values don't change that often...) & comparing them occasionally. As to why the pointer might get corrupted, most likely we're looking at stack overflow or some other program behaving badly rather than a bug internal to Prime95. It would be interesting (though probably impossible) to check which OS the "residue 2" runs were run on. If my logic is right then I would suspect that they were all run on Win 9x/ME rather than NT, W2K, XP or any of the varieties of linux, where proper memory protection should give much better protection against this sort of problem. Regards Brian Beesley _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 17 Jun 2003 06:45:09 +0000 From: "Brian J. Beesley" <[EMAIL PROTECTED]> Subject: Re: Mersenne: M#40 - what went wrong? On Monday 16 June 2003 20:16, George Woltman wrote: > > I'm also adding code to 23.5 to check EVERY iteration for an impossible > result such as -2, -1, 0, 1, 2. This test will be very, very quick. Sounds sensible to me ... but, does it not make sense to run this test during those iterations when testing for excess roundoff error occurs, i.e. normally only 1 in 128 iterations? The point here is that, once the residual gets into a loop like this, it won't get out again. > > FYI, six times a result of 0000000000000002 has been reported to the > server. So, somehow or another it is possible for a hardware error to > zero the FFT data without triggering an ILLEGAL SUMOUT. > Any instances of FFFFFFFFFFFFFFFE? Should be about as common, if this is a hardware related problem. There are lots of reasons why memory corruption may occur but, in most cases, it is hard to see how a whole block of data should be filled with zero (or one) bits without corrupting the program code in such a way that the program would crash or have the operating system crash from under it. I think the most likely scenario would be that a pointer to the FFT work space could be corrupted & point to virtual memory which has "zero on demand" attribute. This might be detectable by memory leak even on systems without proper memory protection (Win 9x) but could be fixed easily enough by keeping _two_ pointers to critical work space (the values don't change that often...) & comparing them occasionally. As to why the pointer might get corrupted, most likely we're looking at stack overflow or some other program behaving badly rather than a bug internal to Prime95. It would be interesting (though probably impossible) to check which OS the "residue 2" runs were run on. If my logic is right then I would suspect that they were all run on Win 9x/ME rather than NT, W2K, XP or any of the varieties of linux, where proper memory protection should give much better protection against this sort of problem. Regards Brian Beesley _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 17 Jun 2003 04:34:19 -0700 From: "Paul Leyland" <[EMAIL PROTECTED]> Subject: RE: Mersenne: P-1, N-1 tests Wojciech Florek (WsF) asks: > The second question I should send directly to Chris Caldwell. > In his Prime > Pages the N-1 test of primality needs such a that a^(N-1)=1 mod N and > a^(N-1)/q is not 1 mod N (q is a factor of N-1). Methods for > determining > the number a are not presented. Are there any such methods? The method used in practice is to choose successive integers >= 2 until one is found to work. Proof that a small value of a exists seems to require the ERH. Paul _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Tue, 17 Jun 2003 12:06:00 EDT From: [EMAIL PROTECTED] Subject: Mersenne: Re: M#40 - what went wrong? - --part1_27.42870a8e.2c209668_boundary Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Luke Welsh wrote: >What about corruption of any of the pre-computed arrays? George Woltman wrote: >Yes, if the first weight value was NaN, then that would corrupt the entire >FFT data array. Of course, this generates an ILLEGAL SUMOUT on >the next iteration, but if this single corruption happened during the >last iteration then a false positive would have been generated. Luke, see the discussion about this on the Mersenne Forum: http://www.mersenneforum.org/viewtopic.php?t=696&postdays=0&postorder=asc& start=0 I believe it's more likely that the small array of DWT weights got zeroed at some point during the run, and stayed that way - that would cause the entire residue vector to go to zero, but without triggering a SUMOUT error (George conformed to me that the SUMOUT checksum is calculated on the raw inverse FFT outputs, i.e. before the multiple by the inverse DWT weights and rounding occurs) or a RO error warning, and there would be no NaNs involved, i.e. this could happen on any iteration (not just the last one, as in George's proposed NaN scenario) without raising any alarm flags. And if George's code (like mine) does the subtract-2 at the same time it's doing the IDWT muls and rounding, the -2 would get multipled by the same corrupted-to-zero DWT weight, hence we'd get all zeros for all subsequent iterations. George is adding code as we speak to guard against this scenario occurring in the future. - -Ernst - --part1_27.42870a8e.2c209668_boundary Content-Type: text/html; charset="US-ASCII" Content-Transfer-Encoding: quoted-printable <HTML><FONT FACE=3Darial,helvetica><FONT SIZE=3D2 FAMILY=3D"SANSSERIF" FACE= =3D"Arial" LANG=3D"0">Luke Welsh wrote:<BR> <BR> >What about corruption of any of the pre-computed arrays?<BR> <BR> George Woltman wrote:<BR> <BR> >Yes, if the first weight value was NaN, then that would corrupt the enti= re<BR> >FFT data array. Of course, this generates an ILLEGAL SUMOUT on<BR> >the next iteration, but if this single corruption happened during the<BR= > >last iteration then a false positive would have been generated.<BR> <BR> Luke, see the discussion about this on the Mersenne Forum:<BR> <BR> http://www.mersenneforum.org/viewtopic.php?t=3D696&postdays=3D0&post= order=3Dasc&start=3D0<BR> <BR> I believe it's more likely that the small array of DWT weights<BR> got zeroed at some point during the run, and stayed that way -<BR> that would cause the entire residue vector to go to zero, but<BR> without triggering a SUMOUT error (George conformed to me that<BR> the SUMOUT checksum is calculated on the raw inverse FFT outputs,<BR> i.e. before the multiple by the inverse DWT weights and rounding<BR> occurs) or a RO error warning, and there would be no NaNs involved,<BR> i.e. this could happen on any iteration (not just the last one,<BR> as in George's proposed NaN scenario) without raising any alarm<BR> flags. And if George's code (like mine) does the subtract-2 at<BR> the same time it's doing the IDWT muls and rounding, the -2 would<BR> get multipled by the same corrupted-to-zero DWT weight, hence we'd<BR> get all zeros for all subsequent iterations. George is adding code<BR> as we speak to guard against this scenario occurring in the future.<BR> <BR> - -Ernst<BR> </FONT></HTML> - --part1_27.42870a8e.2c209668_boundary-- _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ Date: Thu, 19 Jun 2003 16:57:39 -0700 From: Max <[EMAIL PROTECTED]> Subject: Mersenne: fingerprint of computation Hello! Why not attach to a critical result a kind of computation fingerprint? Say, compute parity/CRC-8/CRC-32/(whatever *fast* checksum, eg. first and last word) of a residue each 1/100/10000/(whatever seems reasonable) iterations, but send the checksums to a server only when claiming a prime. That would definitely safe verification time in case of an error, and allow to locate a block of iterations where the error appeared. Max _________________________________________________________________________ Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers ------------------------------ End of Mersenne Digest V1 #1076 *******************************
