Mersenne Digest V1 #1076

Mersenne Digest Thu, 19 Jun 2003 17:22:04 -0700

Mersenne Digest        Thursday, June 19 2003        Volume 01 : Number 1076

----------------------------------------------------------------------

Date: Mon, 16 Jun 2003 17:08:57 -0400
From: Bill Daly <[EMAIL PROTECTED]>
Subject: Re: Mersenne: M#40 - what went wrong?

Occam's Razor suggests that the original report was faked. Until this 
possibility can be absolutely eliminated, it seems to me a waste of time to 
try to construct elaborate and complicated scenarios which might lead to a 
false positive.

- -- 

**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

This footnote also confirms that this email message has been swept by
Anti-virus detection software for the presence of computer viruses.

**********************************************************************

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 16 Jun 2003 19:58:06 -0400
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: M#40 - what went wrong?

At 05:08 PM 6/16/2003 -0400, Bill Daly wrote:

>Occam's Razor suggests that the original report was faked. Until this 
>possibility can be absolutely eliminated,

It can't be absolutely eliminated.  However, there are many indications
that this was not a hacking attempt.  The user has been at this 5 years.
His emails we're genuine.  He made no attempt at gaining publicity.
He's more my age (old) rather than the typical hacker age (young).
If this was a hacking attempt, why create a results.txt file showing 5 errors
during the run?

In the end, it doesn't really matter.  Even if this we're a hacking attempt,
tightening up the code to reduce the chance of a false positive is a good thing.

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 17 Jun 2003 00:44:23 +0000
From: "Elias Daher" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: M#40 - what went wrong?

>I'm also adding code to 23.5 to check EVERY iteration for an impossible 
>result
>such as -2, -1, 0, 1, 2.  This test will be very, very quick.

But no matter how quick this additional test will be, it will slow down all 
machines globally, while a false positive report would only "slow down" two 
machines to double check for a limited time!!!
If we consider a double check will take 15 days on an average machine, and 
15 days (in the same time) on another machine, this makes 30 days x 24 x 
3,600 = 2,592,000 seconds.
While if we consider that this additional test takes only 3 minutes per 
exponent overall in average, there are over 40,000 machines working in the 
project, I don't know the exact number but if we consider half that number 
work on LL tests, 20,000 x 180 = 3,600,000 seconds. (There's a lot more 
machines assigned LL tests!)
So this makes a difference of 1,008,000 seconds and this of course just in 
the case a false positive was reported!!!
So I think the additional test is not needed, it would only slow down the 
project...

_________________________________________________________________
Help STOP SPAM with the new MSN 8 and get 2 months FREE*  
http://join.msn.com/?page=features/junkmail

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 16 Jun 2003 19:18:05 -0700
From: Luke Welsh <[EMAIL PROTECTED]>
Subject: Re: Mersenne: M#40 - what went wrong?

At 12:44 AM 6/17/03 +0000, Elias Daher wrote:
>(quoting George)
>>I'm also adding code to 23.5 to check EVERY iteration for an impossible 
>>result
>>such as -2, -1, 0, 1, 2.  This test will be very, very quick.

>But no matter how quick this additional test will be, it will slow down all 
>machines globally
[...]
>While if we consider that this additional test takes only 3 minutes per 
>exponent overall in average

It will probably take less than 0.3 seconds per exponent overall because
it will very seldom require checking any value other than the low order
member.  After each iteration check:

if ( x(0) >= -2.0 AND x(0) <= 2.0 ) then
// this code is rarely encountered and can be inefficient
   [check the rest of x(1:n-1)]
end if

- --Luke

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 16 Jun 2003 19:25:51 -0700
From: Luke Welsh <[EMAIL PROTECTED]>
Subject: Re: Mersenne: M#40 - what went wrong?

At 04:16 PM 6/16/03 -0400, George Woltman wrote:
>At 07:52 PM 6/15/2003 -0700, Luke Welsh wrote:
>>I'm fishing....

I'm still fishing.

Is it possible to iterate improperly?  If so, could you detect that the code 
is executing too quickly?

>FYI, six times a result of 0000000000000002 has been reported to the
>server.  So, somehow or another it is possible for a hardware error to
>zero the FFT data without triggering an ILLEGAL SUMOUT.

A new Pentium bug? (Athlon bug?) Any commonality in CPU types or chip sets?

- --Luke

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Mon, 16 Jun 2003 22:25:15 -0400
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: M#40 - what went wrong?

At 12:44 AM 6/17/2003 +0000, Elias Daher wrote:
>no matter how quick this additional test will be, it will slow down all 
>machines globally,
>if we consider that this additional test takes only 3 minutes per exponent 
>overall in average, ...

The test will take at most 300 clock cycles (the time to read a double from
main memory).  20,000,000 iterations * 300 clocks / 1 GHz cpu speed will 
add only 6 seconds to the average LL test.  This is 30 times better than 
your 3
minutes estimate.

I think adding 6 seconds or even 3 minutes per LL test will have negligible
impact on GIMPS throughput.  Your analysis is correct, but maybe there are
other small, unquantifiable factors at work.  How about the possible cost of
a second false alarm causing 3 people to drop out of GIMPS saying
"these guys are bozos that don't learn from their mistakes"?

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 17 Jun 2003 06:45:09 +0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: M#40 - what went wrong?

On Monday 16 June 2003 20:16, George Woltman wrote:
>
> I'm also adding code to 23.5 to check EVERY iteration for an impossible
> result such as -2, -1, 0, 1, 2.  This test will be very, very quick.

Sounds sensible to me ... but, does it not make sense to run this test during 
those iterations when testing for excess roundoff error occurs, i.e. normally 
only 1 in 128 iterations? The point here is that, once the residual gets into 
a loop like this, it won't get out again.
>
> FYI, six times a result of 0000000000000002 has been reported to the
> server.  So, somehow or another it is possible for a hardware error to
> zero the FFT data without triggering an ILLEGAL SUMOUT.
>
Any instances of FFFFFFFFFFFFFFFE? Should be about as common, if this is a 
hardware related problem.

There are lots of reasons why memory corruption may occur but, in most cases, 
it is hard to see how a whole block of data should be filled with zero (or 
one) bits without corrupting the program code in such a way that the program 
would crash or have the operating system crash from under it. I think the 
most likely scenario would be that a pointer to the FFT work space could be 
corrupted & point to virtual memory which has "zero on demand" attribute. 
This might be detectable by memory leak even on systems without proper memory 
protection (Win 9x) but could be fixed easily enough by keeping _two_ 
pointers to critical work space (the values don't change that often...) & 
comparing them occasionally. As to why the pointer might get corrupted, most 
likely we're looking at stack overflow or some other program behaving badly 
rather than a bug internal to Prime95.

It would be interesting (though probably impossible) to check which OS the 
"residue 2" runs were run on. If my logic is right then I would suspect that 
they were all run on Win 9x/ME rather than NT, W2K, XP or any of the 
varieties of linux, where proper memory protection should give much better 
protection against this sort of problem.

Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 17 Jun 2003 06:45:09 +0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: M#40 - what went wrong?

On Monday 16 June 2003 20:16, George Woltman wrote:
>
> I'm also adding code to 23.5 to check EVERY iteration for an impossible
> result such as -2, -1, 0, 1, 2.  This test will be very, very quick.

Sounds sensible to me ... but, does it not make sense to run this test during 
those iterations when testing for excess roundoff error occurs, i.e. normally 
only 1 in 128 iterations? The point here is that, once the residual gets into 
a loop like this, it won't get out again.
>
> FYI, six times a result of 0000000000000002 has been reported to the
> server.  So, somehow or another it is possible for a hardware error to
> zero the FFT data without triggering an ILLEGAL SUMOUT.
>
Any instances of FFFFFFFFFFFFFFFE? Should be about as common, if this is a 
hardware related problem.

There are lots of reasons why memory corruption may occur but, in most cases, 
it is hard to see how a whole block of data should be filled with zero (or 
one) bits without corrupting the program code in such a way that the program 
would crash or have the operating system crash from under it. I think the 
most likely scenario would be that a pointer to the FFT work space could be 
corrupted & point to virtual memory which has "zero on demand" attribute. 
This might be detectable by memory leak even on systems without proper memory 
protection (Win 9x) but could be fixed easily enough by keeping _two_ 
pointers to critical work space (the values don't change that often...) & 
comparing them occasionally. As to why the pointer might get corrupted, most 
likely we're looking at stack overflow or some other program behaving badly 
rather than a bug internal to Prime95.

It would be interesting (though probably impossible) to check which OS the 
"residue 2" runs were run on. If my logic is right then I would suspect that 
they were all run on Win 9x/ME rather than NT, W2K, XP or any of the 
varieties of linux, where proper memory protection should give much better 
protection against this sort of problem.

Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 17 Jun 2003 04:34:19 -0700
From: "Paul Leyland" <[EMAIL PROTECTED]>
Subject: RE: Mersenne: P-1, N-1 tests

Wojciech Florek (WsF) asks:

> The second question I should send directly to Chris Caldwell. 
> In his Prime
> Pages the N-1 test of primality needs such a that a^(N-1)=1 mod N and 
> a^(N-1)/q is not 1 mod N (q is a factor of N-1). Methods for 
> determining 
> the number a are not presented. Are there any such methods?

The method used in practice is to choose successive integers >= 2 until one is found 
to work.  Proof that a small value of a exists seems to require the ERH.

Paul
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 17 Jun 2003 12:06:00 EDT
From: [EMAIL PROTECTED]
Subject: Mersenne: Re: M#40 - what went wrong?

- --part1_27.42870a8e.2c209668_boundary
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Luke Welsh wrote:

>What about corruption of any of the pre-computed arrays?

George Woltman wrote:

>Yes, if the first weight value was NaN, then that would corrupt the entire
>FFT data array.  Of course, this generates an ILLEGAL SUMOUT on
>the next iteration, but if this single corruption happened during the
>last iteration then a false positive would have been generated.

Luke, see the discussion about this on the Mersenne Forum:

http://www.mersenneforum.org/viewtopic.php?t=696&postdays=0&postorder=asc&;
start=0

I believe it's more likely that the small array of DWT weights
got zeroed at some point during the run, and stayed that way -
that would cause the entire residue vector to go to zero, but
without triggering a SUMOUT error (George conformed to me that
the SUMOUT checksum is calculated on the raw inverse FFT outputs,
i.e. before the multiple by the inverse DWT weights and rounding
occurs) or a RO error warning, and there would be no NaNs involved,
i.e. this could happen on any iteration (not just the last one,
as in George's proposed NaN scenario) without raising any alarm
flags. And if George's code (like mine) does the subtract-2 at
the same time it's doing the IDWT muls and rounding, the -2 would
get multipled by the same corrupted-to-zero DWT weight, hence we'd
get all zeros for all subsequent iterations. George is adding code
as we speak to guard against this scenario occurring in the future.

- -Ernst

- --part1_27.42870a8e.2c209668_boundary
Content-Type: text/html; charset="US-ASCII"
Content-Transfer-Encoding: quoted-printable

<HTML><FONT FACE=3Darial,helvetica><FONT  SIZE=3D2 FAMILY=3D"SANSSERIF" FACE=
=3D"Arial" LANG=3D"0">Luke Welsh wrote:<BR>
<BR>
&gt;What about corruption of any of the pre-computed arrays?<BR>
<BR>
George Woltman wrote:<BR>
<BR>
&gt;Yes, if the first weight value was NaN, then that would corrupt the enti=
re<BR>
&gt;FFT data array.&nbsp; Of course, this generates an ILLEGAL SUMOUT on<BR>
&gt;the next iteration, but if this single corruption happened during the<BR=
>
&gt;last iteration then a false positive would have been generated.<BR>
<BR>
Luke, see the discussion about this on the Mersenne Forum:<BR>
<BR>
http://www.mersenneforum.org/viewtopic.php?t=3D696&amp;postdays=3D0&amp;post=
order=3Dasc&amp;start=3D0<BR>
<BR>
I believe it's more likely that the small array of DWT weights<BR>
got zeroed at some point during the run, and stayed that way -<BR>
that would cause the entire residue vector to go to zero, but<BR>
without triggering a SUMOUT error (George conformed to me that<BR>
the SUMOUT checksum is calculated on the raw inverse FFT outputs,<BR>
i.e. before the multiple by the inverse DWT weights and rounding<BR>
occurs) or a RO error warning, and there would be no NaNs involved,<BR>
i.e. this could happen on any iteration (not just the last one,<BR>
as in George's proposed NaN scenario) without raising any alarm<BR>
flags. And if George's code (like mine) does the subtract-2 at<BR>
the same time it's doing the IDWT muls and rounding, the -2 would<BR>
get multipled by the same corrupted-to-zero DWT weight, hence we'd<BR>
get all zeros for all subsequent iterations. George is adding code<BR>
as we speak to guard against this scenario occurring in the future.<BR>
<BR>
- -Ernst<BR>
</FONT></HTML>
- --part1_27.42870a8e.2c209668_boundary--
_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Thu, 19 Jun 2003 16:57:39 -0700
From: Max <[EMAIL PROTECTED]>
Subject: Mersenne: fingerprint of computation

Hello!

Why not attach to a critical result a kind of computation fingerprint?
Say, compute parity/CRC-8/CRC-32/(whatever *fast* checksum, eg. first 
and last word) of a residue each 1/100/10000/(whatever seems reasonable) 
iterations, but send the checksums to a server only when claiming a prime.
That would definitely safe verification time in case of an error, and 
allow to locate a block of iterations where the error appeared.

Max

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

End of Mersenne Digest V1 #1076
*******************************

Mersenne Digest V1 #1076

Reply via email to