Last night, one of the 2 mprime jobs I run on my Linux
PC at work died, apparently due to an illegal sumout
error. Now these are quite common, and normally appear
in my results.txt file in the form

Iteration: 1019208/10199069, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme file.
Continuing from last save file.

The one last night, OTOH, was of the form

ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme file.

i.e. no iteration number, and no "Continuing from last 
save file." message following it - it just died at this
point. There was also a file write error about 2 hours
before the crash; this turned out to be due to a full
user partition on my hard drive (which I've since fixed).
I don't know if it has anything to do with the
sumout errors (it seems they should not be related).

Here is the excerpt from the results file - you can see
the first file write error on 10/24 around 16:35, then a 
checksum error at 17:45 which apparently was recovered 
from OK, then at 18:30 the ILLEGAL SUMOUT error which 
caused the crash. That is followed on 10/25 (i.e. after
I came to work today) by two "FATAL ERROR: Writing to
temp file." messages, as I twice tried to restart
before realizing my disk was full. After clearing out
a couple hundred MB I tried to restart again at 9:47,
but again got an "ERROR: ILLEGAL SUMOUT" message.

Has one of my CPUs gone flaky on me? Is it possible
that both of the 2 savefiles (they're both there, and
both of the proper size) are corrupt?

Any help would be welcome,

-Ernst

Excerpt from results.txt file:

[Wed Oct 24 16:24:51 2001]
Iteration 7736000 / 12962641
[Wed Oct 24 16:34:54 2001]
Iteration 7738000 / 12962641
Error writing intermediate file: rC962641
[Wed Oct 24 16:45:00 2001]
Iteration 7740000 / 12962641
[Wed Oct 24 16:55:04 2001]
Iteration 7742000 / 12962641
[Wed Oct 24 17:05:04 2001]
Iteration 7744000 / 12962641
[Wed Oct 24 17:15:04 2001]
Iteration 7746000 / 12962641
[Wed Oct 24 17:24:59 2001]
Iteration 7748000 / 12962641
[Wed Oct 24 17:34:53 2001]
Iteration 7750000 / 12962641
[Wed Oct 24 17:44:46 2001]
Iteration 7752000 / 12962641
Iteration: 7752549/12962641, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1524171845291614 != 283103064441664.6
Possible hardware failure, consult the readme file.
Continuing from last save file.
[Wed Oct 24 18:29:37 2001]
ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme file.
[Thu Oct 25 09:21:07 2001]
FATAL ERROR: Writing to temp file.
[Thu Oct 25 09:28:29 2001]
FATAL ERROR: Writing to temp file.
[Thu Oct 25 09:47:40 2001]
ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme file.
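
In case it helps anyone reading along, the error timeline described above can be pulled out of a results file mechanically. The following is only a rough Python sketch, not anything that ships with mprime: the function name error_timeline is made up for illustration, and it assumes the bracketed timestamp format seen in this excerpt, English day and month names, and that every line of interest contains the word "error".

```python
import re
from datetime import datetime

# Timestamp lines in the excerpt look like "[Wed Oct 24 16:24:51 2001]".
# (Assumed format, taken from the excerpt only.)
STAMP = re.compile(r"^\[(\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4})\]$")


def error_timeline(lines):
    """Return (timestamp, message) pairs for every line mentioning an error.

    Each error is tagged with the most recent timestamp line above it,
    or None if no timestamp has been seen yet.
    """
    current = None
    events = []
    for raw in lines:
        line = raw.strip()
        match = STAMP.match(line)
        if match:
            # Parsing the day and month names relies on an English locale.
            current = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            continue
        if "error" in line.lower():
            events.append((current, line))
    return events
```

Passing it the lines of results.txt (for example, an open file object) and printing each pair should list the write error, the SUM(INPUTS) error, and the ILLEGAL SUMOUT entries with the timestamp that precedes each one. Note that the timestamp is the last one logged before the error, not the exact time of the error.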


