Last night, one of the 2 mprime jobs I run on my Linux PC at work died. It apparently died due to an illegal sumout error. Now these are quite common, and normally appear in my results.txt file in the form

Iteration: 1019208/10199069, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme file.
Continuing from last save file.

The one last night, OTOH, was of the form

ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme file.

i.e. no iteration number, and no "Continuing from last save file." message following it - it just died at this point.

There was also a file write error about 2 hours before the crash. This turned out to be due to a full user partition on my hard drive (which I've since fixed), and I don't know if it has anything to do with the sumout errors (it seems they should not be related).

Here is the excerpt from the results file. You can see the first file write error on 10/24 around 16:35, then a checksum error at 17:45 which was apparently recovered from OK, then at 18:30 the ILLEGAL SUMOUT error which caused the crash. That is followed on 10/25 (i.e. after I came to work today) by two "FATAL ERROR: Writing to temp file." messages, as I twice tried to restart before realizing my disk was full. After clearing out a couple hundred MB I tried to restart again at 9:47, but again got an "ERROR: ILLEGAL SUMOUT" message.

Has one of my CPUs gone flaky on me? Is it possible that both of the 2 savefiles (they're both there, and both of the proper size) are corrupt?
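
For anyone wanting to pull a similar timeline out of their own results.txt, here is a minimal Python sketch. It is not part of mprime; the function name `error_timeline` and the list of error markers are assumptions based only on the messages quoted in this post, and it assumes timestamps look like `[Wed Oct 24 16:24:51 2001]`. Each error line is paired with the most recent timestamp seen before it.

```python
import re
from datetime import datetime

# Timestamp format as quoted in this post, e.g. [Wed Oct 24 16:24:51 2001]
STAMP = re.compile(r"\[(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\]")

# Substrings that mark a problem line (assumed from this post, not exhaustive)
ERROR_MARKERS = ("ERROR", "Error writing")


def error_timeline(lines):
    """Return (timestamp or None, message) pairs for each error line."""
    last_stamp = None
    events = []
    for raw in lines:
        line = raw.strip()
        match = STAMP.search(line)
        if match:
            # Remember the timestamp and drop it from the message text.
            last_stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            line = (line[:match.start()] + line[match.end():]).strip()
        if any(marker in line for marker in ERROR_MARKERS):
            events.append((last_stamp, line))
    return events


# A few lines in the style of the results file, used only as sample input.
sample = """\
[Wed Oct 24 16:34:54 2001] Iteration 7738000 / 12962641
Error writing intermediate file: rC962641
[Wed Oct 24 18:29:37 2001] ERROR: ILLEGAL SUMOUT
[Thu Oct 25 09:21:07 2001] FATAL ERROR: Writing to temp file.
""".splitlines()

for stamp, message in error_timeline(sample):
    print(stamp, message)
```

The sketch works whether the timestamp shares a line with its message or sits on a line of its own, since it only tracks the last timestamp seen.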

Any help would be welcome,
-Ernst

Excerpt from results.txt file:

[Wed Oct 24 16:24:51 2001] Iteration 7736000 / 12962641
[Wed Oct 24 16:34:54 2001] Iteration 7738000 / 12962641
Error writing intermediate file: rC962641
[Wed Oct 24 16:45:00 2001] Iteration 7740000 / 12962641
[Wed Oct 24 16:55:04 2001] Iteration 7742000 / 12962641
[Wed Oct 24 17:05:04 2001] Iteration 7744000 / 12962641
[Wed Oct 24 17:15:04 2001] Iteration 7746000 / 12962641
[Wed Oct 24 17:24:59 2001] Iteration 7748000 / 12962641
[Wed Oct 24 17:34:53 2001] Iteration 7750000 / 12962641
[Wed Oct 24 17:44:46 2001] Iteration 7752000 / 12962641
Iteration: 7752549/12962641, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1524171845291614 != 283103064441664.6
Possible hardware failure, consult the readme file.
Continuing from last save file.
[Wed Oct 24 18:29:37 2001] ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme file.
[Thu Oct 25 09:21:07 2001] FATAL ERROR: Writing to temp file.
[Thu Oct 25 09:28:29 2001] FATAL ERROR: Writing to temp file.
[Thu Oct 25 09:47:40 2001] ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme file.

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers
