There are indeed two issues here, but I'd categorise them differently. The SETI -9 tasks are the really difficult case, because the insane science application produces outputs which appear to the BOINC Client to be plausible. It's only much, much further down the line that the failure to validate exposes the error. I think SETI may have to wrestle with this one on its own.

But I think Maureen is talking about other projects, where the problem is indeed one of crashes and abnormal (non-zero status) exits which the BOINC Client *does* interpret as a computation error. Part of the trouble here is that the BOINC message log (without special logging flags) tends only to mention 'Output file absent'. It takes quite sophisticated knowledge of BOINC to understand that 'Output file absent' almost invariably means that the science application had previously crashed.

I've written that repeatedly on project message boards (including BOINC's own message board). I could have sworn I'd also written about it quite recently on one of these BOINC mailing lists, but none of my search tools is turning up the reference.

I think it would help if the BOINC client message log could actually use the word 'error', as in

    Computation for task xxxxxxxxxxxxxxx finished with an error
    Exit status 3 (0x3) for task xxxxxxxxxxxxxxx

BoincView can log them - why not BOINC itself?

----- Original Message -----
From: "Lynn W. Taylor" <[email protected]>
To: <[email protected]>
Sent: Tuesday, May 25, 2010 8:24 PM
Subject: Re: [boinc_dev] host punishment mechanism revisited

> There are two big issues here.
>
> First, we aren't really talking about work that has "crashed" -- we may
> be able to tell that the work unit finished early, but work like the
> SETI -9 results didn't "crash" they just had a lot of signals.
>
> Run time isn't necessarily an indicator of quality.
>
> What we're talking about is work that does not validate -- where the
> result is compared to work done on another machine, and the results
> don't match.
>
> Notice has to get from the validator, travel by some means to the
> eyeballs of someone who cares about that machine, and register with
> their Mark I mod 0 brain.
>
> The two issues are:
>
> What if the back-channel to the user is not available (old E-Mail
> address, or not running BOINCMGR)?
>
> What if the user is (obscure reference to DNA, since this is Towel Day)
> missing, presumed fed?
>
> There is probably an argument that BOINC should shut down gracefully if
> the machine owner doesn't verify his continued existence periodically.
>
> On 5/25/2010 12:07 PM, Maureen Vilar wrote:
>
>> When computation finishes prematurely would it be possible to add to the
>> messages something like: 'This task crashed'? And even 'Ask for advice on
>> the project forum'?
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
