Then there may be as many as four cases. But my second issue is, IMHO, the bigger issue.
We aren't talking about professionally maintained machines in a controlled, data center environment, audited periodically for accuracy.

We're talking about random hardware operated by anything from that kind of ideal setting to "Mom's computer" whose tech-savvy kid is off to college, and may not even know that their computer "goes BOINC."

When you say "Why can't BOINCMGR display this" it doesn't address the machine where the operator doesn't even know they should look.

That's the machine that needs to be throttled, and it needs help because there is no one monitoring it.

I'm not saying what we have is good enough, but I do suspect that the people who pay attention are also watching their credit, and would likely catch problems no matter what is done.

... and that we're going to have to do something pretty dramatic (fireworks? send a .wav file "call for help" to the speakers??) to get some people's attention.

-- Lynn

On 5/25/2010 1:20 PM, Richard Haselgrove wrote:
> There are indeed two issues here, but I'd categorise them differently.
>
> The SETI -9 tasks are really difficult, because the insane science
> application produces outputs which appear to the BOINC Client to be
> plausible. It's only much, much further down the line that the failure to
> validate exposes the error. I think SETI may have to wrestle with this one
> on its own.
>
> But I think Maureen is talking about other projects, where the problem is
> indeed one of crashes and abnormal (non-zero status) exits which the BOINC
> Client *does* interpret as a computation error.
>
> Part of the trouble here is that the BOINC message log (without special
> logging flags) tends only to mention 'Output file absent'. It takes quite
> sophisticated knowledge of BOINC to understand that 'Output file absent'
> almost invariably means that the science application had previously crashed.
> I've written that repeatedly on project message boards (including BOINC's
> own message board): I could have sworn I'd also written about it quite
> recently on one of these BOINC mailing lists, but none of my search tools is
> turning up the reference.
>
> I think it would help if the BOINC client message log could actually use the
> word 'error', as in
>
> "Computation for task xxxxxxxxxxxxxxx finished with an error
> "Exit status 3 (0x3) for task xxxxxxxxxxxxxxx"
>
> BoincView can log them - why not BOINC itself?
>
> ----- Original Message -----
> From: "Lynn W. Taylor"<[email protected]>
> To:<[email protected]>
> Sent: Tuesday, May 25, 2010 8:24 PM
> Subject: Re: [boinc_dev] host punishment mechanism revisited
>
>
>> There are two big issues here.
>>
>> First, we aren't really talking about work that has "crashed" -- we may
>> be able to tell that the work unit finished early, but work like the
>> SETI -9 results didn't "crash" they just had a lot of signals.
>>
>> Run time isn't necessarily an indicator of quality.
>>
>> What we're talking about is work that does not validate -- where the
>> result is compared to work done on another machine, and the results
>> don't match.
>>
>> Notice has to get from the validator, travel by some means to the
>> eyeballs of someone who cares about that machine, and register with
>> their Mark I mod 0 brain.
>>
>> The two issues are:
>>
>> What if the back-channel to the user is not available (old E-Mail
>> address, or not running BOINCMGR)?
>>
>> What if the user is (obscure reference to DNA, since this is Towel Day)
>> missing, presumed fed?
>>
>> There is probably an argument that BOINC should shut down gracefully if
>> the machine owner doesn't verify his continued existence periodically.
>>
>> On 5/25/2010 12:07 PM, Maureen Vilar wrote:
>>
>>> When computation finishes prematurely would it be possible to add to the
>>> messages something like: 'This task crashed'? And even 'Ask for advice on
>>> the project forum'?
>> _______________________________________________
>> boinc_dev mailing list
>> [email protected]
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.
