There are indeed two issues here, but I'd categorise them differently.

The SETI -9 tasks are the really difficult case, because the insane science
application produces outputs which appear plausible to the BOINC Client.
It's only much, much further down the line that the failure to validate
exposes the error. I think SETI may have to wrestle with this one on its
own.

But I think Maureen is talking about other projects, where the problem is
indeed one of crashes and abnormal (non-zero status) exits which the BOINC
Client *does* interpret as a computation error.

Part of the trouble here is that the BOINC message log (without special
logging flags) tends only to mention 'Output file absent'. It takes quite
sophisticated knowledge of BOINC to understand that 'Output file absent'
almost invariably means that the science application had previously crashed.
I've written that repeatedly on project message boards (including BOINC's
own message board). I could have sworn I'd also written about it quite
recently on one of these BOINC mailing lists, but none of my search tools is
turning up the reference.

I think it would help if the BOINC client message log could actually use the
word 'error', as in

"Computation for task xxxxxxxxxxxxxxx finished with an error"
"Exit status 3 (0x3) for task xxxxxxxxxxxxxxx"

BoincView can log them - why not BOINC itself?
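
To make the suggestion concrete, here is a rough sketch of the wording I
have in mind. It is not BOINC client code: the function name and the
zero/non-zero test are my own assumptions for illustration, and the real
client would do this in C++ with its own message routines.

```python
def describe_task_exit(task_name: str, exit_status: int) -> list[str]:
    """Return message-log lines for a finished task (hypothetical helper)."""
    if exit_status == 0:
        # Normal exit: no need to alarm the user.
        return [f"Computation for task {task_name} finished"]
    # Mask to 32 bits so a negative status prints as unsigned hex.
    hex_status = exit_status & 0xFFFFFFFF
    return [
        f"Computation for task {task_name} finished with an error",
        f"Exit status {exit_status} (0x{hex_status:x}) for task {task_name}",
    ]


if __name__ == "__main__":
    for line in describe_task_exit("xxxxxxxxxxxxxxx", 3):
        print(line)
```

The point is only that any non-zero exit status puts the word 'error' in
front of the user, rather than leaving them to decode 'Output file absent'.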

----- Original Message ----- 
From: "Lynn W. Taylor" <[email protected]>
To: <[email protected]>
Sent: Tuesday, May 25, 2010 8:24 PM
Subject: Re: [boinc_dev] host punishment mechanism revisited


> There are two big issues here.
>
> First, we aren't really talking about work that has "crashed" -- we may
> be able to tell that the work unit finished early, but work like the
> SETI -9 results didn't "crash"; they just had a lot of signals.
>
> Run time isn't necessarily an indicator of quality.
>
> What we're talking about is work that does not validate -- where the
> result is compared to work done on another machine, and the results
> don't match.
>
> Notice has to get from the validator, travel by some means to the
> eyeballs of someone who cares about that machine, and register with
> their Mark I mod 0 brain.
>
> The two issues are:
>
> What if the back-channel to the user is not available (old E-Mail
> address, or not running BOINCMGR)?
>
> What if the user is (obscure reference to DNA, since this is Towel Day)
> missing, presumed fed?
>
> There is probably an argument that BOINC should shut down gracefully if
> the machine owner doesn't verify his continued existence periodically.
>
> On 5/25/2010 12:07 PM, Maureen Vilar wrote:
>
>> When computation finishes prematurely would it be possible to add to the
>> messages something like: 'This task crashed'? And even 'Ask for advice on
>> the project forum'?
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
> 


