What if the servers knew some "average" performance for a given host,
and the quota was, oh, 5 times the average over the past week (by
application)?

The quota could be adjusted by a factor using John's reputation value.

That way a GPU glitch would have an upper limit.
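
A rough sketch of that idea, for illustration only. This is not BOINC
scheduler code: the function name, the floor value, and the assumption
that the reputation value acts as a simple multiplier are all
hypothetical.

```python
def daily_quota(completed_per_day, multiplier=5.0, reputation_factor=1.0, floor=1):
    """Return a per-application daily task quota for one host.

    completed_per_day: tasks completed on each recent day (e.g. 7 values).
    multiplier: how far above its average the host may go (5 in the proposal).
    reputation_factor: assumed multiplicative adjustment from a reputation value.
    floor: assumed minimum so a new or idle host can still fetch some work.
    """
    if not completed_per_day:
        return floor
    average = sum(completed_per_day) / len(completed_per_day)
    return max(floor, int(average * multiplier * reputation_factor))


past_week = [40, 38, 42, 41, 39, 40, 40]   # made-up history, average 40/day
quota = daily_quota(past_week)              # 5 x 40 = 200
runaway_rate = 1500                         # tasks/day, figure from the quoted message
issued = min(runaway_rate, quota)           # the glitch is capped at the quota
```

With this made-up history, a host that starts recycling 1,500 tasks a
day would be held to 200, and a reputation factor of 0.5 would halve
that to 100.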

Your statement is a great example of why the quota shouldn't be removed 
completely, or even raised too high: we don't really want to "punish" 
anyone (and the word "punish" isn't literal, I know), but to stop a 
runaway problem from literally running away.

-- Lynn

On 5/24/2010 3:57 PM, Richard Haselgrove wrote:
> I'm afraid this isn't going to work.
>
> (Please excuse us while we, yet again, discuss a SETI-specific problem)
>
> There's a frequently observed, but cause unknown, failure mode for the SETI
> CUDA application.
>
> Once a CUDA card gets into this state, nothing seems to stop it except a
> host reboot - i.e. manual intervention.
>
> The symptoms of the failure are that the CUDA card in question exits every
> task as an immediate -9 overflow, generates the matching maximum-size upload
> file, and reports the task as "success". At a conservative average of one
> task per minute (actually it's quicker than that), the host could recycle
> 1,500 tasks per day.
>
> The current local DCF will be falling, so more tasks will be requested than
> returned: even an instantaneous cut-off would leave a large cache to be blown
> off.
>
> The early tasks, at least, will have a pretty fast turnround: it's likely
> that their quorum partner won't have returned yet. When the first wingmate
> does come back (possibly not for several days), validation will be
> inconclusive: a third quorum member will be generated, queued, issued,
> cached and eventually returned. That's the *first* moment at which the
> pseudo -9 can be declared invalid.
>
> If we wait for validation failures to start to punish hosts with this class
> of failure, it'll be far too slow. The low basic quota, with replacement for
> successful validations, seems likely to be a better protection against
> runaway hosts.
>
>
>> The goal is to not feed work to "broken" CPUs or GPUs.
>>
>> What about this:
>>
>> Each day, keep track of validations.  If a work unit validates, and no
>> invalid work is received, raise quota.  If no work is validated that
>> day, quota stays unchanged.  If no valid work is received, and invalid
>> work is received, leave the quota unchanged.
>>
>> I think that mixed results for a day (some valid, some invalid) should
>> leave the quota unchanged.
>>
>> That would keep a long string of SETI -9's from killing the quota --
>> it'd take days of sustained badness to stop a broken host.
>>
>> -- Lynn
>>
>> On 5/24/2010 2:55 PM, Richard Haselgrove wrote:
>>>>> Allowing a 'bonus' on quota for a validated task gets round the
>>>>> astronomical
>>>>> numbers that can be processed by "successful, but idiotic" reports such
>>>>> as
>>>>> SETI overflows on faulty GPUs.
>>>>
>>>> Such results will be returned, but NOT validated. So they don't receive
>>>> "validation bonus".
>>>
>>> Exactly. That's why it gets round - i.e. solves or avoids - the problem
>>> that
>>> could be caused by an inflated general quota figure.
>>>
>>>
>>> _______________________________________________
>>> boinc_dev mailing list
>>> [email protected]
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>> To unsubscribe, visit the above URL and
>>> (near bottom of page) enter your email address.
>>>