What if the servers knew some "average" performance for a given host, and the quota was, oh, 5 times the average over the past week (by application)?
The quota could be adjusted by a factor using John's reputation value. That way a GPU glitch would have an upper limit.

Your statement is a great example of why the quota shouldn't be removed completely, or even raised too high: we don't really want to "punish" anyone (and the word "punish" isn't literal, I know), but to stop a runaway problem from literally running away.

-- Lynn

On 5/24/2010 3:57 PM, Richard Haselgrove wrote:
> I'm afraid this isn't going to work.
>
> (Please excuse us while we, yet again, discuss a SETI-specific problem)
>
> There's a frequently observed, but cause unknown, failure mode for the SETI
> CUDA application.
>
> Once a CUDA card gets into this state, nothing seems to stop it except a
> host reboot - i.e. manual intervention.
>
> The symptoms of the failure are that the CUDA card in question exits every
> task as an immediate -9 overflow, generates the matching maximum-size upload
> file, and reports the task as "success". At a conservative average of one
> task per minute (actually it's quicker than that), the host could recycle
> 1,500 tasks per day.
>
> The current local DCF will be falling, so more tasks will be requested than
> returned: even an instantaneous cut-off would leave a large cache to be blown
> off.
>
> The early tasks, at least, will have a pretty fast turnround: it's likely
> that their quorum partner won't have returned yet. When the first wingmate
> does come back (possibly not for several days), validation will be
> inconclusive: a third quorum member will be generated, queued, issued,
> cached and eventually returned. That's the *first* moment at which the
> pseudo -9 can be declared invalid.
>
> If we wait for validation failures to start to punish hosts with this class
> of failure, it'll be far too slow. The low basic quota, with replacement for
> successful validations, seems likely to be a better protection against
> runaway hosts.
>
>
>> The goal is to not feed work to "broken" CPUs or GPUs.
>>
>> What about this:
>>
>> Each day, keep track of validations. If a work unit validates, and no
>> invalid work is received, raise quota. If no work is validated that
>> day, quota stays unchanged. If no valid work is received, and invalid
>> work is received, leave the quota unchanged.
>>
>> I think that mixed results for a day (some valid, some invalid) should
>> leave the quota unchanged.
>>
>> That would keep a long string of SETI -9's from killing the quota --
>> it'd take days of sustained badness to stop a broken host.
>>
>> -- Lynn
>>
>> On 5/24/2010 2:55 PM, Richard Haselgrove wrote:
>>>>> Allowing a 'bonus' on quota for a validated task gets round the
>>>>> astronomical numbers that can be processed by "successful, but
>>>>> idiotic" reports such as SETI overflows on faulty GPUs.
>>>>
>>>> Such results will be returned, but NOT validated. So they don't receive
>>>> "validation bonus".
>>>
>>> Exactly. That's why it gets round - i.e. solves or avoids - the problem
>>> that could be caused by an inflated general quota figure.
>>>
>>>
>>> _______________________________________________
>>> boinc_dev mailing list
>>> [email protected]
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>> To unsubscribe, visit the above URL and
>>> (near bottom of page) enter your email address.
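
A rough sketch of the first idea in this message (a per-application quota of about five times the host's average over the past week, scaled by a reputation factor) might look like the following. Everything named here is hypothetical and is not BOINC server code: `daily_quota`, its parameters, the `floor` of one task, and the choice to feed it per-day task counts are all assumptions made for illustration. Which counts to feed in (returned or validated tasks) is exactly the open question in the thread.

```python
from statistics import mean


def daily_quota(daily_counts, multiplier=5.0, reputation=1.0, floor=1):
    """Hypothetical per-host, per-application daily quota.

    daily_counts: tasks per day for the past week (assumed input;
                  the thread does not say which tasks should count).
    multiplier:   the "5 times the average" factor from the message.
    reputation:   assumed scale factor, e.g. 1.0 for a trusted host.
    floor:        assumed minimum so a new host can still get work.
    """
    if not daily_counts:
        # No history yet: fall back to the minimum quota.
        return floor
    average = mean(daily_counts)
    return max(floor, int(average * multiplier * reputation))


# A host that normally completes about 40 tasks per day.
week = [38, 42, 40, 41, 39, 40, 40]
quota = daily_quota(week)

# A runaway card exiting one task per minute would try 1,440 per day,
# but could be sent no more than the quota.
runaway_attempts = 24 * 60
tasks_sent = min(runaway_attempts, quota)
```

With these numbers the quota works out to 200 tasks, so the runaway host is cut off at 200 rather than 1,440; a reputation factor of 0.5 would halve that again. Note that this only bounds the damage per day. It does not address the point about how long validation takes to flag the bad results.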
