I've never noticed point (2) to be a problem. Successful returns increase the quota for the *current* day, not for subsequent days. So, in the worst-case scenario (unless the doubling algorithm has been changed in the new server code):
The user only notices the problem after the quota has already dropped to 1 per day, and far enough into the day that the day's single job has already been wasted. In that case, no new work can be fetched until after the server's midnight, even to test whether the user's fix has been successful. That can be the most frustrating part.

After midnight, one new task can be fetched - per CPU core, in the current model. How does that scale with GPUs? I think it should continue to scale so that every machine resource can be supplied with its single daily 'test WU': if the host has four GPUs, it should be allowed four GPU tasks.

No further work is allowed until the first task has been reported as 'success'. Time is wasted if there's a long file upload to complete, but I don't think that can be avoided. Once the first task has been reported, however, another task is immediately permitted to download. If the second task is also successful, the quota becomes four: two have been used, so two more can be downloaded - one to run immediately, and one in reserve to start when #3 finishes. From that point forward, you're ahead of the game: no more time is wasted, and the doubling soon restores full service.

On a multicore machine, it's even quicker: with four cores, by the end of the first test set of four tasks (which always seem to finish at slightly different times), the quota has doubled four times to 16 per core, or 64 for the computer as a whole. Four completed and four in progress already leaves 56 available and ready to run. That's plenty.

I agree with Lynn's point (often suggested) that the starting point for newly-attached hosts should be set much lower than the ultimate limit achievable by reliable hosts. Set a low starting 'probationary' quota - even as low as two per core, for "one running and one spare to follow" - and allow it to double as now. The maximum value that the quota can reach should be determined by each project: 100 is already too low for SETI GPUs, but ludicrously high for CPDN.
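The ramp-up arithmetic above can be sketched in a few lines. The doubling-on-success rule is the existing scheduler behaviour; the starting quota of 1, the four cores, and the 100/day cap are just the illustrative numbers from this discussion, and the function name is mine:

```python
# Sketch: per-core quota recovery after a host has been punished down to 1.
# Each valid result doubles the per-core quota, capped at the project limit.

def ramp_up(successes, cap_per_core=100):
    """Per-core quota after `successes` valid results, starting from 1."""
    quota = 1  # the 'punished' per-core quota
    for _ in range(successes):
        quota = min(quota * 2, cap_per_core)
    return quota

# Four cores, first test set of four tasks all succeed:
per_core = ramp_up(4)        # 1 -> 2 -> 4 -> 8 -> 16
machine = per_core * 4       # 64 for the computer as a whole
available = machine - 4 - 4  # 4 completed, 4 in progress -> 56 ready to run
print(per_core, machine, available)
```

Which is why, on a multicore machine, one good test set is enough to put the host ahead of its own work rate again.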
I think we've already covered the point that it should be variable by application (AQUA FP tasks take an hour, so the quota needs to be at least 50/day; AQUA IQ takes several days, so a quota of 2/day is plenty). And that's without considering CPU and GPU versions of the same app.

Allowing a 'bonus' on the quota for a validated task gets round the astronomical numbers that can be processed by "successful, but idiotic" reports, such as SETI overflows on faulty GPUs. But it suffers from the asynchronous nature of validation: why should I deserve a bonus task today, if my oldest pending task - returned 8 February - happens to be validated by its fifth potential wingmate?

> The BOINC scheduler has a mechanism called "host punishment"
> designed to deal with hosts that request an infinite sequence of jobs,
> and either error out on them or never return them.
>
> It works like this: there's a project parameter called "daily result
> quota", say 100. Every host starts off with this quota.
> If it returns an error or times out, the quota is decremented down to,
> but not below, 1. If it returns a valid result, the quota is doubled.
> The idea is that faulty hosts are given 1 job per day
> to see if they've been fixed.
>
> Recently this mechanism was changed from per-project to per-app-version,
> the idea being that a host might be erroring out on a GPU version
> but not the CPU version.
>
> However, the basic mechanism is somewhat flawed:
>
> 1) What if a fast host can do more than 100 jobs a day?
> We could increase the default quota, but that would let bad hosts
> trash that many more jobs.
>
> 2) It takes too long for a fixed host to ramp up its quota.
>
> The bottom line: as long as a host is sending correct results,
> it shouldn't have a daily quota at all.
>
> ---------
>
> If anyone has ideas for how to change the host punishment mechanism,
> please let me know.
> I'll think about it and post a proposal at some point.
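For reference, here is a minimal sketch of the "host punishment" rule as quoted above, tracked per app version as the message describes. The class and method names are mine, not BOINC's, and I've assumed the doubling is capped at the project's daily-result-quota parameter (the quoted text only implies that 100 is the ceiling):

```python
# Sketch of the quoted rule: error/timeout decrements the quota (floor 1),
# a valid result doubles it (assumed cap: the project parameter).

DAILY_RESULT_QUOTA = 100  # example project parameter from the quoted text

class AppVersionQuota:
    def __init__(self):
        self.max_jobs_per_day = DAILY_RESULT_QUOTA  # every host starts here

    def on_error_or_timeout(self):
        # "decremented down to, but not below, 1"
        self.max_jobs_per_day = max(1, self.max_jobs_per_day - 1)

    def on_valid_result(self):
        # "the quota is doubled" - capped at the project parameter (assumed)
        self.max_jobs_per_day = min(DAILY_RESULT_QUOTA,
                                    self.max_jobs_per_day * 2)

q = AppVersionQuota()
for _ in range(200):       # a thoroughly broken host
    q.on_error_or_timeout()
print(q.max_jobs_per_day)  # 1: one test job per day to see if it's fixed

q.on_valid_result()        # the user's fix works...
print(q.max_jobs_per_day)  # 2: hence the slow ramp-up of point (2)
```

The asymmetry is visible straight away: 99 errors to reach the floor, but only seven doublings back to 100 - the slowness people feel comes from the one-job-per-day gate at the bottom, not from the doubling itself.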
>
> -- David
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
