I've never noticed point (2) to be a problem. Successful returns increase the quota for the *current* day, not for subsequent days. So, in the worst-case scenario (unless the doubling algorithm has been changed in the new server code):
The user only notices the problem after the quota has already dropped to 1 per day, and far enough into the day that the day's single job has already been wasted. In that case, no new work can be fetched until after the server's midnight, even to test whether the user's fix has been successful. That can be the most frustrating part.

After midnight, one new task can be fetched - per CPU core, in the current model. How does that scale with GPUs? I think it should continue to scale so that every machine resource can be supplied with its single daily 'test WU': if the host has four GPUs, it should be allowed four GPU tasks.

No further work is allowed until the first task has been reported as 'success'. Time is wasted if there's a long file upload to complete, but I don't think that can be avoided. Once the first task has been reported, however, another task is immediately permitted to download. If the second task is also successful, the quota becomes four: two have been used, so two more can be downloaded - one to run immediately, and one in reserve to start when #3 finishes. From that point forward, you're ahead of the game: no more time is wasted, and the doubling soon restores full service.

On a multicore machine, it's even quicker: with four cores, by the end of the first test set of four tasks (which always seem to finish at slightly different times), the quota has doubled four times to 16 per core, or 64 for the computer as a whole. Four completed and four in progress already leaves 56 available and ready to run. That's plenty.

I agree with Lynn's point (often suggested) that the starting point for newly-attached hosts should be set much lower than the ultimate limit achievable by reliable hosts. Set a low starting 'probationary' quota - even as low as two per core, for "one running and one spare to follow" - and allow it to double as now. The maximum value that the quota can reach should be determined by each project: 100 is already too low for SETI GPUs, but ludicrously high for CPDN.
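The ramp-up arithmetic above can be sketched in a few lines. The doubling-on-success rule is the existing scheduler behaviour; the starting quota of 1, the four cores, and the 100/day cap are just the illustrative numbers from this discussion, and the function name is mine:

```python
# Sketch: per-core quota recovery after a host has been punished down to 1.
# Each valid result doubles the per-core quota, capped at the project limit.

def ramp_up(successes, cap_per_core=100):
    """Per-core quota after `successes` valid results, starting from 1."""
    quota = 1  # the 'punished' per-core quota
    for _ in range(successes):
        quota = min(quota * 2, cap_per_core)
    return quota

# Four cores, first test set of four tasks all succeed:
per_core = ramp_up(4)        # 1 -> 2 -> 4 -> 8 -> 16
machine = per_core * 4       # 64 for the computer as a whole
available = machine - 4 - 4  # 4 completed, 4 in progress -> 56 ready to run
print(per_core, machine, available)
```

Which is why, on a multicore machine, one good test set is enough to put the host ahead of its own work rate again.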
I think we've already covered the point that it should be variable by application (AQUA FP tasks take an hour, so the quota needs to be at least 50/day; AQUA IQ takes several days, so a quota of 2/day is plenty). And that's without considering CPU and GPU versions of the same app.

Allowing a 'bonus' on the quota for a validated task gets round the astronomical numbers that can be processed by "successful, but idiotic" reports, such as SETI overflows on faulty GPUs. But it suffers from the asynchronous nature of validation: why should I deserve a bonus task today, if my oldest pending task - returned 8 February - happens to be validated by its fifth potential wingmate?

> The BOINC scheduler has a mechanism called "host punishment"
> designed to deal with hosts that request an infinite sequence of jobs,
> and either error out on them or never return them.
>
> It works like this: there's a project parameter called "daily result
> quota", say 100. Every host starts off with this quota.
> If it returns an error or times out, the quota is decremented down to,
> but not below, 1. If it returns a valid result, the quota is doubled.
> The idea is that faulty hosts are given 1 job per day
> to see if they've been fixed.
>
> Recently this mechanism was changed from per-project to per-app-version,
> the idea being that a host might be erroring out on a GPU version
> but not the CPU version.
>
> However, the basic mechanism is somewhat flawed:
>
> 1) What if a fast host can do more than 100 jobs a day?
> We could increase the default quota, but that would let bad hosts
> trash that many more jobs.
>
> 2) It takes too long for a fixed host to ramp up its quota.
>
> The bottom line: as long as a host is sending correct results,
> it shouldn't have a daily quota at all.
>
> ---------
>
> If anyone has ideas for how to change the host punishment mechanism,
> please let me know.
> I'll think about it and post a proposal at some point.
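For reference, here is a minimal sketch of the "host punishment" rule as quoted above, tracked per app version as the message describes. The class and method names are mine, not BOINC's, and I've assumed the doubling is capped at the project's daily-result-quota parameter (the quoted text only implies that 100 is the ceiling):

```python
# Sketch of the quoted rule: error/timeout decrements the quota (floor 1),
# a valid result doubles it (assumed cap: the project parameter).

DAILY_RESULT_QUOTA = 100  # example project parameter from the quoted text

class AppVersionQuota:
    def __init__(self):
        self.max_jobs_per_day = DAILY_RESULT_QUOTA  # every host starts here

    def on_error_or_timeout(self):
        # "decremented down to, but not below, 1"
        self.max_jobs_per_day = max(1, self.max_jobs_per_day - 1)

    def on_valid_result(self):
        # "the quota is doubled" - capped at the project parameter (assumed)
        self.max_jobs_per_day = min(DAILY_RESULT_QUOTA,
                                    self.max_jobs_per_day * 2)

q = AppVersionQuota()
for _ in range(200):       # a thoroughly broken host
    q.on_error_or_timeout()
print(q.max_jobs_per_day)  # 1: one test job per day to see if it's fixed

q.on_valid_result()        # the user's fix works...
print(q.max_jobs_per_day)  # 2: hence the slow ramp-up of point (2)
```

The asymmetry is visible straight away: 99 errors to reach the floor, but only seven doublings back to 100 - the slowness people feel comes from the one-job-per-day gate at the bottom, not from the doubling itself.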
>
> -- David
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
