David,

I would suggest that there are three states for a host:

1) It has proven to be reliable
2) We are suspicious of it
3) We know it has a problem

If it is in state #1, then it should be allowed to compute without limit.
If it is in state #2, then it should have a limit on the max number of results it can have in progress for the app version.
If it is in state #3, then it should have a limit on the max number of results per day that it can process.

There would be a parameter, something like <host_app_version_limit>X</host_app_version_limit>. Each host has a host_app_version value 'Y'.

Y is initialized for a new host to 50% of X.
When a success result is returned, Y is incremented by one until it reaches X.
When an error result is returned, Y is decremented by one until it reaches 1.
When Y == X, the host has no restrictions based on this value on the # of results in progress it can have for the app version.
When Y == 1, the host can only have one result per day for the app version.
When Y < X && Y > 1, the host can have Y results in progress per processing unit for the app_version.

This mechanism will allow a computer to run an unlimited amount of work per day as long as Y is greater than 1. It just can't build up a large cache unless Y == X. This mechanism should also have a limited impact on the database, as most computers will be at either Y == 1 or Y == X, so there will be few queries to see how many results are in progress.

Kevin Reed

From: David Anderson <[email protected]>
To: BOINC Developers Mailing List <[email protected]>
Date: 05/24/2010 02:03 PM
Subject: host punishment mechanism revisited

The BOINC scheduler has a mechanism called "host punishment" designed to deal with hosts that request an infinite sequence of jobs, and either error out on them or never return them. It works like this: there's a project parameter called "daily result quota", say 100. Every host starts off with this quota. If it returns an error or times out, the quota is decremented down to, but not below, 1.
If it returns a valid result, the quota is doubled. The idea is that faulty hosts are given 1 job per day to see if they've been fixed.

Recently this mechanism was changed from per-project to per-app-version, the idea being that a host might be erroring out on a GPU version but not the CPU version.

However, the basic mechanism is somewhat flawed:

1) What if a fast host can do more than 100 jobs a day? We could increase the default quota, but that would let bad hosts trash that many more jobs.
2) It takes too long for a fixed host to ramp up its quota.

The bottom line: as long as a host is sending correct results, it shouldn't have a daily quota at all.

---------

If anyone has ideas for how to change the host punishment mechanism, please let me know. I'll think about it and post a proposal at some point.

-- David
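
For illustration only, the Y/X counter suggested in the reply at the top of this message might look roughly like the Python sketch below. This is not BOINC scheduler code: the class name, method names, and the n_proc_units argument are all hypothetical, "50% of X" is assumed to round down (but never below 1), the error floor is assumed to be 1, and X is assumed to be at least 2 so that Y == 1 and Y == X are distinct states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostAppVersion:
    """Hypothetical per-host, per-app-version record (not a BOINC struct)."""
    limit: int  # X: the project-wide host_app_version_limit
    y: int      # Y: this host's current value, kept within 1..X

    @classmethod
    def new_host(cls, limit: int) -> "HostAppVersion":
        # Assumption: X >= 2, so "Y == 1" and "Y == X" never coincide.
        if limit < 2:
            raise ValueError("limit (X) must be at least 2")
        # A new host starts at 50% of X (assumed to round down, minimum 1).
        return cls(limit=limit, y=max(1, limit // 2))

    def record_success(self) -> None:
        # A success result raises Y by one, capped at X.
        self.y = min(self.limit, self.y + 1)

    def record_error(self) -> None:
        # An error result lowers Y by one, with a floor of 1.
        self.y = max(1, self.y - 1)

    def daily_limit(self) -> Optional[int]:
        # Y == 1: known-bad host, one result per day. Otherwise no daily cap.
        return 1 if self.y == 1 else None

    def in_progress_limit(self, n_proc_units: int) -> Optional[int]:
        # Y == X: trusted host, no in-progress cap (None means "unlimited").
        # Y == 1: the daily limit governs instead, so no cap is reported here.
        if self.y == self.limit or self.y == 1:
            return None
        # 1 < Y < X: at most Y results in progress per processing unit.
        return self.y * n_proc_units
```

With X = 10, a new host would start at Y = 5 and, on a 4-processor machine, be capped at 20 results in progress; ten straight successes would lift the cap, and a long enough run of errors would drop it to one result per day.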
