David,

I would suggest that there are three states for a host:

1) It has proven to be reliable
2) We are suspicious of it
3) We know it has a problem

If it is in state #1, then it should be allowed to compute without limit.
If it is in state #2, then it should have a limit on the max number of
results it can have in progress for the app version.
If it is in state #3, then it should have a limit on the max number of
results it can process per day.

There would be a parameter that would be something like
<host_app_version_limit>X</host_app_version_limit>.  Each host has a
host_app_version value 'Y'.

Y is initialized for a new host to 50% of X.
When a success result is returned, Y is incremented by one until it reaches
X.
When an error result is returned, Y is decremented by one until it reaches 1.

When Y == X, then the host has no restrictions based on this value for the
# of results in progress it can have for app version
When Y == 1, then the host can only have one result per day for the app
version
When Y < X && Y > 1 then the host can have Y results in progress per
processing unit for the app_version

This mechanism will allow a computer to run an unlimited amount of work per
day as long as Y is greater than 1.  It just can't build up a large cache
unless Y==X.
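
A minimal sketch of the rules above, in Python, might look like the
following. The class and method names are hypothetical and are not BOINC
code. It treats 1 as the floor for decrements (which is what the Y == 1 case
implies), rounds 50% of X down, and reads "Y results in progress per
processing unit" as Y multiplied by the number of processing units; all
three are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class HostAppVersion:
    """Hypothetical per-host, per-app-version standing (not BOINC code)."""

    limit: int                      # X: project-wide host_app_version_limit
    value: int = field(init=False)  # Y: this host's current value

    def __post_init__(self):
        # A new host starts at 50% of X (rounded down, never below 1).
        self.value = max(1, self.limit // 2)

    def on_success(self):
        # A success result raises Y by one, up to X.
        self.value = min(self.limit, self.value + 1)

    def on_error(self):
        # An error result lowers Y by one, down to an assumed floor of 1.
        self.value = max(1, self.value - 1)

    def restriction(self, n_processing_units=1):
        """Return (kind, max_results) for the host's current state."""
        if self.value >= self.limit:
            return ("unlimited", None)       # state #1: no restriction
        if self.value <= 1:
            return ("per_day", 1)            # state #3: one result per day
        # state #2: cap on results in progress, scaled by processing units
        return ("in_progress", self.value * n_processing_units)


host = HostAppVersion(limit=10)
print(host.value, host.restriction(n_processing_units=2))
```

Only the middle state needs a count of results in progress; the other two
can be decided from Y alone.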

This mechanism also should have a limited impact on the database as most
computers will be at either Y == 1 or Y == X so there will be few queries
to see how many results in progress there are.

Kevin Reed




From:   David Anderson <[email protected]>
To:     BOINC Developers Mailing List <[email protected]>
Date:   05/24/2010 02:03 PM
Subject:        host punishment mechanism revisited



The BOINC scheduler has a mechanism called "host punishment"
designed to deal with hosts that request an infinite sequence of jobs,
and either error out on them or never return them.

It works like this: there's a project parameter called "daily result quota",
say 100.  Every host starts off with this quota.
If it returns an error or times out, the quota is decremented down to,
but not below, 1.  If it returns a valid result, the quota is doubled.
The idea is that faulty hosts are given 1 job per day
to see if they've been fixed.
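
The update rule described above can be sketched in Python as follows. This
is an illustration only, not the scheduler's actual code: the function names
are hypothetical, and capping the doubled quota at the project's daily
result quota is an assumption the description does not state.

```python
def update_quota(quota, valid, daily_result_quota=100):
    """Return a host's new quota after one reported result."""
    if valid:
        # A valid result doubles the quota (cap at the project value
        # is an assumption).
        return min(daily_result_quota, quota * 2)
    # An error or timeout decrements the quota, but never below 1.
    return max(1, quota - 1)


def valid_results_to_recover(daily_result_quota=100):
    """Count the valid results needed to climb from 1 back to the full quota."""
    quota, count = 1, 0
    while quota < daily_result_quota:
        quota = update_quota(quota, True, daily_result_quota)
        count += 1
    return count


print(update_quota(100, valid=False))
print(valid_results_to_recover(100))
```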

Recently this mechanism was changed from per-project to per-app-version,
the idea being that a host might be erroring out on a GPU version
but not the CPU version.

However, the basic mechanism is somewhat flawed:

1) What if a fast host can do more than 100 jobs a day?
We could increase the default quota, but that would let bad hosts
trash that many more jobs.

2) It takes too long for a fixed host to ramp up its quota.

The bottom line: as long as a host is sending correct results,
it shouldn't have a daily quota at all.

---------

If anyone has ideas for how to change the host punishment mechanism,
please let me know.
I'll think about it and post a proposal at some point.

-- David


_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
