I did a task check on one of the double-Fermi hosts this morning, and the last
24 hours of tasks ran onto the offset=1920 page. So the problem can be a bit
bigger than 800 tasks/GPU/day.
----- Original Message -----
From: Josef W. Segur
Although the quota system is not limiting those hosts enough,
they get frequent invalidations. The consecutive valid count is
usually zero, though I've seen a few cases of single digit
non-zero counts. So with s...@h gpu_multiplier set at 8, those hosts
are limited to not much more than 800 tasks per day per FERMI
GPU. My most recent check shows 19 hosts, though a few have two
Fermi cards so the GPU count is about 23. So the size of the
problem is reduced to under 2% of s...@h Enhanced tasks.
s...@h Enhanced tasks account for about 64% of the download
bandwidth, the much larger Astropulse v505 tasks get the
remainder. That makes the bandwidth impact on the order of 1%
currently.
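Joe's back-of-envelope figures can be checked with a quick calculation. All the
input numbers (≈23 GPUs, the 800-task cap, the 2% upper bound, the 64%
bandwidth share) are taken from the text above; nothing else is assumed:

```python
# Figures quoted from the message above.
gpus = 23                   # ~19 hosts, a few with two Fermi cards
tasks_per_gpu_day = 800     # effective cap with gpu_multiplier = 8

wasted_per_day = gpus * tasks_per_gpu_day   # ~18,400 tasks/day

enhanced_share = 0.02       # wasted tasks as a fraction of Enhanced tasks (upper bound)
enhanced_bandwidth = 0.64   # Enhanced tasks' share of download bandwidth

# Bandwidth impact = share of Enhanced tasks wasted * Enhanced bandwidth share
bandwidth_impact = enhanced_share * enhanced_bandwidth

print(wasted_per_day)       # 18400
print(bandwidth_impact)     # 0.0128, i.e. "on the order of 1%"
```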
Perhaps something like keeping a consecutive invalid count and
not doubling on reported "Success" when that count exceeds the
consecutive valid count would be better. But that would make
recovery from a temporary problem much slower.
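A minimal sketch of that suggestion, under a simplified quota model. The class
and method names here are illustrative only, not the actual BOINC scheduler
code; the doubling-on-success and halving-on-invalid rules are assumptions made
to show the mechanism:

```python
class HostAppVersion:
    """Illustrative per-(host, app version) quota record.

    Field names are made up for this sketch; they are not taken from
    the BOINC source.
    """

    def __init__(self, max_jobs_per_day=800):
        self.quota = 1
        self.max_quota = max_jobs_per_day
        self.consecutive_valid = 0
        self.consecutive_invalid = 0

    def on_reported_success(self):
        # Assumed current behaviour: a reported "Success" doubles the
        # quota, so a host whose results all fail validation can still
        # climb back toward the cap.
        # Joe's tweak: skip the doubling while the invalid streak
        # exceeds the valid streak.
        if self.consecutive_invalid <= self.consecutive_valid:
            self.quota = min(self.quota * 2, self.max_quota)

    def on_validated(self):
        self.consecutive_valid += 1
        self.consecutive_invalid = 0

    def on_invalidated(self):
        self.consecutive_invalid += 1
        self.consecutive_valid = 0
        self.quota = max(self.quota // 2, 1)
```

With this rule, a host whose results are consistently invalidated stays pinned
near a quota of 1 even though it keeps reporting "Success"; the cost, as noted
above, is slower recovery after a transient problem.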
--
Joe
On Fri, 07 Jan 2011 06:41:07 -0500, Richard Haselgrove wrote:
> OK, back-of-envelope error - that 10% overstates the size of the problem.
>
> But, more realistically, 20 hosts @ 2,000 tasks wasted each per day is
close to 4% of all multibeam tasks issued.
>
> ----- Original Message -----
> From: Richard Haselgrove
>
>
> There is an updated version of that list at
http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1062822
>
> Given the limited number of hosts involved, in the short term the wastage
(something of the order of 10% of SETI's limited download bandwidth), maybe
could/should be stemmed by invoking
http://boinc.berkeley.edu/trac/wiki/BlackList for the 19 hosts identified.
>
> This approach has been used successfully by Milo at CPDN, although there
is a suggestion that something (perhaps the replacement of max_results_day with
per_app_version equivalents) broke the blacklist facility after he started
using it - quotas which had been set to -1 manually became positive again. The
SETI situation would be a useful test of this tool while a longer-term
automatic solution is sought.
>
> ----- Original Message -----
> From: Raistmer
>
>
> Hello.
>
> Looks like the current quota system implementation can't prevent the waste
> of project resources in the case of a "partially" broken host.
> For example, a host with an anonymous platform running a Fermi-incompatible
> CUDA app on SETI.
> It will produce incorrect overflows almost always, but a few specific ARs
> will be processed correctly and receive validation.
> This small number of validations, plus the "GPU" status of the app (GPU apps
> have greatly relaxed limits), allows continuous task trashing. The current
> quota system implementation can't prevent massive task trashing in this
> situation.
>
> But now more historical info about host behavior is stored on the servers,
> on a per-app-version basis.
> Maybe something new can be implemented that takes into account not only the
> last successful validation but the host's history too?
> The test cases are known; the SETI community already has a list of such
> badly behaving hosts:
http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1061788
>
> The aim should be to reduce their throughput to 1 task per day for the NV
> GPU app until their owners reinstall the GPU app.
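One way to act on that per-app-version history, sketched under the assumption
that the server keeps a recent window of validation outcomes per (host, app
version). The window size, threshold, and function name are arbitrary
illustrative choices, not part of any existing BOINC policy:

```python
def effective_quota(outcomes, base_quota=800, window=50, min_valid_rate=0.1):
    """Illustrative policy: if fewer than min_valid_rate of the last
    `window` validated results were valid, drop this host/app-version
    to 1 task per day until its owner fixes the app.

    outcomes: list of booleans, True for a valid result, oldest first.
    """
    recent = outcomes[-window:]
    if len(recent) < window:
        return base_quota               # not enough history to judge
    valid_rate = sum(recent) / len(recent)
    return base_quota if valid_rate >= min_valid_rate else 1
```

Unlike a consecutive-streak rule, a windowed rate is not reset by the
occasional lucky validation at a "good" AR, which is exactly the failure mode
described above.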
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.