This implies the same quota mechanism as used today, just with different numbers. For a GPU, the idle time will be long enough to make recovering from a single episode of task trashing (it happens sometimes) very painful. Maybe it would be better to slightly change the quota system to take the history of host behavior into account: what percentage of invalids does this host have? And base the speed of quota recovery on that percentage. There was an idea of "trusted" hosts, in the sense of not doing task replication when a task is sent to a "trusted" host. Though I think that is a bad idea, the "trusted host" concept could still be used for quota-related calculations, IMHO.
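To make that concrete, here is a minimal sketch (illustrative only; the struct and function names are hypothetical, not the actual BOINC scheduler code) of quota recovery whose step size depends on the host's historical invalid fraction for an app version. A host with a clean record keeps fast recovery; a host with a long history of invalids crawls back a task at a time:

#include <algorithm>

// Hypothetical per-(host, app version) record; real scheduler fields may differ.
struct HostAppVersionStats {
    int max_jobs_per_day;    // current daily quota
    long long n_valid;       // historical count of validated results
    long long n_invalid;     // historical count of invalid results
};

// On a validated result: grow the quota, but let the growth step shrink as
// the host's invalid fraction rises, so a task-trashing host recovers its
// quota far more slowly than a clean ("trusted") one.
void on_valid_result(HostAppVersionStats& hav, int daily_quota_cap) {
    hav.n_valid++;
    double total = double(hav.n_valid + hav.n_invalid);
    double invalid_frac = total > 0 ? hav.n_invalid / total : 0.0;

    int step;
    if (invalid_frac < 0.05)
        step = hav.max_jobs_per_day;              // clean host: keep today's fast doubling
    else if (invalid_frac < 0.25)
        step = 1;                                 // mixed history: linear recovery
    else
        step = (hav.n_valid % 10 == 0) ? 1 : 0;   // mostly invalid: crawl upward
    hav.max_jobs_per_day = std::min(hav.max_jobs_per_day + step, daily_quota_cap);
}

// On an invalid result: cut the quota down toward a floor of one task per day.
// (The cut rule here is an assumption; the scheduler's actual penalty may differ.)
void on_invalid_result(HostAppVersionStats& hav) {
    hav.n_invalid++;
    hav.max_jobs_per_day = std::max(1, hav.max_jobs_per_day / 2);
}

The 5% / 25% thresholds are arbitrary placeholders; the point is only that recovery speed becomes a function of the host's invalid percentage rather than of the last result alone, which would also be a natural place to plug in a "trusted host" flag.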
----- Original Message -----
From: [email protected]
To: Josef W. Segur
Cc: [email protected]; [email protected]
Sent: Friday, January 07, 2011 10:22 PM
Subject: Re: [boinc_dev] Quota system inefficiency, something better?

I believe that doubling is way too fast. My thought is that when you return a good one, you should get a replacement and your quota should go up by one. Yes, the recovery will be slower, but at the beginning of the day tomorrow you will be able to fetch the count of successful ones that you returned today. If you are always on, you will still be able to keep your CPUs busy. If you are recovering from a temporary problem, either you can nursemaid your computer through a couple of days or you can just accept a slightly idle CPU for a while if you are not always attached to the internet.

jm7

From: "Josef W. Segur" <jse...@westelcom.com>
Sent by: <[email protected]>
To: <[email protected]>
Date: 01/07/2011 02:04 PM
Subject: Re: [boinc_dev] Quota system inefficiency, something better?

Although the quota system is not limiting those hosts enough, they do get frequent invalidations. The consecutive valid count is usually zero, though I've seen a few cases of single-digit non-zero counts. So with the s...@h gpu_multiplier set at 8, those hosts are limited to not much more than 800 tasks per day per FERMI GPU. My most recent check shows 19 hosts, though a few have two Fermi cards, so the GPU count is about 23. So the size of the problem is reduced to under 2% of s...@h Enhanced tasks. s...@h Enhanced tasks account for about 64% of the download bandwidth; the much larger Astropulse v505 tasks get the remainder. That makes the bandwidth impact on the order of 1% currently.

Perhaps something like keeping a consecutive invalid count and not doubling on reported "Success" when that count exceeds the consecutive valid count would be better. But that would make recovery from a temporary problem much slower.
--
Joe

On Fri, 07 Jan 2011 06:41:07 -0500, Richard Haselgrove wrote:

> OK, back-of-envelope error - that 10% overstates the size of the problem.
>
> But, more realistically, 20 hosts @ 2,000 tasks wasted each per day is close to 4% of all multibeam tasks issued.
>
> ----- Original Message -----
> From: Richard Haselgrove
>
> There is an updated version of that list at http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1062822
>
> Given the limited number of hosts involved, in the short term the wastage (something of the order of 10% of SETI's limited download bandwidth) maybe could/should be stemmed by invoking http://boinc.berkeley.edu/trac/wiki/BlackList for the 19 hosts identified.
>
> This approach has been used successfully by Milo at CPDN, although there is a suggestion that something (perhaps the replacement of max_results_day with per_app_version equivalents) broke the blacklist facility after he started using it - quotas which had been set to -1 manually became positive again. The SETI situation would be a useful test of this tool while a longer-term automatic solution is sought.
>
> ----- Original Message -----
> From: Raistmer
>
> Hello.
> It looks like the current quota system implementation can't prevent waste of project resources in the case of a "partially" broken host.
> For example, a host with an anonymous-platform, FERMI-incompatible CUDA app on SETI.
> It will produce incorrect overflows almost always, but a few specific ARs will be processed correctly and receive validation.
> This small number of validations + the "GPU" status of the app (GPU limits are greatly relaxed) allows continuous task trashing. The current quota system implementation can't prevent massive task trashing in this situation.
>
> But now more historical info about host behavior is stored on the servers, on a per-app-version basis.
> Maybe something new can be implemented that takes into account not only the last successful validation but the host's history too?
> The test cases are known; the SETI community already has a list of such badly behaving hosts: http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1061788
>
> The aim should be to reduce their throughput to 1 task per day for the NV GPU app until their owners reinstall the GPU app.
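As a postscript to the quoted thread: a sketch of jm7's "plus one per valid result" recovery, contrasted with doubling. Hypothetical names only; neither function is the scheduler's actual code:

#include <algorithm>

struct HostQuota {
    int max_jobs_per_day;   // daily quota for this host/app version
};

// Roughly today's behaviour: a validated result doubles the quota (capped).
void doubling_rule(HostQuota& q, int cap) {
    q.max_jobs_per_day = std::min(q.max_jobs_per_day * 2, cap);
}

// jm7's proposal: a validated result raises the quota by exactly one, so
// after a quota collapse, tomorrow's quota is roughly the number of results
// the host managed to validate today.
void linear_rule(HostQuota& q, int cap) {
    q.max_jobs_per_day = std::min(q.max_jobs_per_day + 1, cap);
}

The trade-off jm7 describes is visible: recovery is slower, but an always-on host that keeps returning good results keeps earning work, while a misbehaving host never climbs back quickly.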
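And one reading of Joe's variant (again illustrative only; Joe doesn't spell out what happens instead of the doubling, so a +1 step and a halving penalty are assumed): keep a consecutive-invalid counter next to the existing consecutive-valid one, and withhold the doubling while the invalid streak exceeds the valid streak:

#include <algorithm>

struct HostAppVersion {
    int max_jobs_per_day;
    int consecutive_valid;    // already tracked per (host, app version)
    int consecutive_invalid;  // the extra counter this idea would add
};

void on_result(HostAppVersion& hav, bool valid, int cap) {
    if (valid) {
        hav.consecutive_valid++;
        if (hav.consecutive_valid >= hav.consecutive_invalid) {
            // the valid streak has caught up with the invalid streak:
            // allow the usual fast (doubling) recovery again
            hav.consecutive_invalid = 0;
            hav.max_jobs_per_day = std::min(hav.max_jobs_per_day * 2, cap);
        } else {
            // invalid streak still dominates: no doubling, just creep up
            hav.max_jobs_per_day = std::min(hav.max_jobs_per_day + 1, cap);
        }
    } else {
        hav.consecutive_invalid++;
        hav.consecutive_valid = 0;
        // assumed penalty on a bad result; the real rule may differ
        hav.max_jobs_per_day = std::max(1, hav.max_jobs_per_day / 2);
    }
}

A FERMI-broken host that trashes most tasks keeps rebuilding its invalid streak and never doubles, which is close to the 1-task-per-day aim; the cost, as Joe says, is that a healthy host recovering from a temporary problem must first string together as many consecutive valids as it had invalids.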
