The trusted computers would not have any quotas applied at all.
jm7
-----<[email protected]> wrote: -----
To: "Josef W. Segur" <[email protected]>, <[email protected]>
From: Raistmer <[email protected]>
Sent by: <[email protected]>
Date: 01/07/2011 04:22PM
cc: <[email protected]>, <[email protected]>
Subject: Re: [boinc_dev] Quota system inefficiency, something better?
This implies the same quota mechanism as used today, just with different
numbers.
For GPUs, the idle time will be long enough to make recovery from a
single episode of task trashing (it happens sometimes) very painful.
Maybe it would be better to change the quota system slightly to take the
history of host behavior into account: what percentage of invalid results
does this host have? The speed of quota recovery could then be based on
that percentage.
There was an idea of "trusted" hosts, in the sense of not doing task
replication when a task is sent to a "trusted" host.
Though I think that is a bad idea, the "trusted host" concept could still
be used for quota-related calculations, IMHO.
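The invalid-percentage idea could be sketched roughly as follows (a hypothetical Python sketch, not actual BOINC server code; the function name, the linear scaling, and the `base_increment` parameter are all assumptions):

```python
def recovery_increment(valid_count, invalid_count, base_increment=1.0):
    """Scale the per-result quota recovery by the host's validity history.

    A host with no invalid results recovers at the full base rate; a host
    whose history is mostly invalid results recovers much more slowly.
    """
    total = valid_count + invalid_count
    if total == 0:
        return base_increment  # no history yet: give the benefit of the doubt
    invalid_fraction = invalid_count / total
    return base_increment * (1.0 - invalid_fraction)
```

A host with 50% invalids would then recover quota at half the normal speed, so a chronically misbehaving host climbs back much more slowly than a healthy one.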
----- Original Message -----
From: [email protected]
To: Josef W. Segur
Cc: [email protected] ; [email protected]
Sent: Friday, January 07, 2011 10:22 PM
Subject: Re: [boinc_dev] Quota system inefficiency, something better?
I believe that doubling is way too fast. My thought is that when you
return a good result, you should get a replacement and your quota should
go up by one. Yes, recovery will be slower, but tomorrow, at the
beginning of the day, you will be able to fetch as many tasks as you
returned successfully today. If you are always on, you will still be
able to keep your CPUs busy. If you are recovering from a temporary
problem, you can either nursemaid your computer through a couple of days
or just accept a slightly idle CPU for a while if you are not always
attached to the internet.
jm7
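The difference between the current doubling scheme and the +1-per-valid-result proposal above can be sketched like this (a hypothetical Python sketch; the cap value and function names are assumptions, not BOINC code):

```python
MAX_QUOTA = 100  # assumed per-day cap for the sketch

def recover_doubling(quota):
    """Current behavior discussed in the thread: each reported success
    doubles the daily quota, up to the cap."""
    return min(MAX_QUOTA, quota * 2)

def recover_linear(quota):
    """Proposed behavior: each good result raises the quota by one, so
    tomorrow's quota roughly equals today's count of successful returns."""
    return min(MAX_QUOTA, quota + 1)
```

Starting from a quota of 1, three successes give 8 under doubling but only 4 under the linear rule, which is exactly the slower recovery the proposal accepts.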
"Josef W. Segur"
<jse...@westelcom
.com> To
Sent by: <[email protected]>
<boinc_dev-bounce cc
[email protected]
u> Subject
Re: [boinc_dev] Quota system
inefficiency, something better?
01/07/2011 02:04
PM
Although the quota system is not limiting those hosts enough,
they get frequent invalidations. The consecutive valid count is
usually zero, though I've seen a few cases of single digit
non-zero counts. So with s...@h gpu_multiplier set at 8, those hosts
are limited to not much more than 800 tasks per day per FERMI
GPU. My most recent check shows 19 hosts, though a few have two
Fermi cards so the GPU count is about 23. So the size of the
problem is reduced to under 2% of s...@h Enhanced tasks.
s...@h Enhanced tasks account for about 64% of the download
bandwidth, the much larger Astropulse v505 tasks get the
remainder. That makes the bandwidth impact on the order of 1%
currently.
Perhaps something like keeping a consecutive invalid count and
not doubling on reported "Success" when that count exceeds the
consecutive valid count would be better. But that would make
recovery from a temporary problem much slower.
--
Joe
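The consecutive-invalid gating suggested above could be sketched as below (a hypothetical Python sketch; the data layout, the slow decay of the invalid count on success, and the penalty on failure are all assumptions made for illustration):

```python
MAX_QUOTA = 1000  # assumed cap for the sketch

def on_result_reported(host, success):
    """host: dict with 'quota', 'consecutive_valid', 'consecutive_invalid'.

    Double the quota on success only while invalids do not outnumber
    valids; decaying (rather than resetting) the invalid count makes
    recovery from a bad streak deliberately slow, as noted above.
    """
    if success:
        if host["consecutive_invalid"] <= host["consecutive_valid"]:
            host["quota"] = min(MAX_QUOTA, host["quota"] * 2)
        host["consecutive_valid"] += 1
        host["consecutive_invalid"] = max(0, host["consecutive_invalid"] - 1)
    else:
        host["consecutive_invalid"] += 1
        host["consecutive_valid"] = 0
        host["quota"] = max(1, host["quota"] - 1)  # assumed penalty
```

A host coming off five straight invalids needs several good results before doubling resumes, which illustrates the slower-recovery trade-off mentioned above.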
On Fri, 07 Jan 2011 06:41:07 -0500, Richard Haselgrove wrote:
> OK, back-of-envelope error - that 10% overstates the size of the
problem.
>
> But, more realistically, 20 hosts @ 2,000 tasks wasted each per day is
close to 4% of all multibeam tasks issued.
>
> ----- Original Message -----
> From: Richard Haselgrove
>
>
> There is an updated version of that list at
[1]http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1062822
>
> Given the limited number of hosts involved, in the short term the
wastage (something of the order of 10% of SETI's limited download
bandwidth) maybe could/should be stemmed by invoking
[2]http://boinc.berkeley.edu/trac/wiki/BlackList for the 19 hosts
identified.
>
> This approach has been used successfully by Milo at CPDN, although
there is a suggestion that something (perhaps the replacement of
max_results_day with per_app_version equivalents) broke the blacklist
facility after he started using it - quotas which had been set to -1
manually became positive again. The SETI situation would be a useful test
of this tool while a longer-term automatic solution is sought.
>
> ----- Original Message -----
> From: Raistmer
>
>
> Hello.
>
> Looks like the current quota system implementation can't prevent
wasting project resources in the case of a "partially" broken host.
> For example, a host with an anonymous platform running a
FERMI-incompatible CUDA app on SETI.
> It will produce incorrect overflows almost always, except for a few
specific ARs that will be processed correctly and receive validation.
> This small number of validations + the "GPU" status of the app (GPUs
have greatly relaxed limits) allows continuous task trashing. The
current quota system implementation can't prevent massive task trashing
in this situation.
>
> But now more historical info about host behavior is stored on the
servers, on a per-app-version basis.
> Maybe something new can be implemented that takes into account not
only the last successful validation but the host's history too?
> The test cases are known; the SETI community already has a list of
such bad-behaving hosts:
[3]http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1061788
>
> The aim should be to reduce their throughput to 1 task per day for the
NV GPU app until their owners reinstall the GPU app.
_______________________________________________
boinc_dev mailing list
[email protected]
[4]http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
References
1. http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1062822
2. http://boinc.berkeley.edu/trac/wiki/BlackList
3. http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1061788
4. http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
5. http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
6. http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev