This implies the same quota mechanism as used today, just with different numbers. For a GPU, the idle time will be long enough to make recovering from a single episode of task trashing (it happens sometimes) very painful. Maybe it would be better to slightly change the quota system to take the history of host behavior into account: what percentage of invalids does this host have? And base the speed of quota recovery on that percentage. There was an idea of "trusted" hosts, in the sense of not doing task replication when a task is sent to a "trusted" host. Though I think that is a bad idea, the "trusted host" concept could still be used for quota-related calculations, IMHO.
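To make that concrete, here is a minimal sketch (illustrative only; the struct and function names are hypothetical, not the actual BOINC scheduler code) of quota recovery whose step size depends on the host's historical invalid fraction for an app version. A host with a clean record keeps fast recovery; a host with a long history of invalids crawls back a task at a time:

#include <algorithm>

// Hypothetical per-(host, app version) record; real scheduler fields may differ.
struct HostAppVersionStats {
    int max_jobs_per_day;    // current daily quota
    long long n_valid;       // historical count of validated results
    long long n_invalid;     // historical count of invalid results
};

// On a validated result: grow the quota, but let the growth step shrink as
// the host's invalid fraction rises, so a task-trashing host recovers its
// quota far more slowly than a clean ("trusted") one.
void on_valid_result(HostAppVersionStats& hav, int daily_quota_cap) {
    hav.n_valid++;
    double total = double(hav.n_valid + hav.n_invalid);
    double invalid_frac = total > 0 ? hav.n_invalid / total : 0.0;

    int step;
    if (invalid_frac < 0.05)
        step = hav.max_jobs_per_day;              // clean host: keep today's fast doubling
    else if (invalid_frac < 0.25)
        step = 1;                                 // mixed history: linear recovery
    else
        step = (hav.n_valid % 10 == 0) ? 1 : 0;   // mostly invalid: crawl upward
    hav.max_jobs_per_day = std::min(hav.max_jobs_per_day + step, daily_quota_cap);
}

// On an invalid result: cut the quota down toward a floor of one task per day.
// (The cut rule here is an assumption; the scheduler's actual penalty may differ.)
void on_invalid_result(HostAppVersionStats& hav) {
    hav.n_invalid++;
    hav.max_jobs_per_day = std::max(1, hav.max_jobs_per_day / 2);
}

The 5% / 25% thresholds are arbitrary placeholders; the point is only that recovery speed becomes a function of the host's invalid percentage rather than of the last result alone, which would also be a natural place to plug in a "trusted host" flag.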
----- Original Message -----
From: [email protected]
To: Josef W. Segur
Cc: [email protected]; [email protected]
Sent: Friday, January 07, 2011 10:22 PM
Subject: Re: [boinc_dev] Quota system inefficiency, something better?

I believe that doubling is way too fast. My thought is that when you return a good one, you should get a replacement and your quota should go up by one. Yes, the recovery will be slower, but at the beginning of the day tomorrow you will be able to fetch the count of successful ones that you returned today. If you are always on, you will still be able to keep your CPUs busy. If you are recovering from a temporary problem, either you can nursemaid your computer through a couple of days or you can just accept a slightly idle CPU for a while if you are not always attached to the internet.

jm7

From: "Josef W. Segur" <jse...@westelcom.com>
Sent by: <[email protected]>
To: <[email protected]>
Date: 01/07/2011 02:04 PM
Subject: Re: [boinc_dev] Quota system inefficiency, something better?

Although the quota system is not limiting those hosts enough, they do get frequent invalidations. The consecutive valid count is usually zero, though I've seen a few cases of single-digit non-zero counts. So with the s...@h gpu_multiplier set at 8, those hosts are limited to not much more than 800 tasks per day per FERMI GPU. My most recent check shows 19 hosts, though a few have two Fermi cards, so the GPU count is about 23. So the size of the problem is reduced to under 2% of s...@h Enhanced tasks. s...@h Enhanced tasks account for about 64% of the download bandwidth; the much larger Astropulse v505 tasks get the remainder. That makes the bandwidth impact on the order of 1% currently.

Perhaps something like keeping a consecutive invalid count and not doubling on reported "Success" when that count exceeds the consecutive valid count would be better. But that would make recovery from a temporary problem much slower.
--
Joe

On Fri, 07 Jan 2011 06:41:07 -0500, Richard Haselgrove wrote:

> OK, back-of-envelope error - that 10% overstates the size of the problem.
>
> But, more realistically, 20 hosts @ 2,000 tasks wasted each per day is close to 4% of all multibeam tasks issued.
>
> ----- Original Message -----
> From: Richard Haselgrove
>
> There is an updated version of that list at http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1062822
>
> Given the limited number of hosts involved, in the short term the wastage (something of the order of 10% of SETI's limited download bandwidth) maybe could/should be stemmed by invoking http://boinc.berkeley.edu/trac/wiki/BlackList for the 19 hosts identified.
>
> This approach has been used successfully by Milo at CPDN, although there is a suggestion that something (perhaps the replacement of max_results_day with per_app_version equivalents) broke the blacklist facility after he started using it - quotas which had been set to -1 manually became positive again. The SETI situation would be a useful test of this tool while a longer-term automatic solution is sought.
>
> ----- Original Message -----
> From: Raistmer
>
> Hello.
> It looks like the current quota system implementation can't prevent waste of project resources in the case of a "partially" broken host.
> For example, a host with an anonymous-platform, FERMI-incompatible CUDA app on SETI.
> It will produce incorrect overflows almost always, but a few specific ARs will be processed correctly and receive validation.
> This small number of validations + the "GPU" status of the app (GPU limits are greatly relaxed) allows continuous task trashing. The current quota system implementation can't prevent massive task trashing in this situation.
>
> But now more historical info about host behavior is stored on the servers, on a per-app-version basis.
> Maybe something new can be implemented that takes into account not only the last successful validation but the host's history too?
> The test cases are known; the SETI community already has a list of such badly behaving hosts: http://setiathome.berkeley.edu/forum_thread.php?id=62573&nowrap=true#1061788
>
> The aim should be to reduce their throughput to 1 task per day for the NV GPU app until their owners reinstall the GPU app.
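As a postscript to the quoted thread: a sketch of jm7's "plus one per valid result" recovery, contrasted with doubling. Hypothetical names only; neither function is the scheduler's actual code:

#include <algorithm>

struct HostQuota {
    int max_jobs_per_day;   // daily quota for this host/app version
};

// Roughly today's behaviour: a validated result doubles the quota (capped).
void doubling_rule(HostQuota& q, int cap) {
    q.max_jobs_per_day = std::min(q.max_jobs_per_day * 2, cap);
}

// jm7's proposal: a validated result raises the quota by exactly one, so
// after a quota collapse, tomorrow's quota is roughly the number of results
// the host managed to validate today.
void linear_rule(HostQuota& q, int cap) {
    q.max_jobs_per_day = std::min(q.max_jobs_per_day + 1, cap);
}

The trade-off jm7 describes is visible: recovery is slower, but an always-on host that keeps returning good results keeps earning work, while a misbehaving host never climbs back quickly.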
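And one reading of Joe's variant (again illustrative only; Joe doesn't spell out what happens instead of the doubling, so a +1 step and a halving penalty are assumed): keep a consecutive-invalid counter next to the existing consecutive-valid one, and withhold the doubling while the invalid streak exceeds the valid streak:

#include <algorithm>

struct HostAppVersion {
    int max_jobs_per_day;
    int consecutive_valid;    // already tracked per (host, app version)
    int consecutive_invalid;  // the extra counter this idea would add
};

void on_result(HostAppVersion& hav, bool valid, int cap) {
    if (valid) {
        hav.consecutive_valid++;
        if (hav.consecutive_valid >= hav.consecutive_invalid) {
            // the valid streak has caught up with the invalid streak:
            // allow the usual fast (doubling) recovery again
            hav.consecutive_invalid = 0;
            hav.max_jobs_per_day = std::min(hav.max_jobs_per_day * 2, cap);
        } else {
            // invalid streak still dominates: no doubling, just creep up
            hav.max_jobs_per_day = std::min(hav.max_jobs_per_day + 1, cap);
        }
    } else {
        hav.consecutive_invalid++;
        hav.consecutive_valid = 0;
        // assumed penalty on a bad result; the real rule may differ
        hav.max_jobs_per_day = std::max(1, hav.max_jobs_per_day / 2);
    }
}

A FERMI-broken host that trashes most tasks keeps rebuilding its invalid streak and never doubles, which is close to the 1-task-per-day aim; the cost, as Joe says, is that a healthy host recovering from a temporary problem must first string together as many consecutive valids as it had invalids.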
