One more example where two overflowed results outvoted a single non-overflowed (and very probably valid) result, and the overflow passed through the Validator into the database. These CUDA-induced overflows can pollute SETI's database. http://setiathome.berkeley.edu/workunit.php?wuid=618018953
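To illustrate how two matching overflow results can outvote one good result, here is a minimal Python sketch of quorum-based consensus. It is a simplified model that assumes each result is reduced to a hashable summary string, that results "agree" only when their summaries are identical, and that the quorum is 2. The names `find_canonical` and `validate` and the result strings are hypothetical, and a real validator's result comparison is more involved than exact equality.

```python
from collections import Counter
from typing import List, Optional, Tuple


def find_canonical(results: List[str], min_quorum: int = 2) -> Optional[str]:
    """Return the most common result if at least min_quorum results agree on it."""
    if not results:
        return None
    # Hypothetical model: two results agree only if their summaries are identical.
    value, count = Counter(results).most_common(1)[0]
    return value if count >= min_quorum else None


def validate(results: List[str], min_quorum: int = 2) -> Tuple[Optional[str], List[bool]]:
    """Pick a canonical result and flag each result as valid if it matches it."""
    canonical = find_canonical(results, min_quorum)
    if canonical is None:
        return None, []  # no consensus yet: nothing is validated
    return canonical, [r == canonical for r in results]


if __name__ == "__main__":
    # Two hosts hit the same GPU fault and report identical -9 overflows;
    # the third host reports a normal (probably correct) result.
    wu_results = ["-9 overflow", "-9 overflow", "normal result"]
    canonical, flags = validate(wu_results)
    print(canonical)  # -9 overflow
    print(flags)      # [True, True, False]
```

With a quorum of two, the overflow pair becomes canonical and the lone good result is the one flagged invalid, which is the failure mode reported above.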
----- Original Message -----
From: "Raistmer" <[email protected]>
To: "Richard Haselgrove" <[email protected]>; <[email protected]>; "Richard Haselgrove" <[email protected]>
Sent: Thursday, June 03, 2010 4:24 PM
Subject: Re: [boinc_dev] host punishment mechanism revisited

> It's good that there is a reaction to computation errors, but what about
> the more SETI-specific problem (SETI-specific for now; it may appear in
> other projects too) with -9 overflows?
> Please look at this WU:
> http://setiathome.berkeley.edu/workunit.php?wuid=609263674
> It looks like the "-9" result won consensus. That is, these invalid GPU
> states can plant invalid results in the database, which makes them even
> more dangerous...
>
> ----- Original Message -----
> From: "Richard Haselgrove" <[email protected]>
> To: <[email protected]>; "Richard Haselgrove" <[email protected]>
> Sent: Thursday, June 03, 2010 3:28 PM
> Subject: Re: [boinc_dev] host punishment mechanism revisited
>
> Another little wrinkle. Same host, quota was up to 141 at last show.
>
> 03/06/2010 12:23:33 s...@home Beta Test update requested by user
> 03/06/2010 12:23:35 s...@home Beta Test Sending scheduler request: Requested by user.
> 03/06/2010 12:23:35 s...@home Beta Test Reporting 10 completed tasks, requesting new tasks for GPU
> 03/06/2010 12:23:38 s...@home Beta Test Scheduler request completed: got 0 new tasks
> 03/06/2010 12:23:38 s...@home Beta Test Message from server: No work sent
> 03/06/2010 12:23:38 s...@home Beta Test Message from server: (reached daily quota of 100 tasks)
>
> That batch of 10 reported tasks happened to include a computation error
> (the infamous "cudaAcc_find_triplets erroneously found a triplet twice in
> find_triplets_kernel" from NVidia's coding). So all my 'validation reward'
> so far today goes out of the window? Probably right, but harsh.
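The next quoted message (Richard's, 12:19) asks whether a single work request should be allowed to overshoot the daily quota, especially during error recovery. A minimal Python sketch of strict enforcement, in which each reply is capped at whatever quota remains for the day, is shown here; `tasks_to_send`, `simulate` and the request sizes are hypothetical and are not taken from the BOINC scheduler source.

```python
from typing import List


def tasks_to_send(requested: int, sent_today: int, daily_quota: int) -> int:
    """Tasks a strictly enforcing scheduler would grant for one request."""
    remaining = max(0, daily_quota - sent_today)
    return min(requested, remaining)


def simulate(requests: List[int], daily_quota: int, sent_today: int = 0) -> List[int]:
    """Grant requests in order, never letting the day's total pass the quota."""
    grants = []
    for requested in requests:
        granted = tasks_to_send(requested, sent_today, daily_quota)
        sent_today += granted
        grants.append(granted)
    return grants


if __name__ == "__main__":
    # Four back-to-back requests for 50 tasks each against a quota of 131.
    print(simulate([50, 50, 50, 50], daily_quota=131))  # [50, 50, 31, 0]
    # Error recovery: with the quota down to one per day, only one task goes out.
    print(simulate([10, 10], daily_quota=1))  # [1, 0]
```

With the cap applied per request, the running total can reach the quota in force at each request but never exceed it.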
>
> --- On Thu, 3/6/10, Richard Haselgrove <[email protected]> wrote:
>
> From: Richard Haselgrove <[email protected]>
> Subject: Re: [boinc_dev] host punishment mechanism revisited
> To: [email protected], "Richard Haselgrove" <[email protected]>
> Date: Thursday, 3 June, 2010, 12:19
>
> Some movement on this one off-list, too.
>
> Validations now produce a quota 'reward', as designed. For the moment, I'm
> still having to update manually, because the backoff until after midnight
> is still happening (Changeset 21686 not active yet), but we're getting the
> idea.
>
> Two questions:
>
> 1) Is it right that an individual work request is allowed to 'overshoot'
> quota? Especially during error recovery, when quota is down to one per
> day, I would expect that to be strictly enforced at least until a
> 'success' result can be reported. But looking at the running total I've
> added to this list, the server sometimes gets way ahead of itself:
>
> 03/06/2010 08:28:32 s...@home Beta Test Reporting 71 completed tasks, requesting new tasks for GPU
> 03/06/2010 08:28:39 s...@home Beta Test Scheduler request completed: got 46 new tasks // 46
> 03/06/2010 08:28:55 s...@home Beta Test Scheduler request completed: got 36 new tasks // 82
> 03/06/2010 08:29:09 s...@home Beta Test Scheduler request completed: got 20 new tasks // 102
> 03/06/2010 08:29:25 s...@home Beta Test Scheduler request completed: got 11 new tasks // 113
> 03/06/2010 08:29:40 s...@home Beta Test Scheduler request completed: got 6 new tasks // 119
> 03/06/2010 08:29:54 s...@home Beta Test Scheduler request completed: got 3 new tasks // 122
> 03/06/2010 08:30:08 s...@home Beta Test Scheduler request completed: got 3 new tasks // 125
> 03/06/2010 08:30:23 s...@home Beta Test Scheduler request completed: got 2 new tasks // 127
> 03/06/2010 08:30:36 s...@home Beta Test Scheduler request completed: got 1 new tasks // 128
> 03/06/2010 08:31:55 s...@home Beta Test Scheduler request completed: got 6 new tasks // 135
> 03/06/2010 08:32:09 s...@home Beta Test Message from server: (reached daily quota of 131 tasks)
>
> <request_delay>84750.000000</request_delay>
> <message priority="high">No work sent</message>
> <message priority="high">(reached daily quota of 131 tasks)
>
> 03-Jun-2010 09:31:24 [s...@home Beta Test] Sending scheduler request: Requested by user.
> 03/06/2010 09:31:24 s...@home Beta Test Reporting 19 completed tasks, requesting new tasks for GPU
> 03/06/2010 09:31:28 s...@home Beta Test Scheduler request completed: got 0 new tasks
> 03/06/2010 09:31:28 s...@home Beta Test Message from server: No work sent
> 03/06/2010 09:31:28 s...@home Beta Test Message from server: (reached daily quota of 132 tasks)
>
> 03-Jun-2010 09:32:39 [s...@home Beta Test] Sending scheduler request: Requested by user.
> 03/06/2010 09:32:43 s...@home Beta Test Scheduler request completed: got 37 new tasks // 172
>
> 03/06/2010 09:36:13 s...@home Beta Test Reporting 1 completed tasks, requesting new tasks for GPU
> 03/06/2010 09:36:16 s...@home Beta Test Message from server: (reached daily quota of 140 tasks)
>
> 03/06/2010 11:53:48 s...@home Beta Test Reporting 44 completed tasks, requesting new tasks for GPU
> 03/06/2010 11:54:02 s...@home Beta Test Scheduler request completed: got 0 new tasks
> 03/06/2010 11:54:02 s...@home Beta Test Message from server: No work sent
> 03/06/2010 11:54:02 s...@home Beta Test Message from server: (reached daily quota of 141 tasks)
>
> 2) How are we going to handle this on the website host details? As I
> type, with a quota of 141,
> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=12316
> is still saying "Maximum daily WU quota per CPU 100/day"
>
> Yet looking at a wingmate, Pappa's
> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=45842
> (hi, Al) is showing "Maximum daily WU quota per CPU 0/day" - yet returning
> valid work.
> That's not just the difference between logged-in and third-party
> reporting - other hosts I've checked are showing 100/day to third parties.
>
> A web display so far divorced from the new reality is clearly misleading,
> and shouldn't be shown. But it would be a shame to lose it completely:
> often a volunteer's first question on a help-desk is "Why aren't I getting
> any work for Project X?", and seeing a crippled quota is a lead-in to
> advising on what to do about repeated computation errors.
>
> And while I'm reporting - SETI is aware that they're a download server
> short, aren't they?
>
> 03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: About to connect() to boinc2.ssl.berkeley.edu port 80 (#0)
> 03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: Trying 208.68.240.18...
> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Connection refused
> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Failed connect to boinc2.ssl.berkeley.edu:80; No error
> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Expire cleared
> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Closing connection #0
> 03-Jun-2010 09:41:23 [---] [http_debug] HTTP error: Couldn't connect to server
>
> --- On Wed, 2/6/10, Richard Haselgrove <[email protected]> wrote:
>
> From: Richard Haselgrove <[email protected]>
> Subject: Re: [boinc_dev] host punishment mechanism revisited
> To: [email protected]
> Date: Wednesday, 2 June, 2010, 9:12
>
> I see that David has implemented the 'Reward for Validation' component of
> this discussion (http://boinc.berkeley.edu/trac/changeset/21675).
>
> However, don't we need to do something about backoffs?
>
> At the moment, if you ever reach the daily quota, you get a message saying
> typically "no work sent / reached daily quota of xxx tasks", and all
> scheduler RPCs are inhibited until 'server midnight + rnd(1 hour)'.
> I assume that's a server backoff instruction, and not coded into the
> client (which wouldn't know the server's local time).
>
> But the daily quota is no longer a fixed value. Indeed, if you both
> reported and requested work in the same RPC, your quota might be increased
> in the next few seconds, as the work you've just reported starts to
> validate. The backoff should be no more than the existing project RPC
> backoff and client 'no work sent' exponential backoff.
>
> Unfortunately, at the moment I can't test any of this: we only have one
> test project with this code, and it says
>
> s...@home Beta Test 02/06/2010 08:28:40 Reporting 26 completed tasks, not requesting new tasks
> s...@home Beta Test 02/06/2010 08:28:45 Scheduler request failed: HTTP internal server error
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
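Richard's description of the quota backoff ('server midnight + rnd(1 hour)') can be checked against the `<request_delay>84750.000000</request_delay>` reply quoted earlier, which appears to belong to the 08:32:09 "reached daily quota of 131 tasks" message. The Python sketch below assumes the client log is in British Summer Time (UTC+1) and the server in US Pacific Daylight Time (UTC-7) on that date; both offsets, the function names, and the `proposed_backoff` rule (a simple reading of Richard's suggestion) are assumptions rather than anything stated in the thread or taken from BOINC source.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed fixed offsets for 3 June 2010 (not stated in the logs).
SERVER_TZ = timezone(timedelta(hours=-7))  # US Pacific Daylight Time
CLIENT_TZ = timezone(timedelta(hours=1))   # British Summer Time


def quota_backoff_seconds(now: datetime, jitter: Optional[float] = None) -> float:
    """Seconds until the next server-local midnight plus up to one hour of jitter."""
    server_now = now.astimezone(SERVER_TZ)
    next_midnight = (server_now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    if jitter is None:
        jitter = random.uniform(0, 3600)  # the 'rnd(1 hour)' part
    return (next_midnight - server_now).total_seconds() + jitter


def proposed_backoff(quota_delay: float, normal_backoff: float) -> float:
    """Hypothetical rule: never wait longer than the usual RPC / no-work backoff."""
    return min(quota_delay, normal_backoff)


if __name__ == "__main__":
    reply_time = datetime(2010, 6, 3, 8, 32, 9, tzinfo=CLIENT_TZ)
    base = quota_backoff_seconds(reply_time, jitter=0.0)
    print(base)          # 84471.0
    print(84750 - base)  # 279.0, inside the one-hour random window
```

If the offsets are right, the 84750 s in the reply is the time to the next server midnight plus about 279 s of jitter, i.e. a wait of roughly 23.5 hours even though validations may raise the quota within seconds.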
