One more example where two overflow results outvoted a single non-overflowed
(and most probably valid) result and passed through the validator into the
database. These CUDA-induced overflows can pollute SETI's database.
http://setiathome.berkeley.edu/workunit.php?wuid=618018953
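
To illustrate the failure mode, here is a minimal sketch of a quorum check.
It is not the actual SETI@home validator: Result, agrees, find_canonical, the
quorum of 2 and the signal counts are all assumptions made up for this
example. It only shows how two results that overflowed in the same way can
agree with each other and outvote one good result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Result:
    host: str
    overflow: bool      # True for a "-9" overflow style result
    signal_count: int   # hypothetical summary of the reported signals


def agrees(a: Result, b: Result) -> bool:
    # Toy comparison: same overflow state and same signal count.
    return a.overflow == b.overflow and a.signal_count == b.signal_count


def find_canonical(results: List[Result], quorum: int = 2) -> Optional[Result]:
    # Return the first result that at least `quorum` results
    # (itself included) agree with, or None if no such result exists.
    for candidate in results:
        matches = sum(1 for r in results if agrees(candidate, r))
        if matches >= quorum:
            return candidate
    return None


# Illustrative data only: one plausible CPU result, two GPU overflows.
results = [
    Result("cpu_host", overflow=False, signal_count=7),
    Result("gpu_host_a", overflow=True, signal_count=30),
    Result("gpu_host_b", overflow=True, signal_count=30),
]
canonical = find_canonical(results)
print(canonical.host, canonical.overflow)
```

In this toy model the non-overflowed result never reaches quorum, so one of
the overflows becomes canonical.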

----- Original Message ----- 
From: "Raistmer" <[email protected]>
To: "Richard Haselgrove" <[email protected]>; 
<[email protected]>; "Richard Haselgrove" 
<[email protected]>
Sent: Thursday, June 03, 2010 4:24 PM
Subject: Re: [boinc_dev] host punishment mechanism revisited


> Good that there is now a reaction to computation errors, but what about the
> more SETI-specific problem (for now; it may appear on other projects too)
> of -9 overflows?
> Please look at this WU:
> http://setiathome.berkeley.edu/workunit.php?wuid=609263674
> It looks like the "-9" result took the consensus. That is, these invalid GPU
> states can plant invalid results in the database, which makes them even more
> dangerous...
>
> ----- Original Message ----- 
> From: "Richard Haselgrove" <[email protected]>
> To: <[email protected]>; "Richard Haselgrove"
> <[email protected]>
> Sent: Thursday, June 03, 2010 3:28 PM
> Subject: Re: [boinc_dev] host punishment mechanism revisited
>
>
> Another little wrinkle. Same host, quota was up to 141 at last show.
>
> 03/06/2010 12:23:33 s...@home Beta Test update requested by user
> 03/06/2010 12:23:35 s...@home Beta Test Sending scheduler request: Requested by user.
> 03/06/2010 12:23:35 s...@home Beta Test Reporting 10 completed tasks, requesting new tasks for GPU
> 03/06/2010 12:23:38 s...@home Beta Test Scheduler request completed: got 0 new tasks
> 03/06/2010 12:23:38 s...@home Beta Test Message from server: No work sent
> 03/06/2010 12:23:38 s...@home Beta Test Message from server: (reached daily quota of 100 tasks)
>
> That batch of 10 reported tasks happened to include a computation error
> (the infamous "cudaAcc_find_triplets erroneously found a triplet twice in
> find_triplets_kernel" from NVidia's coding). So all my 'validation reward'
> so far today goes out of the window? Probably right, but harsh.
>
> --- On Thu, 3/6/10, Richard Haselgrove <[email protected]> 
> wrote:
>
>
> From: Richard Haselgrove <[email protected]>
> Subject: Re: [boinc_dev] host punishment mechanism revisited
> To: [email protected], "Richard Haselgrove"
> <[email protected]>
> Date: Thursday, 3 June, 2010, 12:19
>
>
> Some movement on this one off-list, too.
>
> Validations now produce a quota 'reward', as designed. For the moment, I'm
> still having to update manually, because the backoff until after midnight
> is still happening (Changeset 21686 not active yet), but we're getting the
> idea.
>
> Two questions:
>
> 1) Is it right that an individual work request is allowed to 'overshoot'
> quota? Especially during error recovery, when quota is down to one per
> day, I would expect that to be strictly enforced at least until a 'success'
> result can be reported. But looking at the running total I've added to
> this list, the server sometimes gets way ahead of itself:
>
> 03/06/2010 08:28:32 s...@home Beta Test Reporting 71 completed tasks, requesting new tasks for GPU
> 03/06/2010 08:28:39 s...@home Beta Test Scheduler request completed: got 46 new tasks // 46
> 03/06/2010 08:28:55 s...@home Beta Test Scheduler request completed: got 36 new tasks // 82
> 03/06/2010 08:29:09 s...@home Beta Test Scheduler request completed: got 20 new tasks // 102
> 03/06/2010 08:29:25 s...@home Beta Test Scheduler request completed: got 11 new tasks // 113
> 03/06/2010 08:29:40 s...@home Beta Test Scheduler request completed: got 6 new tasks // 119
> 03/06/2010 08:29:54 s...@home Beta Test Scheduler request completed: got 3 new tasks // 122
> 03/06/2010 08:30:08 s...@home Beta Test Scheduler request completed: got 3 new tasks // 125
> 03/06/2010 08:30:23 s...@home Beta Test Scheduler request completed: got 2 new tasks // 127
> 03/06/2010 08:30:36 s...@home Beta Test Scheduler request completed: got 1 new tasks // 128
> 03/06/2010 08:31:55 s...@home Beta Test Scheduler request completed: got 6 new tasks // 135
> 03/06/2010 08:32:09 s...@home Beta Test Message from server: (reached daily quota of 131 tasks)
>
> <request_delay>84750.000000</request_delay>
> <message priority="high">No work sent</message>
> <message priority="high">(reached daily quota of 131 tasks)
>
> 03-Jun-2010 09:31:24 [s...@home Beta Test] Sending scheduler request: Requested by user.
> 03/06/2010 09:31:24 s...@home Beta Test Reporting 19 completed tasks, requesting new tasks for GPU
> 03/06/2010 09:31:28 s...@home Beta Test Scheduler request completed: got 0 new tasks
> 03/06/2010 09:31:28 s...@home Beta Test Message from server: No work sent
> 03/06/2010 09:31:28 s...@home Beta Test Message from server: (reached daily quota of 132 tasks)
>
> 03-Jun-2010 09:32:39 [s...@home Beta Test] Sending scheduler request: Requested by user.
> 03/06/2010 09:32:43 s...@home Beta Test Scheduler request completed: got 37 new tasks // 172
>
> 03/06/2010 09:36:13 s...@home Beta Test Reporting 1 completed tasks, requesting new tasks for GPU
> 03/06/2010 09:36:16 s...@home Beta Test Message from server: (reached daily quota of 140 tasks)
>
> 03/06/2010 11:53:48 s...@home Beta Test Reporting 44 completed tasks, requesting new tasks for GPU
> 03/06/2010 11:54:02 s...@home Beta Test Scheduler request completed: got 0 new tasks
> 03/06/2010 11:54:02 s...@home Beta Test Message from server: No work sent
> 03/06/2010 11:54:02 s...@home Beta Test Message from server: (reached daily quota of 141 tasks)
>
> 2) How are we going to handle this on the website host details? As I type,
> with a quota of 141,
> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=12316 is
> still saying "Maximum daily WU quota per CPU 100/day"
>
> Yet looking at a wingmate, Pappa's
> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=45842
> (hi, Al) is showing "Maximum daily WU quota per CPU 0/day" - yet returning
> valid work. That's not just the difference between logged-in and third-party
> reporting - other hosts I've checked are showing 100/day to third parties.
>
> A web display so far divorced from the new reality is clearly misleading,
> and shouldn't be shown. But it would be a shame to lose it completely:
> often a volunteer's first question on a help-desk is "Why aren't I getting
> any work for Project X?", and seeing a crippled quota is a lead-in to
> advising on what to do about repeated computation errors.
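
On question 1, a minimal sketch of what strict enforcement could look like:
each request is capped at the quota still remaining for the day. This is not
the BOINC scheduler code; HostQuota, grant and the way the numbers are
replayed are assumptions for illustration only. The batch sizes from the
08:28 to 08:31 log lines above are fed in as if they were request sizes,
against the quota of 131 reported at 08:32.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    daily_quota: int      # current (variable) daily quota for the host
    sent_today: int = 0   # tasks already sent since server midnight

    def grant(self, requested: int) -> int:
        # Never send more than what is left of today's quota.
        remaining = max(self.daily_quota - self.sent_today, 0)
        granted = min(requested, remaining)
        self.sent_today += granted
        return granted


host = HostQuota(daily_quota=131)
# Batch sizes taken from the log, used here as hypothetical request sizes.
grants = [host.grant(n) for n in (46, 36, 20, 11, 6, 3, 3, 2, 1, 6)]
print(grants, host.sent_today)
```

With a cap like this the last batch would be trimmed so that the running
total stops at the quota instead of passing it.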
>
>
> And while I'm reporting - SETI is aware that they're a download server
> short, aren't they?
>
> 03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: About to connect() to boinc2.ssl.berkeley.edu port 80 (#0)
> 03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: Trying 208.68.240.18...
> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Connection refused
> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Failed connect to boinc2.ssl.berkeley.edu:80; No error
> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Expire cleared
> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Closing connection #0
> 03-Jun-2010 09:41:23 [---] [http_debug] HTTP error: Couldn't connect to server
>
> --- On Wed, 2/6/10, Richard Haselgrove <[email protected]> 
> wrote:
>
>
> From: Richard Haselgrove <[email protected]>
> Subject: Re: [boinc_dev] host punishment mechanism revisited
> To: [email protected]
> Date: Wednesday, 2 June, 2010, 9:12
>
>
> I see that David has implemented the 'Reward for Validation' component of
> this discussion (http://boinc.berkeley.edu/trac/changeset/21675).
>
> However, don't we need to do something about backoffs?
>
> At the moment, if you ever reach the daily quota, you get a message saying
> typically "no work sent / reached daily quota of xxx tasks", and all
> scheduler RPCs are inhibited until 'server midnight + rnd(1 hour)'. I
> assume that's a server backoff instruction, and not coded into the client
> (which wouldn't know the server's local time).
>
> But the daily quota is no longer a fixed value. Indeed, if you both
> reported and requested work in the same RPC, your quota might be increased
> in the next few seconds, as the work you've just reported starts to
> validate. The backoff should be no more than the existing project RPC
> backoff and client 'no work sent' exponential backoff.
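
A rough sketch of the two behaviours being compared here: a delay until
server midnight plus a random hour, versus a capped exponential backoff. The
function names, the 60-second base, the 4-hour cap and the example server
time are assumptions for illustration, not values taken from the BOINC
client or server.

```python
import random
from datetime import datetime, timedelta


def delay_until_midnight(now: datetime, rng: random.Random) -> float:
    # Seconds until the next server midnight, plus up to one random hour.
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds() + rng.uniform(0, 3600)


def exponential_backoff(failures: int, base: float = 60.0,
                        cap: float = 4 * 3600.0) -> float:
    # Delay doubles with each consecutive 'no work sent' reply, up to `cap`.
    return min(base * (2 ** failures), cap)


rng = random.Random(0)
server_now = datetime(2010, 6, 3, 0, 32, 9)   # hypothetical server local time
print(round(delay_until_midnight(server_now, rng)))
print([exponential_backoff(n) for n in range(5)])
```

The first delay is close to a full day when the quota is hit just after
server midnight, while the second starts at a minute and grows only if the
server keeps refusing work.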
>
> Unfortunately, at the moment I can't test any of this: we only have one
> test project with this code, and it says
>
> s...@home Beta Test 02/06/2010 08:28:40 Reporting 26 completed tasks, not requesting new tasks
> s...@home Beta Test 02/06/2010 08:28:45 Scheduler request failed: HTTP internal server error
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
> 

