It's good that there is now a reaction to computation errors, but what about 
the more SETI-specific problem (SETI-specific for now; it may well appear on 
other projects too) of -9 overflows?
Please look at this WU:
http://setiathome.berkeley.edu/workunit.php?wuid=609263674
It looks like a "-9" result reached consensus. That is, these invalid GPU 
states can plant invalid results in the database, which makes them even more 
dangerous...

----- Original Message ----- 
From: "Richard Haselgrove" <[email protected]>
To: <[email protected]>; "Richard Haselgrove" 
<[email protected]>
Sent: Thursday, June 03, 2010 3:28 PM
Subject: Re: [boinc_dev] host punishment mechanism revisited


Another little wrinkle. Same host, quota was up to 141 at last show.

03/06/2010 12:23:33 s...@home Beta Test update requested by user
03/06/2010 12:23:35 s...@home Beta Test Sending scheduler request: Requested 
by user.
03/06/2010 12:23:35 s...@home Beta Test Reporting 10 completed tasks, 
requesting new tasks for GPU
03/06/2010 12:23:38 s...@home Beta Test Scheduler request completed: got 0 
new tasks
03/06/2010 12:23:38 s...@home Beta Test Message from server: No work sent
03/06/2010 12:23:38 s...@home Beta Test Message from server: (reached daily 
quota of 100 tasks)

That batch of 10 reported tasks happened to include a computation error (the 
infamous "cudaAcc_find_triplets erroneously found a triplet twice in 
find_triplets_kernel" from NVidia's coding). So all my 'validation reward' 
so far today goes out of the window? Probably right, but harsh.

--- On Thu, 3/6/10, Richard Haselgrove <[email protected]> wrote:


From: Richard Haselgrove <[email protected]>
Subject: Re: [boinc_dev] host punishment mechanism revisited
To: [email protected], "Richard Haselgrove" 
<[email protected]>
Date: Thursday, 3 June, 2010, 12:19

Some movement on this one off-list, too.

Validations now produce a quota 'reward', as designed. For the moment, I'm 
still having to update manually, because the backoff until after midnight is 
still happening (Changeset 21686 not active yet), but we're getting the 
idea.

Two questions:

1) Is it right that an individual work request is allowed to 'overshoot' 
quota? Especially during error recovery, when quota is down to one per day, 
I would expect that to be strictly enforced at least until a 'success' 
result can be reported. But looking at the running total I've added to this 
list, the server sometimes gets way ahead of itself:

03/06/2010 08:28:32 s...@home Beta Test Reporting 71 completed tasks, 
requesting new tasks for GPU
03/06/2010 08:28:39 s...@home Beta Test Scheduler request completed: got 46 
new tasks // 46
03/06/2010 08:28:55 s...@home Beta Test Scheduler request completed: got 36 
new tasks // 82
03/06/2010 08:29:09 s...@home Beta Test Scheduler request completed: got 20 
new tasks // 102
03/06/2010 08:29:25 s...@home Beta Test Scheduler request completed: got 11 
new tasks // 113
03/06/2010 08:29:40 s...@home Beta Test Scheduler request completed: got 6 
new tasks // 119
03/06/2010 08:29:54 s...@home Beta Test Scheduler request completed: got 3 
new tasks // 122
03/06/2010 08:30:08 s...@home Beta Test Scheduler request completed: got 3 
new tasks // 125
03/06/2010 08:30:23 s...@home Beta Test Scheduler request completed: got 2 
new tasks // 127
03/06/2010 08:30:36 s...@home Beta Test Scheduler request completed: got 1 
new tasks // 128
03/06/2010 08:31:55 s...@home Beta Test Scheduler request completed: got 6 
new tasks // 135
03/06/2010 08:32:09 s...@home Beta Test Message from server: (reached daily 
quota of 131 tasks)

<request_delay>84750.000000</request_delay>
<message priority="high">No work sent</message>
<message priority="high">(reached daily quota of 131 tasks)</message>

03-Jun-2010 09:31:24 [s...@home Beta Test] Sending scheduler request: 
Requested by user.
03/06/2010 09:31:24 s...@home Beta Test Reporting 19 completed tasks, 
requesting new tasks for GPU
03/06/2010 09:31:28 s...@home Beta Test Scheduler request completed: got 0 
new tasks
03/06/2010 09:31:28 s...@home Beta Test Message from server: No work sent
03/06/2010 09:31:28 s...@home Beta Test Message from server: (reached daily 
quota of 132 tasks)

03-Jun-2010 09:32:39 [s...@home Beta Test] Sending scheduler request: 
Requested by user.
03/06/2010 09:32:43 s...@home Beta Test Scheduler request completed: got 37 
new tasks // 172

03/06/2010 09:36:13 s...@home Beta Test Reporting 1 completed tasks, 
requesting new tasks for GPU
03/06/2010 09:36:16 s...@home Beta Test Message from server: (reached daily 
quota of 140 tasks)

03/06/2010 11:53:48 s...@home Beta Test Reporting 44 completed tasks, 
requesting new tasks for GPU
03/06/2010 11:54:02 s...@home Beta Test Scheduler request completed: got 0 
new tasks
03/06/2010 11:54:02 s...@home Beta Test Message from server: No work sent
03/06/2010 11:54:02 s...@home Beta Test Message from server: (reached daily 
quota of 141 tasks)
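
As a rough illustration of what strict enforcement could look like, here is 
a minimal sketch in Python. It is not the BOINC scheduler code: the function 
name tasks_to_send and its parameters are made up for this example, and the 
quota is held fixed at 131 for simplicity, whereas the log above shows it 
moving as results validate. The grant sizes are the ones from the 08:28 to 
08:31 requests.

```python
def tasks_to_send(requested: int, daily_quota: int, sent_today: int) -> int:
    """Hypothetical helper: cap one work request at the remaining quota."""
    remaining = max(daily_quota - sent_today, 0)
    return min(requested, remaining)


# Grant sizes taken from the 08:28-08:31 log lines above.
grants = [46, 36, 20, 11, 6, 3, 3, 2, 1, 6]

# Assumption for illustration only: quota fixed at 131 for the whole run.
QUOTA = 131

sent = 0
for wanted in grants:
    sent += tasks_to_send(wanted, QUOTA, sent)

print(sent)  # 131: the last request is trimmed from 6 tasks to 3
```

With a per-request cap like this, the running total could never pass the 
quota in force at the time of the request, which is the behaviour I would 
expect during error recovery.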

2) How are we going to handle this on the website host details? As I type, 
with a quota of 141, 
http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=12316 is 
still saying "Maximum daily WU quota per CPU 100/day"

Yet looking at a wingmate, Pappa's 
http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=45842 (hi, 
Al) is showing "Maximum daily WU quota per CPU 0/day" - yet returning valid 
work. That's not just the difference between logged-in and third-party 
reporting - other hosts I've checked are showing 100/day to third parties.

A web display so far divorced from the new reality is clearly misleading, 
and shouldn't be shown. But it would be a shame to lose it completely: often 
a volunteer's first question on a help-desk is "Why aren't I getting any 
work for Project X?", and seeing a crippled quota is a lead-in to advising 
on what to do about repeated computation errors.


And while I'm reporting - SETI is aware that they're a download server 
short, aren't they?

03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: About to connect() 
to boinc2.ssl.berkeley.edu port 80 (#0)
03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: Trying 
208.68.240.18...
03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Connection refused
03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Failed connect to 
boinc2.ssl.berkeley.edu:80; No error
03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Expire cleared
03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Closing connection 
#0
03-Jun-2010 09:41:23 [---] [http_debug] HTTP error: Couldn't connect to 
server

--- On Wed, 2/6/10, Richard Haselgrove <[email protected]> wrote:


From: Richard Haselgrove <[email protected]>
Subject: Re: [boinc_dev] host punishment mechanism revisited
To: [email protected]
Date: Wednesday, 2 June, 2010, 9:12


I see that David has implemented the 'Reward for Validation' component of 
this discussion (http://boinc.berkeley.edu/trac/changeset/21675).

However, don't we need to do something about backoffs?

At the moment, if you ever reach the daily quota, you get a message saying 
typically "no work sent / reached daily quota of xxx tasks", and all 
scheduler RPCs are inhibited until 'server midnight + rnd(1 hour)'. I assume 
that's a server backoff instruction, and not coded into the client (which 
wouldn't know the server's local time).

But the daily quota is no longer a fixed value. Indeed, if you both reported 
and requested work in the same RPC, your quota might be increased in the 
next few seconds, as the work you've just reported starts to validate. The 
backoff should be no more than the existing project RPC backoff and client 
'no work sent' exponential backoff.
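
To make the difference concrete, here is a small Python sketch comparing the 
two policies. It is only an illustration under stated assumptions: the 
function names, the 60-second base and the one-day cap are invented for the 
example and are not taken from the BOINC source. The first function models 
the 'server midnight + rnd(1 hour)' deferral described above; the second 
models a deferral limited to the project's normal RPC delay and an 
exponential 'no work sent' backoff.

```python
import random
from datetime import datetime, timedelta


def midnight_delay(server_now: datetime, rng=random.random) -> float:
    """Seconds until the next server midnight, plus up to one random hour."""
    next_midnight = server_now.replace(
        hour=0, minute=0, second=0, microsecond=0
    ) + timedelta(days=1)
    return (next_midnight - server_now).total_seconds() + rng() * 3600


def proposed_delay(project_min_delay: float, failures: int,
                   base: float = 60.0, cap: float = 86400.0) -> float:
    """Hypothetical alternative: the larger of the project's minimum RPC
    delay and a capped exponential backoff. base and cap are assumptions."""
    exponential = min(base * 2 ** failures, cap)
    return max(project_min_delay, exponential)


# Example: quota reached at 08:32:09 server time, random part forced to zero.
now = datetime(2010, 6, 3, 8, 32, 9)
print(midnight_delay(now, rng=lambda: 0.0))  # 55671.0 seconds, over 15 hours
print(proposed_delay(11.0, 3))               # 480.0 seconds
```

The point is only that the second delay stays short enough for a host to 
come back and pick up work once its quota has been raised by validations.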

Unfortunately, at the moment I can't test any of this: we only have one test 
project with this code, and it says

s...@home Beta Test 02/06/2010 08:28:40 Reporting 26 completed tasks, not 
requesting new tasks
s...@home Beta Test 02/06/2010 08:28:45 Scheduler request failed: HTTP 
internal server error
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.