I've been monitoring this host at intervals over the last 10 days. Until today, 'Application Info' consistently showed the 273 jobs downloaded on 4 June as being 'today'.
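For reference, the daily reset one would expect here can be sketched as below. This is a hypothetical illustration in Python, not the BOINC scheduler code: the record type, its field names and the function are all assumptions. Only the figures (a quota of 218, 273 tasks "today", 4 and 7 June) come from the Application Info values quoted further down the thread.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class HostAppVersion:
    """Hypothetical per-host, per-app-version record (not BOINC's schema)."""
    max_jobs_per_day: int
    n_jobs_today: int = 0
    last_reset: date = date.min


def jobs_allowed(rec: HostAppVersion, today: date) -> int:
    """Zero the counter on the first contact of a new day, then return
    how many more jobs may be sent before the quota is reached."""
    if today > rec.last_reset:
        rec.n_jobs_today = 0
        rec.last_reset = today
    return max(0, rec.max_jobs_per_day - rec.n_jobs_today)


# Figures as shown by Application Info on 7 June (counter last reset 4 June).
rec = HostAppVersion(max_jobs_per_day=218, n_jobs_today=273,
                     last_reset=date(2010, 6, 4))
same_day = jobs_allowed(rec, date(2010, 6, 4))   # still over quota
later_day = jobs_allowed(rec, date(2010, 6, 7))  # counter should have reset
print(same_day, later_day)
```

Under these assumptions, a host that contacts the scheduler on any later day should see 'tasks today' return to zero, which is not what the page has been showing.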
Since we last looked at the host, I've swapped out the Fermi card (now running - faster - in a Windows XP host), and replaced it with a 9800GTX+. Today, for the first time, I have allowed work fetch for that cc1.1 card. The task list for the host correctly shows one s...@home Enhanced v6.09 (cuda23) task in progress. Application Info, on the other hand, shows no tasks at all today.

I'm wondering whether this has any bearing on problems at the Main SETI project before the weekend. For the first time, many users reported receiving "Message from server: (reached daily quota of 100 tasks)". And, as on my Beta host, it seemed that quota did not reset the following day on hosts which had reached the quota limit. (Hosts which request and receive fewer jobs than quota allows seem to get the daily reset, as before.)

Of course, SETI Main doesn't have the full Beta code (no Application Info page, and I saw no sign of additional quota being allowed after tasks validate). It does, however, have the allowed overshoot on the work fetch request which first takes the host beyond quota.

> And my host has been contacting the scheduler regularly, asking for work,
> most recently 7 Jun 2010 13:52:08 UTC.
>
> The main reason it hasn't downloaded any work is that the Beta project has
> run out of work to send.
>
>
>> ?? Surely the host record is also updated each time one of its results is
>> validated or invalidated. That would have to affect consecutive_valid at
>> least, even if derived items for punishment or reward are not calculated
>> then.
>> --
>> Joe
>>
>> On 7 Jun 2010 at 20:27, David wrote:
>>
>>> The host info is only updated when the host contacts the scheduler.
>>> -- David
>>>
>>> Richard Haselgrove wrote:
>>> > There's definitely something wrong with the (daily) quota resetting
>>> > mechanism - whether that's the fault of the new code, or SETI's Beta
>>> > server, I'll leave to you.
>>> >
>>> > Host 12316 last downloaded a SETI Beta task at 4 Jun 2010 20:01:03 UTC
>>> >
>>> > Yet as I type this (7 June 2010 15:00 UTC), the application info still
>>> > says
>>> >
>>> > Number of tasks completed 1183
>>> > Max tasks per day 218
>>> > Number of tasks today 273
>>> > Consecutive valid tasks 118
>>> > Average turnaround time 0.45 days
>>> >
>>> >
>>> > ----- Original Message -----
>>> > From: "Richard Haselgrove" <[email protected]>
>>> > To: "David Anderson" <[email protected]>
>>> > Cc: <[email protected]>
>>> > Sent: Friday, June 04, 2010 10:04 AM
>>> > Subject: Re: [boinc_dev] host punishment mechanism revisited
>>> >
>>> >
>>> > Morning report. The validations trickled in slowly overnight:
>>> >
>>> > 04/06/2010 03:51:18 s...@home Beta Test Message from server: (reached daily quota of 205 tasks)
>>> > 04/06/2010 04:24:17 s...@home Beta Test Message from server: (reached daily quota of 206 tasks)
>>> > 04/06/2010 06:19:43 s...@home Beta Test Scheduler request completed: got 34 new tasks
>>> > 04/06/2010 06:19:59 s...@home Beta Test Message from server: (reached daily quota of 209 tasks)
>>> >
>>> > So that's a significant overshoot.
>>> >
>>> > Also, "today" seems to be lasting an awfully long time: surely this
>>> > should have reset before 09:00 UTC?
>>> >
>>> > 0.60 days
>>> > Number of tasks completed 786
>>> > Max tasks per day 213
>>> > Number of tasks today 241
>>> > Consecutive valid tasks 113
>>> > Average turnaround time 0.60 days
>>> >
>>> > If I happen to get another of those 'erroneous triplets' (which are a
>>> > project error, not a host failure), the "punishment" from the thread
>>> > title is going to be massive.
>>> >
>>> > --- On Fri, 4/6/10, Richard Haselgrove <[email protected]> wrote:
>>> >
>>> >
>>> > From: Richard Haselgrove <[email protected]>
>>> > Subject: Re: [boinc_dev] host punishment mechanism revisited
>>> > To: "David Anderson" <[email protected]>
>>> > Cc: [email protected]
>>> > Date: Friday, 4 June, 2010, 1:44
>>> >
>>> >
>>> > No, it wasn't to be.
>>> >
>>> > Crept up slowly to
>>> >
>>> > Number of tasks completed 778
>>> > Max tasks per day 205
>>> > Number of tasks today 207
>>> > Consecutive valid tasks 105
>>> > Average turnaround time 0.62 days
>>> >
>>> > but I ran out of jobs just two short - last 18 with no wingmates at all.
>>> > It can chew GPUGrid for a while and I'll try for quota overshoot again
>>> > in the morning.
>>> >
>>> >
>>> > --- On Thu, 3/6/10, Richard Haselgrove <[email protected]> wrote:
>>> >
>>> >
>>> > From: Richard Haselgrove <[email protected]>
>>> > Subject: Re: [boinc_dev] host punishment mechanism revisited
>>> > To: "David Anderson" <[email protected]>
>>> > Cc: [email protected]
>>> > Date: Thursday, 3 June, 2010, 22:54
>>> >
>>> >
>>> > Yes, that'll be useful for debugging and troubleshooting - thanks.
>>> >
>>> > I see I'm currently still seven tasks over quota: let's hope I get some
>>> > cooperative wingmates before bedtime, so I get the chance to do one more
>>> > work fetch under controlled conditions.
>>> >
>>> >
>>> > --- On Thu, 3/6/10, David Anderson <[email protected]> wrote:
>>> >
>>> >
>>> > From: David Anderson <[email protected]>
>>> > Subject: Re: [boinc_dev] host punishment mechanism revisited
>>> > To: "Richard Haselgrove" <[email protected]>
>>> > Cc: [email protected]
>>> > Date: Thursday, 3 June, 2010, 21:31
>>> >
>>> >
>>> > I added a new web page showing app-version-level scheduling info:
>>> > http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=12316
>>> >
>>> > (linked to from "Application details" on the host page).
>>> >
>>> > This will make it somewhat easier to follow what's going on.
>>> >
>>> > In principle there should be no overshoot of the quota.
>>> > There may be bugs, however. Please send the info before/after.
>>> >
>>> > -- David
>>> >
>>> > Richard Haselgrove wrote:
>>> >> Some movement on this one off-list, too.
>>> >> Validations now produce a quota 'reward', as designed. For the moment,
>>> >> I'm still having to update manually, because the backoff until after
>>> >> midnight is still happening (Changeset 21686 not active yet), but
>>> >> we're getting the idea.
>>> >> Two questions:
>>> >> 1) Is it right that an individual work request is allowed to
>>> >> 'overshoot' quota? Especially during error recovery, when quota is
>>> >> down to one per day, I would expect that to be strictly enforced at
>>> >> least until a 'success' result can be reported. But looking at the
>>> >> running total I've added to this list, the server sometimes gets way
>>> >> ahead of itself:
>>> >> 03/06/2010 08:28:32 s...@home Beta Test Reporting 71 completed tasks, requesting new tasks for GPU
>>> >> 03/06/2010 08:28:39 s...@home Beta Test Scheduler request completed: got 46 new tasks // 46
>>> >> 03/06/2010 08:28:55 s...@home Beta Test Scheduler request completed: got 36 new tasks // 82
>>> >> 03/06/2010 08:29:09 s...@home Beta Test Scheduler request completed: got 20 new tasks // 102
>>> >> 03/06/2010 08:29:25 s...@home Beta Test Scheduler request completed: got 11 new tasks // 113
>>> >> 03/06/2010 08:29:40 s...@home Beta Test Scheduler request completed: got 6 new tasks // 119
>>> >> 03/06/2010 08:29:54 s...@home Beta Test Scheduler request completed: got 3 new tasks // 122
>>> >> 03/06/2010 08:30:08 s...@home Beta Test Scheduler request completed: got 3 new tasks // 125
>>> >> 03/06/2010 08:30:23 s...@home Beta Test Scheduler request completed: got 2 new tasks // 127
>>> >> 03/06/2010 08:30:36 s...@home Beta Test Scheduler request completed: got 1 new tasks // 128
>>> >> 03/06/2010 08:31:55 s...@home Beta Test Scheduler request completed: got 6 new tasks // 135
>>> >> 03/06/2010 08:32:09 s...@home Beta Test Message from server: (reached daily quota of 131 tasks)
>>> >>
>>> >> <request_delay>84750.000000</request_delay>
>>> >> <message priority="high">No work sent</message>
>>> >> <message priority="high">(reached daily quota of 131 tasks)
>>> >> 03-Jun-2010 09:31:24 [s...@home Beta Test] Sending scheduler request: Requested by user.
>>> >> 03/06/2010 09:31:24 s...@home Beta Test Reporting 19 completed tasks, requesting new tasks for GPU
>>> >> 03/06/2010 09:31:28 s...@home Beta Test Scheduler request completed: got 0 new tasks
>>> >> 03/06/2010 09:31:28 s...@home Beta Test Message from server: No work sent
>>> >> 03/06/2010 09:31:28 s...@home Beta Test Message from server: (reached daily quota of 132 tasks)
>>> >> 03-Jun-2010 09:32:39 [s...@home Beta Test] Sending scheduler request: Requested by user.
>>> >> 03/06/2010 09:32:43 s...@home Beta Test Scheduler request completed: got 37 new tasks // 172
>>> >> 03/06/2010 09:36:13 s...@home Beta Test Reporting 1 completed tasks, requesting new tasks for GPU
>>> >> 03/06/2010 09:36:16 s...@home Beta Test Message from server: (reached daily quota of 140 tasks)
>>> >> 03/06/2010 11:53:48 s...@home Beta Test Reporting 44 completed tasks, requesting new tasks for GPU
>>> >> 03/06/2010 11:54:02 s...@home Beta Test Scheduler request completed: got 0 new tasks
>>> >> 03/06/2010 11:54:02 s...@home Beta Test Message from server: No work sent
>>> >> 03/06/2010 11:54:02 s...@home Beta Test Message from server: (reached daily quota of 141 tasks)
>>> >> 2) How are we going to handle this on the website host details?
>>> >> As I type, with a quota of 141,
>>> >> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=12316
>>> >> is still saying "Maximum daily WU quota per CPU 100/day"
>>> >> Yet looking at a wingmate, Pappa's
>>> >> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=45842
>>> >> (hi, Al) is showing "Maximum daily WU quota per CPU 0/day" - yet
>>> >> returning valid work. That's not just the difference between logged-in
>>> >> and third-party reporting - other hosts I've checked are showing
>>> >> 100/day to third parties.
>>> >> A web display so far divorced from the new reality is clearly
>>> >> misleading, and shouldn't be shown. But it would be a shame to lose it
>>> >> completely: often a volunteer's first question on a help-desk is "Why
>>> >> aren't I getting any work for Project X?", and seeing a crippled quota
>>> >> is a lead-in to advising on what to do about repeated computation
>>> >> errors.
>>> >>
>>> >> And while I'm reporting - SETI is aware that they're a download server
>>> >> short, aren't they?
>>> >> 03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: About to connect() to boinc2.ssl.berkeley.edu port 80 (#0)
>>> >> 03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: Trying 208.68.240.18...
>>> >> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Connection refused
>>> >> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Failed connect to boinc2.ssl.berkeley.edu:80; No error
>>> >> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Expire cleared
>>> >> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Closing connection #0
>>> >> 03-Jun-2010 09:41:23 [---] [http_debug] HTTP error: Couldn't connect to server
>>> >>
>>> >> --- On Wed, 2/6/10, Richard Haselgrove <[email protected]> wrote:
>>> >>
>>> >>
>>> >> From: Richard Haselgrove <[email protected]>
>>> >> Subject: Re: [boinc_dev] host punishment mechanism revisited
>>> >> To: [email protected]
>>> >> Date: Wednesday, 2 June, 2010, 9:12
>>> >>
>>> >>
>>> >> I see that David has implemented the 'Reward for Validation' component
>>> >> of this discussion (http://boinc.berkeley.edu/trac/changeset/21675).
>>> >>
>>> >> However, don't we need to do something about backoffs?
>>> >>
>>> >> At the moment, if you ever reach the daily quota, you get a message
>>> >> saying typically "no work sent / reached daily quota of xxx tasks",
>>> >> and all scheduler RPCs are inhibited until 'server midnight + rnd(1
>>> >> hour)'. I assume that's a server backoff instruction, and not coded
>>> >> into the client (which wouldn't know the server's local time).
>>> >>
>>> >> But the daily quota is no longer a fixed value. Indeed, if you both
>>> >> reported and requested work in the same RPC, your quota might be
>>> >> increased in the next few seconds, as the work you've just reported
>>> >> starts to validate. The backoff should be no more than the existing
>>> >> project RPC backoff and client 'no work sent' exponential backoff.
>>> >>
>>> >> Unfortunately, at the moment I can't test any of this: we only have
>>> >> one test project with this code, and it says
>>> >>
>>> >> s...@home Beta Test 02/06/2010 08:28:40 Reporting 26 completed tasks, not requesting new tasks
>>> >> s...@home Beta Test 02/06/2010 08:28:45 Scheduler request failed: HTTP internal server error
>> --
>> Joe
>>
>> _______________________________________________
>> boinc_dev mailing list
>> [email protected]
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.
