?? Surely the host record is also updated each time one of its results is
validated or invalidated. That would have to affect consecutive_valid at
least, even if derived items for punishment or reward are not calculated
then.
--
Joe
On 7 Jun 2010 at 20:27, David wrote:
> The host info is only updated when the host contacts the scheduler.
> -- David
>
> Richard Haselgrove wrote:
> > There's definitely something wrong with the (daily) quota resetting
> > mechanism - whether that's the fault of the new code, or SETI's Beta
> > server, I'll leave to you.
> >
> > Host 12316 last downloaded a SETI Beta task at 4 Jun 2010 20:01:03 UTC
> >
> > Yet as I type this (7 June 2010 15:00 UTC), the application info still says
> >
> > Number of tasks completed 1183
> > Max tasks per day 218
> > Number of tasks today 273
> > Consecutive valid tasks 118
> > Average turnaround time 0.45 days
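
In the three application-info snapshots quoted in this thread, "Max tasks per day" equals "Consecutive valid tasks" plus 100, and "Number of tasks today" exceeds the maximum every time. The Python sketch below only checks that arithmetic on the quoted figures. The base of 100 is inferred from the numbers (and matches the "100/day" shown on the host detail page); it is an assumption, not a confirmed scheduler formula, and the function names are made up for illustration.

```python
# (max tasks per day, tasks today, consecutive valid) as quoted in this thread
snapshots = [
    (205, 207, 105),  # 4 June 2010, 01:44 message
    (213, 241, 113),  # 4 June 2010, morning report
    (218, 273, 118),  # 7 June 2010, 15:00 UTC
]

ASSUMED_BASE = 100  # inferred from the figures, not taken from scheduler source


def implied_base(max_per_day, consecutive_valid):
    """Quota left over once the consecutive-valid count is subtracted."""
    return max_per_day - consecutive_valid


def overshoot(max_per_day, tasks_today):
    """How far 'tasks today' exceeds 'max tasks per day' (negative if under)."""
    return tasks_today - max_per_day


for max_per_day, today, valid in snapshots:
    print(implied_base(max_per_day, valid), overshoot(max_per_day, today))
```

Each snapshot gives an implied base of 100, with overshoots of 2, 28 and 55 tasks respectively.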
> >
> >
> > ----- Original Message -----
> > From: "Richard Haselgrove" <[email protected]>
> > To: "David Anderson" <[email protected]>
> > Cc: <[email protected]>
> > Sent: Friday, June 04, 2010 10:04 AM
> > Subject: Re: [boinc_dev] host punishment mechanism revisited
> >
> >
> > Morning report. The validations trickled in slowly overnight:
> >
> > 04/06/2010 03:51:18 s...@home Beta Test Message from server: (reached
> > daily quota of 205 tasks)
> > 04/06/2010 04:24:17 s...@home Beta Test Message from server: (reached
> > daily quota of 206 tasks)
> > 04/06/2010 06:19:43 s...@home Beta Test Scheduler request completed: got
> > 34 new tasks
> > 04/06/2010 06:19:59 s...@home Beta Test Message from server: (reached
> > daily quota of 209 tasks)
> >
> > So that's a significant overshoot.
> >
> > Also, "today" seems to be lasting an awfully long time: surely this
> > should have reset before 09:00 UTC?
> >
> > Number of tasks completed 786
> > Max tasks per day 213
> > Number of tasks today 241
> > Consecutive valid tasks 113
> > Average turnaround time 0.60 days
> >
> > If I happen to get another of those 'erroneous triplets' (which are a
> > project error, not a host failure), the "punishment" from the thread
> > title is going to be massive.
> >
> > --- On Fri, 4/6/10, Richard Haselgrove <[email protected]>
> > wrote:
> >
> >
> > From: Richard Haselgrove <[email protected]>
> > Subject: Re: [boinc_dev] host punishment mechanism revisited
> > To: "David Anderson" <[email protected]>
> > Cc: [email protected]
> > Date: Friday, 4 June, 2010, 1:44
> >
> >
> > No, it wasn't to be.
> >
> > Crept up slowly to
> >
> > Number of tasks completed 778
> > Max tasks per day 205
> > Number of tasks today 207
> > Consecutive valid tasks 105
> > Average turnaround time 0.62 days
> >
> > but I ran out of jobs just two short - last 18 with no wingmates at all.
> > It can chew GPUGrid for a while and I'll try for quota overshoot again
> > in the morning.
> >
> >
> > --- On Thu, 3/6/10, Richard Haselgrove <[email protected]>
> > wrote:
> >
> >
> > From: Richard Haselgrove <[email protected]>
> > Subject: Re: [boinc_dev] host punishment mechanism revisited
> > To: "David Anderson" <[email protected]>
> > Cc: [email protected]
> > Date: Thursday, 3 June, 2010, 22:54
> >
> >
> > Yes, that'll be useful for debugging and troubleshooting - thanks.
> >
> > I see I'm currently still seven tasks over quota: let's hope I get some
> > cooperative wingmates before bedtime, so I get the chance to do one more
> > work fetch under controlled conditions.
> >
> >
> > --- On Thu, 3/6/10, David Anderson <[email protected]> wrote:
> >
> >
> > From: David Anderson <[email protected]>
> > Subject: Re: [boinc_dev] host punishment mechanism revisited
> > To: "Richard Haselgrove" <[email protected]>
> > Cc: [email protected]
> > Date: Thursday, 3 June, 2010, 21:31
> >
> >
> > I added a new web page showing app-version-level scheduling info:
> > http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=12316
> >
> > (linked to from "Application details" on the host page).
> >
> > This will make it somewhat easier to follow what's going on.
> >
> > In principle there should be no overshoot of the quota.
> > There may be bugs, however. Please send the info before/after.
> >
> > -- David
> >
> > Richard Haselgrove wrote:
> >> Some movement on this one off-list, too.
> >> Validations now produce a quota 'reward', as designed. For the moment,
> >> I'm still having to update manually, because the backoff until after
> >> midnight is still happening (Changeset 21686 not active yet), but
> >> we're getting the idea.
> >> Two questions:
> >> 1) Is it right that an individual work request is allowed to
> >> 'overshoot' quota? Especially during error recovery, when quota is
> >> down to one per day, I would expect that to be strictly enforced at
> >> least until a 'success' result can be reported. But looking at the
> >> running total I've added to this list, the server sometimes gets way
> >> ahead of itself:
> >> 03/06/2010 08:28:32 s...@home Beta Test Reporting 71 completed tasks,
> >> requesting new tasks for GPU
> >> 03/06/2010 08:28:39 s...@home Beta Test Scheduler request completed:
> >> got 46 new tasks // 46
> >> 03/06/2010 08:28:55 s...@home Beta Test Scheduler request completed:
> >> got 36 new tasks // 82
> >> 03/06/2010 08:29:09 s...@home Beta Test Scheduler request completed:
> >> got 20 new tasks // 102
> >> 03/06/2010 08:29:25 s...@home Beta Test Scheduler request completed:
> >> got 11 new tasks // 113
> >> 03/06/2010 08:29:40 s...@home Beta Test Scheduler request completed:
> >> got 6 new tasks // 119
> >> 03/06/2010 08:29:54 s...@home Beta Test Scheduler request completed:
> >> got 3 new tasks // 122
> >> 03/06/2010 08:30:08 s...@home Beta Test Scheduler request completed:
> >> got 3 new tasks // 125
> >> 03/06/2010 08:30:23 s...@home Beta Test Scheduler request completed:
> >> got 2 new tasks // 127
> >> 03/06/2010 08:30:36 s...@home Beta Test Scheduler request completed:
> >> got 1 new tasks // 128
> >> 03/06/2010 08:31:55 s...@home Beta Test Scheduler request completed:
> >> got 6 new tasks // 135
> >> 03/06/2010 08:32:09 s...@home Beta Test Message from server: (reached
> >> daily quota of 131 tasks)
> >>
> >> <request_delay>84750.000000</request_delay>
> >> <message priority="high">No work sent</message>
> >> <message priority="high">(reached daily quota of 131 tasks)
> >> 03-Jun-2010 09:31:24 [s...@home Beta Test] Sending scheduler request:
> >> Requested by user.
> >> 03/06/2010 09:31:24 s...@home Beta Test Reporting 19 completed tasks,
> >> requesting new tasks for GPU
> >> 03/06/2010 09:31:28 s...@home Beta Test Scheduler request completed:
> >> got 0 new tasks
> >> 03/06/2010 09:31:28 s...@home Beta Test Message from server: No work sent
> >> 03/06/2010 09:31:28 s...@home Beta Test Message from server: (reached
> >> daily quota of 132 tasks)
> >> 03-Jun-2010 09:32:39 [s...@home Beta Test] Sending scheduler request:
> >> Requested by user.
> >> 03/06/2010 09:32:43 s...@home Beta Test Scheduler request completed:
> >> got 37 new tasks // 172
> >> 03/06/2010 09:36:13 s...@home Beta Test Reporting 1 completed tasks,
> >> requesting new tasks for GPU
> >> 03/06/2010 09:36:16 s...@home Beta Test Message from server: (reached
> >> daily quota of 140 tasks)
> >> 03/06/2010 11:53:48 s...@home Beta Test Reporting 44 completed tasks,
> >> requesting new tasks for GPU
> >> 03/06/2010 11:54:02 s...@home Beta Test Scheduler request completed:
> >> got 0 new tasks
> >> 03/06/2010 11:54:02 s...@home Beta Test Message from server: No work sent
> >> 03/06/2010 11:54:02 s...@home Beta Test Message from server: (reached
> >> daily quota of 141 tasks)
> >> 2) How are we going to handle this on the website host details? As I
> >> type, with a quota of 141,
> >> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=12316
> >> is still saying "Maximum daily WU quota per CPU 100/day"
> >> Yet looking at a wingmate, Pappa's
> >> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=45842
> >> (hi, Al) is showing "Maximum daily WU quota per CPU 0/day" - yet
> >> returning valid work. That's not just the difference between logged-in
> >> and third-party reporting - other hosts I've checked are showing
> >> 100/day to third parties.
> >> A web display so far divorced from the new reality is clearly
> >> misleading, and shouldn't be shown. But it would be a shame to lose it
> >> completely: often a volunteer's first question on a help-desk is "Why
> >> aren't I getting any work for Project X?", and seeing a crippled quota
> >> is a lead-in to advising on what to do about repeated computation errors.
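
For question 1 above, strict enforcement would mean capping each scheduler reply at whatever is left of the day's quota. A minimal Python sketch of that rule follows; the function and parameter names are hypothetical, and this is not a description of what the BOINC scheduler actually does.

```python
def tasks_to_send(requested, daily_quota, sent_today):
    """Cap one work request at the quota remaining for today.

    All three parameters are illustrative names, not scheduler fields.
    """
    remaining = max(0, daily_quota - sent_today)
    return min(requested, remaining)


# Figures taken from the log above: running total 135, then a reply with 37 tasks.
print(tasks_to_send(37, 132, 135))  # quota reported just before that reply
print(tasks_to_send(37, 140, 135))  # quota reported just after that reply
```

Under this rule the 37-task reply at 09:32:43 would have been cut to 0 with the quota of 132, or to 5 even with the later quota of 140, assuming the hand-kept running total is accurate.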
> >>
> >> And while I'm reporting - SETI is aware that they're a download server
> >> short, aren't they?
> >> 03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: About to
> >> connect() to boinc2.ssl.berkeley.edu port 80 (#0)
> >> 03-Jun-2010 09:41:21 [---] [http_debug] [ID#1439] Info: Trying
> >> 208.68.240.18...
> >> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Connection refused
> >> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Failed connect
> >> to boinc2.ssl.berkeley.edu:80; No error
> >> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Expire cleared
> >> 03-Jun-2010 09:41:23 [---] [http_debug] [ID#1439] Info: Closing
> >> connection #0
> >> 03-Jun-2010 09:41:23 [---] [http_debug] HTTP error: Couldn't connect
> >> to server
> >>
> >> --- On Wed, 2/6/10, Richard Haselgrove <[email protected]>
> >> wrote:
> >>
> >>
> >> From: Richard Haselgrove <[email protected]>
> >> Subject: Re: [boinc_dev] host punishment mechanism revisited
> >> To: [email protected]
> >> Date: Wednesday, 2 June, 2010, 9:12
> >>
> >>
> >> I see that David has implemented the 'Reward for Validation' component
> >> of this discussion (http://boinc.berkeley.edu/trac/changeset/21675).
> >>
> >> However, don't we need to do something about backoffs?
> >>
> >> At the moment, if you ever reach the daily quota, you get a message
> >> saying typically "no work sent / reached daily quota of xxx tasks",
> >> and all scheduler RPCs are inhibited until 'server midnight + rnd(1
> >> hour)'. I assume that's a server backoff instruction, and not coded
> >> into the client (which wouldn't know the server's local time).
> >>
> >> But the daily quota is no longer a fixed value. Indeed, if you both
> >> reported and requested work in the same RPC, your quota might be
> >> increased in the next few seconds, as the work you've just reported
> >> starts to validate. The backoff should be no more than the existing
> >> project RPC backoff and client 'no work sent' exponential backoff.
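
The two backoff policies contrasted above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the one-hour jitter default, and the idea of passing in the normal project backoff as a number of seconds are all hypothetical, not taken from the BOINC server code.

```python
import random
from datetime import datetime, timedelta


def delay_until_server_midnight(now, max_jitter_s=3600, rng=random):
    """Seconds from `now` (server-local time) to the next midnight,
    plus a random jitter of up to `max_jitter_s` seconds."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds() + rng.uniform(0, max_jitter_s)


def capped_delay(midnight_delay_s, project_backoff_s):
    """Proposed alternative: never wait longer than the ordinary backoff,
    since validations may raise the quota well before midnight."""
    return min(midnight_delay_s, project_backoff_s)


server_now = datetime(2010, 6, 3, 0, 30, 0)  # example server-local time
long_delay = delay_until_server_midnight(server_now)
print(long_delay, capped_delay(long_delay, 600))
```

With a quota that can grow minutes after a report, the capped version lets the client retry on its normal schedule instead of waiting most of a day.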
> >>
> >> Unfortunately, at the moment I can't test any of this: we only have
> >> one test project with this code, and it says
> >>
> >> s...@home Beta Test 02/06/2010 08:28:40 Reporting 26 completed tasks,
> >> not requesting new tasks
> >> s...@home Beta Test 02/06/2010 08:28:45 Scheduler request failed: HTTP
> >> internal server error