Re: [boinc_dev] The reason for a local DCF.

dball Wed, 03 Apr 2013 16:10:31 -0700

We need some simple way for a project admin to be able to tell the server that
the estimate for a batch of jobs should be adjusted. The server should update
the estimate in the unsent jobs in that batch and somehow pass that info along
to clients that already have jobs in that batch the next time they contact the
server.
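
A minimal sketch of the batch adjustment requested above, purely as an illustration. The `Job` record and `rescale_batch` function are hypothetical and are not BOINC server code; the `rsc_fpops_est` attribute is only named after the workunit's estimate field. The sketch scales the estimate of every job in one batch and returns the jobs that were already sent, since those are the ones whose clients would still need to be told on their next scheduler contact.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    batch: int             # batch identifier (hypothetical field)
    rsc_fpops_est: float   # estimated floating-point operations for the job
    sent: bool             # True once the job has gone out to a client


def rescale_batch(jobs: List[Job], batch: int, factor: float) -> List[Job]:
    """Scale the estimate of every job in `batch` by `factor`.

    Returns the already-sent jobs, whose clients still need the new value.
    """
    to_notify = []
    for job in jobs:
        if job.batch != batch:
            continue
        job.rsc_fpops_est *= factor
        if job.sent:
            to_notify.append(job)
    return to_notify


# Example: batch 7 was created with one zero missing from the estimate.
jobs = [
    Job(batch=7, rsc_fpops_est=1e12, sent=True),
    Job(batch=7, rsc_fpops_est=1e12, sent=False),
    Job(batch=8, rsc_fpops_est=5e13, sent=False),
]
pending_notifications = rescale_batch(jobs, batch=7, factor=10.0)
```

How the corrected value would actually reach a client is left open here, as it is in the message.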


One of the projects using the new version of the server (version 701 IIRC) that
sends "<dont_use_dcf/>" as part of the sched reply had the estimates suddenly
drop to 1/10th of their previous value. The project admin said he was changing
the estimated time of WUs and there was one 0 missing for that batch of work
units. These WUs continued to be sent out for several days before another batch
started and the time estimates went back to reality.

David Ball

> I can't speak specifically for TrainWreck@home, but I think you'll find that
> if it's running generic BOINC server code that's less than three years old
> (and if it's telling the client to turn off DCF, I think it must be), then the
> project server does *NOT* calculate its own DCF.
>
> I invite you to review what happened to DCF in sched_send.cpp (server code),
> in
>
> http://boinc.berkeley.edu/trac/changeset/1d765245ed6ea666a46b2b5878371c4183accbeb/boinc-v2/sched/sched_send.cpp
>
>
>
>
>>________________________________
>> From: "McLeod, John" <[email protected]>
>>To: Richard Haselgrove <[email protected]>;
>> "[email protected]" <[email protected]>
>>Sent: Wednesday, 3 April 2013, 15:43
>>Subject: Re: [boinc_dev] The reason for a local DCF.
>>
>>Currently the server calculates its own DCF.  When asked for 43200 seconds of
>>work, it would inflate the fpops number to account for the difference.  This
>>would mean that the work that is received would have a sort of correct value
>>for time before being inflated again by the DCF calculation.
>>
>>No, this is a startup issue, but it can happen any time:
>>
>>1) A new project is joined
>>
>>2) A new application is pushed down
>>
>>3) A new dataset that has a greatly different run time than expected is
>>   pushed down.
>>
>>A possible way out:
>>If a project has "do not use DCF" set, modify the meaning of this somewhat.
>>Instead of ignoring the DCF entirely, add a DCF modifier to each task of a
>>project which is 1/DCF at time of acceptance of the task (this counteracts the
>>fact that the DCF is calculated twice, once at the server and once at the
>>client).  Each time the DCF is used to calculate the remaining time to run,
>>multiply by this value.  When the DCF for the project is recalculated,
>>recalculate as normal, ignoring this modifier.  This will eventually have the
>>DCF stabilize near 1, and allow the server to calculate what the fpops ought
>>to be and have the client responsive to massive miscalculations in the
>>initial state.
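
A minimal sketch of the per-task modifier proposed above, using the run times quoted in this thread. All names (`Task`, `accept_task`, `estimated_runtime`, `update_dcf`) are hypothetical and do not correspond to BOINC client code, and the update rule in `update_dcf` is a deliberately simplified assumption standing in for the client's normal DCF recalculation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    server_estimate_s: float  # run time implied by the server's fpops figure
    dcf_modifier: float       # 1 / project DCF when the task was accepted


def accept_task(server_estimate_s: float, project_dcf: float) -> Task:
    # Store 1/DCF at acceptance so the server's estimate is not inflated twice.
    return Task(server_estimate_s, 1.0 / project_dcf)


def estimated_runtime(task: Task, project_dcf: float) -> float:
    # Current DCF times the stored modifier: equals the server's estimate at
    # acceptance and scales with any later change in the project DCF.
    return task.server_estimate_s * project_dcf * task.dcf_modifier


def update_dcf(project_dcf: float, actual_s: float, server_estimate_s: float) -> float:
    # Simplified stand-in for the normal recalculation, which ignores the
    # modifier: jump up at once on an underestimate, otherwise drift slowly
    # toward the observed ratio.
    ratio = actual_s / server_estimate_s
    if ratio > project_dcf:
        return ratio
    return 0.9 * project_dcf + 0.1 * ratio


# Numbers from the thread: 1 h 17 min estimated, 24 h 44 min actual.
ESTIMATE_S = 1 * 3600 + 17 * 60   # 4620 seconds
ACTUAL_S = 24 * 3600 + 44 * 60    # 89040 seconds

# A task queued while DCF is 1.0 is re-estimated once the first task finishes.
queued = accept_task(ESTIMATE_S, project_dcf=1.0)
dcf_after_first = update_dcf(1.0, ACTUAL_S, ESTIMATE_S)
queued_estimate_s = estimated_runtime(queued, dcf_after_first)

# A task accepted later with a corrected server estimate is not inflated again.
late = accept_task(ACTUAL_S, project_dcf=dcf_after_first)
late_estimate_s = estimated_runtime(late, dcf_after_first)

# With corrected estimates the observed ratio is 1, so the DCF drifts back down.
dcf_after_late = update_dcf(dcf_after_first, ACTUAL_S, ACTUAL_S)
```

Under these assumptions the task that was already queued is re-estimated at roughly the real run time, while the later task keeps the server's corrected figure and the DCF starts moving back toward 1.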
>>
>>From: Richard Haselgrove [mailto:[email protected]]
>>Sent: Wednesday, April 03, 2013 10:22 AM
>>To: McLeod, John; [email protected]
>>Subject: Re: [boinc_dev] The reason for a local DCF.
>>
>>Fully agreed. But remember that you have to follow the logic and also
>>re-instate the DCF code on that project's server.
>>
>>Say you set work fetch limits of 0.5 days minimum and 0.5 days additional -
>>or a target work buffer: 43200.00 + 43200.00 sec
>>
>>Once TrainWreck@home (eventually) becomes the highest priority project and
>>your client issues a work request, it will request 43200 seconds of work.
>>
>>The *server*, which currently ignores DCF in its calculations, will still use
>>the 1hr 17mn estimation - 4620 seconds. The server will assign 10 jobs to fill
>>the request.
>>
>>Once those 10 jobs arrive at the client, they will be re-estimated by the
>>client using DCF, which by then will be about 19.27.
>>
>>And your client will announce that it has received 10 days 7 hours of new
>>work. And no doubt panic.
>>
>>[I am assuming that TrainWreck@home's previous batch of work for this
>>application was correctly estimated, and that John has volunteered for
>>TrainWreck@home for long enough to have an established and stable APR for
>>HitTheBuffers v1.01]
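
The figures in the example above can be checked with a few lines of arithmetic. This is a sketch only: it assumes the server keeps adding whole jobs until the request is covered (modelled here with `math.ceil`) and that the client's DCF has settled at the ratio of actual to estimated run time. The variable names are illustrative.

```python
import math

SECONDS_PER_DAY = 86400
SECONDS_PER_HOUR = 3600

requested_s = 43200                     # work request: 0.5 days
server_estimate_s = 1 * 3600 + 17 * 60  # 1 h 17 min  -> 4620 s
actual_s = 24 * 3600 + 44 * 60          # 24 h 44 min -> 89040 s

# Server side: DCF is ignored, so jobs are counted at 4620 s each.
jobs_sent = math.ceil(requested_s / server_estimate_s)

# Client side: DCF has converged on actual / estimated.
dcf = actual_s / server_estimate_s

# Each job is re-estimated as server estimate * DCF on arrival.
client_total_s = round(jobs_sent * server_estimate_s * dcf)
days, remainder_s = divmod(client_total_s, SECONDS_PER_DAY)
hours = remainder_s // SECONDS_PER_HOUR

print(f"jobs sent: {jobs_sent}, DCF: {dcf:.2f}, "
      f"client sees {days} days {hours} hours of work")
```

Nine jobs at 4620 seconds would cover only 41580 seconds, which is why the request is filled with ten.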
>>
>>________________________________
>>From: "McLeod, John" <[email protected]>
>>To: "[email protected]" <[email protected]>
>>Sent: Wednesday, 3 April 2013, 13:40
>>Subject: [boinc_dev] The reason for a local DCF.
>>
>>I am currently watching a train wreck that would not be happening if DCF was
>>turned on for a particular project.
>>
>>The initial estimate is a wall time of one hour seventeen minutes.  The actual
>>wall time is twenty-four hours 44 minutes.  The problem is that work fetch and
>>the scheduler do not know that the problem exists for tasks #2 through #20 and
>>are downloading work from other projects, not realizing that the saturated
>>time is 20 days and not 20 hours.
>>_______________________________________________
>>boinc_dev mailing list
>>[email protected]
>>http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>To unsubscribe, visit the above URL and
>>(near bottom of page) enter your email address.
>>
>>
>>
