I believe that some client tweaks can mostly, if not entirely, fix the
problem.  The project in question is, I believe, using the wrapper, and the
wrapped executable does not do its own checkpoints.  This makes it
extremely difficult for them to supply checkpoints.

What would not be fixed by the following?  (What case am I missing?)

1)  Track the active wall time between checkpoints (time spent actually
running, not time spent swapped out).
2)  From the computation deadline, subtract the longest wall time between
checkpoints for any project (or, preferably, app version) that has
uncompleted work assigned to the host.  Do not subtract the wall time
since the last checkpoint unless there are no other tasks from this app
version / project.
3)  Add the shortest wall time between checkpoints among the actively
running tasks to the estimated start delay reported to the server.
Slightly better would be to subtract each task's wall time since its last
checkpoint when determining this.
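To make items 1 to 3 concrete, here is a minimal Python sketch. It is not the BOINC client's code: `Task`, `CheckpointStats`, `computation_deadline` and `reported_start_delay` are hypothetical names, times are in seconds, intervals are tracked per app version, and the client is assumed to report running time and checkpoints through these calls.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    app_version: str
    running: bool
    since_checkpoint: float = 0.0  # active seconds since this task's last checkpoint


class CheckpointStats:
    """Item 1: longest active wall time between checkpoints, per app version."""

    def __init__(self) -> None:
        self.longest: dict[str, float] = {}

    def add_running_time(self, task: Task, seconds: float) -> None:
        # Call only for time the task actually ran, never while it is swapped out.
        task.since_checkpoint += seconds

    def on_checkpoint(self, task: Task) -> None:
        # Record the completed interval and start counting the next one.
        prev = self.longest.get(task.app_version, 0.0)
        self.longest[task.app_version] = max(prev, task.since_checkpoint)
        task.since_checkpoint = 0.0

    def interval(self, task: Task) -> float:
        # Use the time since the last checkpoint only when no interval
        # has been recorded yet for this app version.
        return self.longest.get(task.app_version, task.since_checkpoint)


def computation_deadline(deadline: float, stats: CheckpointStats,
                         unfinished: list[Task]) -> float:
    """Item 2: pull the deadline in by the longest interval among unfinished tasks."""
    margin = max((stats.interval(t) for t in unfinished), default=0.0)
    return deadline - margin


def reported_start_delay(base_delay: float, stats: CheckpointStats,
                         tasks: list[Task]) -> float:
    """Item 3 (refined): add the shortest expected wait until a running task checkpoints."""
    waits = [max(stats.interval(t) - t.since_checkpoint, 0.0)
             for t in tasks if t.running]
    return base_delay + min(waits, default=0.0)
```

For example, if one app version on the host last went 7,200 s between checkpoints, a deadline at 100,000 s becomes a computation deadline of 92,800 s. Taking the minimum in `reported_start_delay` reflects the idea that a new task can start as soon as any running task reaches a checkpoint and can be preempted without losing work.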

Addendum:  For all new projects, assume a fairly long time between
checkpoints (say, 2 days) until a real value has been observed.  The
default would be replaced by the actual value for this app version /
project as soon as the first checkpoint is reached.

For all new app versions, defer to the project value.
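The addendum's fallback order (app version, then project, then a 2-day default) can be sketched as follows. The function names and the dictionary-based storage are hypothetical, and keeping the longest observed value, to match item 2, is an assumption of this sketch.

```python
from __future__ import annotations

# Assumed default from the addendum: 2 days, until a real checkpoint is observed.
DEFAULT_CHECKPOINT_INTERVAL = 2 * 24 * 3600.0  # seconds


def expected_checkpoint_interval(app_version: str, project: str,
                                 by_app_version: dict[str, float],
                                 by_project: dict[str, float]) -> float:
    """Pick the interval estimate: app version, then project, then the default."""
    if app_version in by_app_version:
        return by_app_version[app_version]
    if project in by_project:
        # New app version with no checkpoints yet: defer to the project value.
        return by_project[project]
    # New project: assume a long interval until the first checkpoint is reached.
    return DEFAULT_CHECKPOINT_INTERVAL


def record_checkpoint_interval(app_version: str, project: str, seconds: float,
                               by_app_version: dict[str, float],
                               by_project: dict[str, float]) -> None:
    """Replace the default with measured values, keeping the longest seen."""
    by_app_version[app_version] = max(by_app_version.get(app_version, 0.0), seconds)
    by_project[project] = max(by_project.get(project, 0.0), seconds)
```

A brand-new project gets 172,800 s; after its first checkpoint at 3,600 s, both that app version and any later app version of the same project use 3,600 s.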

Item #1 is required for both #2 and #3 to work.

Item #2 fixes the client so that tasks already on the host will start in
EDF (earliest deadline first) mode earlier if needed, based on a long
duration between checkpoints for any task on the system.
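As a rough illustration of item #2's effect, this sketch treats a task as at risk, and therefore as a candidate for EDF scheduling, when it cannot finish before its deadline minus the longest checkpoint interval on the host. It is a simplified stand-in, not the client's actual deadline-miss logic; `needs_edf` and `edf_order` are hypothetical names, and times are in seconds.

```python
from __future__ import annotations


def needs_edf(now: float, deadline: float, remaining_runtime: float,
              longest_checkpoint_interval: float) -> bool:
    """Hypothetical deadline-risk test using the item #2 margin."""
    # Deadline pulled in by the longest checkpoint interval seen on the host.
    computation_deadline = deadline - longest_checkpoint_interval
    return now + remaining_runtime > computation_deadline


def edf_order(tasks: list[tuple[str, float]]) -> list[str]:
    """Earliest deadline first: order (name, deadline) pairs by deadline."""
    return [name for name, _deadline in sorted(tasks, key=lambda t: t[1])]
```

With 10 hours of work left and a deadline 12 hours away, the task looks safe with no margin, but a 4-hour checkpoint interval elsewhere on the host flags it for EDF.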

Item #3 fixes work requests so that tasks from projects with very short
deadlines will not be sent to a system whose running tasks all have long
durations between checkpoints.
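Item #3 relies on the server declining work that the host cannot start in time. The check below is a hypothetical sketch of that decision, assuming the server compares the reported start delay plus the task's estimated run time against the time remaining until its deadline; it is not BOINC scheduler code, and `can_meet_deadline` and its parameters are illustrative names.

```python
def can_meet_deadline(start_delay: float, est_runtime: float,
                      delay_bound: float) -> bool:
    """Hypothetical feasibility test for sending a new task to a host.

    start_delay: delay the client reports before it could start new work (item #3)
    est_runtime: estimated run time of the candidate task on this host
    delay_bound: time from sending the task to its deadline
    """
    return start_delay + est_runtime <= delay_bound
```

If every running task is 20 hours from its next checkpoint, a 6-hour task with a 24-hour deadline is refused (26 h > 24 h); if one task will checkpoint within 2 hours, the same task fits.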

jm7


                                                                           
From:    David Anderson <[email protected]>
Sent by: <[email protected]>
To:      [email protected]
cc:      BOINC Alpha list <[email protected]>
Date:    03/03/2010 06:04 PM
Subject: Re: [boinc_alpha] 6.10.35 failure to start a task on time to
         meet deadline, and no start of task even after deadline.

Dealing effectively with long jobs that don't checkpoint
will require more than just some client tweaks.
Maybe we'll take this on at some point, but not now.
For now, let's encourage projects not to supply such jobs.
-- David

[email protected] wrote:
> OK, this is now happening on a third computer (out of 11).  That means
> almost 30% of my computers are exhibiting the behavior of returning work
> late, because we are neither accounting for time between checkpoints nor
> preempting when needed in order to meet other deadlines.

_______________________________________________
boinc_alpha mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.



_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
