Re: [boinc_dev] 6.6.20 and work scheduling

John . McLeod Wed, 29 Apr 2009 06:09:15 -0700

You have the logic upside down.

Your code leads to major thrashing of tasks.


Unless you check the global state, you run the risk of starting and
stopping a series of tasks IN THE SAME SECOND, because they all
checkpointed and there is a perverse chain of what to run next.
Inspecting the global state is a much better solution than trying to figure
out exactly what to do on each particular event, because several events
will probably happen within the same second.
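A minimal sketch of the "look at the global state" idea, assuming a toy model in which each task has a single numeric priority and `ncpus` tasks can run at once. The function names are hypothetical and not BOINC client code. Events only trigger a decision. The decision itself is computed once from the whole task state, so a burst of checkpoints in the same second causes at most one reshuffle instead of a chain of start/stop decisions:

```python
from typing import Dict, List, Set


def desired_running_set(priorities: Dict[str, float], ncpus: int) -> Set[str]:
    """Pick the ncpus highest-priority tasks from the global task state."""
    ranked = sorted(priorities, key=lambda t: priorities[t], reverse=True)
    return set(ranked[:ncpus])


def process_pass(running: Set[str], events: List[str],
                 priorities: Dict[str, float], ncpus: int) -> Set[str]:
    """Handle every event that arrived during this pass with ONE decision.

    The events are only triggers; their contents are not used to decide
    what runs next, so the order in which they arrive cannot cause a chain
    of preemptions.
    """
    if not events:
        return running          # nothing happened: keep what is running
    return desired_running_set(priorities, ncpus)


# Two checkpoints in the same second produce a single decision.
print(process_pass({"a", "b"}, ["ckpt:a", "ckpt:b"],
                   {"a": 3.0, "b": 1.0, "c": 2.0}, 2))
```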

No, the end of the loop is NOT the place to limit how frequently
anything can happen. Interprocess communication also occurs in this
polling loop, and that has to happen much more frequently.
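One way to read this point, sketched under assumptions: the loop sleeps only for a short poll period so that IPC runs on every pass, and the rate limit on scheduling is a timestamp check inside the loop rather than a long sleep at its end. The callback names, the one-second poll period, and the 60-second `min_sched_interval` (borrowed from the question in the pseudo-code below) are all illustrative, not the client's actual values or API. The clock and sleep functions are injectable so the behaviour can be checked without waiting:

```python
import time
from typing import Callable, List


def poll_loop(passes: int, poll_period: float, min_sched_interval: float,
              do_ipc: Callable[[], None],
              collect_events: Callable[[], List[str]],
              schedule: Callable[[List[str]], None],
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> None:
    last_sched = float("-inf")
    pending: List[str] = []
    for _ in range(passes):
        do_ipc()                           # runs on EVERY pass
        pending.extend(collect_events())   # events accumulate between decisions
        now = clock()
        if pending and now - last_sched >= min_sched_interval:
            schedule(pending)              # one decision for the whole batch
            pending = []
            last_sched = now
        sleep(poll_period)                 # short sleep, NOT the scheduling limit
```

With a 1-second poll and a 60-second limit, IPC runs on every pass while scheduling runs at most once a minute, each time seeing every event collected since the previous decision.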

jm7


From:     "Paul D. Buck" <p.d.b...@comcast.net>
Sent by:  boinc_dev-bounces@ssl.berkeley.edu
Date:     04/28/2009 06:07 PM
To:       BOINC Developers Mailing List <[email protected]>
cc:       "Josef W. Segur" <[email protected]>, [email protected],
          Mark Pottorff <[email protected]>
Subject:  Re: [boinc_dev] 6.6.20 and work scheduling

On Apr 28, 2009, at 12:14 PM, David Anderson wrote:

> At this point I'm interested in reports of bad scheduling
> or work-fetch decisions, or bad debt calculation.
> When these are all fixed we can address the efficiency
> of the scheduling calculations (if it's an issue).

OK, I will attach an RTF file containing the pseudo-code below, because the
mailing list seems to reformat everything and strip the pretty-printing.

It would be nice if we could start by discussing the high-level logic
and quibble over the minutiae of the rules later.

As suggestions to the code are made and agreed upon, I will create
updates and circulate them, if that is agreeable.

I think this is a more rational route because the current rules are so
inadequate to the task on fast/wide systems that a new approach is
needed.

The basic logic should match my textual description of April 27, 2009,
2:20:56 PM PDT, in which I listed the problems and gave a broad outline
of the conceptual model of what I think needs to be done.

There ARE lots of details still missing, but the fundamentals are
there. The intent is to address the issues listed in that prior e-mail,
to wit:

1) We do it too often (event driven)
2) All currently running tasks are eligible for preemption
3) TSI is not respected as a limiting factor
4) TSI is used in calculating deadline peril
5) Work mix is not kept "interesting"
6) Resource Share is used in calculating run time allocations
7) Work "batches" (tasks with roughly similar deadlines) are not
   "bank-teller queued"
8) History of work scheduling is not preserved, and all deadlines are
   calculated fresh on each invocation
9) True deadline peril is rare, but "false positives" are common
10) Some of the sources of work peril may be caused by a defective
    work fetch allocation
11) Other factors that are either obscured by the above or that I forgot,
    or maybe nothing else ...

See that e-mail for the rest.


(See attached file: rs-sched-pseudo-code.rtf)


Pseudo code:


If I understand the logic, this is the BOINC client in a nutshell.

GLOBAL List
    task-list                    // list of all tasks, unknown order
    task-queue                   // tasks in ordered list, highest-priority task first
    projects-with-tasks-running  // list of projects with tasks running
    projects-with-tasks-queued   // list of projects with tasks queued
    task-completion-interval     // the running average of time between task completions
    task-checkpoint-interval     // the running average of time between task checkpoints
    rs-scoreboard                // scoreboard of time spent per project vs. time
                                 // that should be spent on a project on a per-unit-time basis;
                                 // shortfalls would increase the priority of the tasks
                                 // in the task-queue
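The GLOBAL list could be held in a structure like the following minimal sketch. The field names mirror the pseudo-code; the `Task` fields, the exponentially weighted form of the "running average" (and its `alpha`), and the spent-vs-entitled layout of `rs_scoreboard` are assumptions for illustration, not what the BOINC client actually stores:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    name: str
    project: str
    deadline: float           # absolute time, seconds
    est_remaining: float      # estimated seconds of work left
    runtime: float = 0.0      # seconds run since last (re)start
    running: bool = False
    preempted: bool = False


@dataclass
class RunningAverage:
    """Exponentially weighted running average; alpha is an assumed smoothing factor."""
    alpha: float = 0.1
    value: Optional[float] = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample                          # first sample seeds the average
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value


@dataclass
class SchedulerState:
    task_list: List[Task] = field(default_factory=list)
    task_queue: List[Task] = field(default_factory=list)
    projects_with_tasks_running: Set[str] = field(default_factory=set)
    projects_with_tasks_queued: Set[str] = field(default_factory=set)
    task_completion_interval: RunningAverage = field(default_factory=RunningAverage)
    task_checkpoint_interval: RunningAverage = field(default_factory=RunningAverage)
    # project -> [seconds actually spent, seconds its resource share entitles it to]
    rs_scoreboard: Dict[str, List[float]] = field(default_factory=dict)

    def record_time(self, project: str, spent: float, entitled: float) -> None:
        row = self.rs_scoreboard.setdefault(project, [0.0, 0.0])
        row[0] += spent
        row[1] += entitled

    def shortfall(self, project: str) -> float:
        """Positive when the project has received less time than its share."""
        spent, entitled = self.rs_scoreboard.get(project, [0.0, 0.0])
        return entitled - spent
```

A positive `shortfall()` is what would raise the priority of that project's tasks in the task-queue.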

main ()
    begin
        initialize
        loop forever
            do what needs to be done
            detect-events;
            if events
                if number of resources < 4 then
                    use old scheduler
                else
                    schedule-resources (event list)
            do whatever else needs to be done
            sleep; // isn't this where the 60-second limit should occur?
        end loop; // forever
    end // main
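One pass of `main ()` can be sketched as follows, assuming events arrive in an inbox queue that `detect-events` drains. The callback names are hypothetical stand-ins for the old scheduler and the proposed `schedule-resources`. The fewer-than-4-resources threshold is taken directly from the pseudo-code:

```python
from collections import deque
from typing import Callable, Deque, List, Optional


def detect_events(inbox: Deque[str]) -> List[str]:
    """Drain every event that arrived since the previous pass."""
    events: List[str] = []
    while inbox:
        events.append(inbox.popleft())
    return events


def main_loop_pass(inbox: Deque[str], num_resources: int,
                   old_scheduler: Callable[[List[str]], None],
                   schedule_resources: Callable[[List[str]], None]) -> Optional[str]:
    """Return which scheduler ran, or None if there were no events."""
    events = detect_events(inbox)
    if not events:
        return None
    if num_resources < 4:          # narrow systems keep the existing scheduler
        old_scheduler(events)
        return "old"
    schedule_resources(events)     # fast/wide systems use the new logic
    return "new"
```

Because the whole inbox is drained before dispatch, every event from the pass reaches the scheduler as one list, matching the `schedule-resources (event list)` signature.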

The new routines would look something like this:

schedule-resources (event list)
    iterate through event list

        // note first for speed
        case checkpointing:
            if task.runtime < TSI
                do nothing
            else
                halt task
                update projects-with-tasks-running
                update rs-scoreboard
                do update-task-queue
                do schedule-task-to-resource
                update task-checkpoint-interval

        case task complete:
            update projects-with-tasks-running
            update task-completion-interval
            update rs-scoreboard
            do schedule-task-to-resource

        case download complete:
            do update-task-queue // did we create deadline peril?

        case project/task resume/suspend:
            do update-task-queue // did we create deadline peril?

        case RPC complete:
            if server asks to drop WU
                if task not started
                    drop task // (this may logically belong in update-task-queue)
                    do update-task-queue
                else
                    do nothing

        case deadline-peril:
            // on a nominal system this case should never occur,
            // because queues are reasonable and speed / width
            // allows deliberative scheduling;
            // this event is raised in update-task-queue
            do determine-task-to-preempt
            preempt task  // frees resource
            update projects-with-tasks-running
            do schedule-task-to-resource

    end iterate // event list
end schedule-resources
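The event cases above can be sketched as a dispatcher that only records its decisions, which makes the TSI rule easy to check in isolation. This is a simplified sketch: the `Event` and `Actions` types, the event-kind strings, and folding "halt, update queue, reschedule" into counters are all assumptions, and TSI is read as the task switch interval in seconds. The deadline-peril case is omitted here:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    project: str
    runtime: float = 0.0      # seconds since this task was last started
    started: bool = False


@dataclass
class Event:
    kind: str                 # "checkpoint", "task-complete", "rpc-drop", ...
    task: Task


@dataclass
class Actions:
    """Records what the handler decided, so the logic can be inspected."""
    halted: List[str] = field(default_factory=list)
    dropped: List[str] = field(default_factory=list)
    reschedules: int = 0
    queue_updates: int = 0


def schedule_resources(events: List[Event], tsi: float, acts: Actions) -> None:
    for ev in events:
        t = ev.task
        if ev.kind == "checkpoint":
            if t.runtime < tsi:
                continue                  # TSI not reached: leave the task running
            acts.halted.append(t.name)    # halt the task ...
            acts.queue_updates += 1       # ... re-rank the queue ...
            acts.reschedules += 1         # ... and refill the freed resource
        elif ev.kind == "task-complete":
            acts.reschedules += 1         # a resource became free
        elif ev.kind in ("download-complete", "suspend", "resume"):
            acts.queue_updates += 1       # may have created deadline peril
        elif ev.kind == "rpc-drop":
            if not t.started:             # only unstarted tasks are dropped
                acts.dropped.append(t.name)
                acts.queue_updates += 1
```

Note how a checkpoint before the TSI has elapsed costs nothing, which addresses item 3 in the list above.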


update-task-queue

    iterate through task-list
        update information using best available estimates and rs-scoreboard
        // establishes the baselines on each task for priority ranking later
        // I would not use rr-sim for rs-scoreboarding

    drop task-queue
    iterate through task-list
        if task is running mark task as queued
        // no need to queue already running tasks
        if task was preempted insert it into the task-queue and mark as queued

    loop until all tasks queued
        iterate through task-list
            find the highest-priority (essentially deadline order) task
            not yet marked as queued
        insert into task-queue
        mark task as queued
    end loop // all tasks queued

    test head task for deadline peril
    // the test will calculate based on task-completion-interval,
    // task-checkpoint-interval, estimated task run time, safety margin, etc.
    // My current bad example of IBERCIVIS tasks would not raise DP events,
    // but would merely increase the task priority
    if deadline peril exists
        raise deadline-peril event

end update-task-queue
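The "find the highest-priority task, insert, repeat" loop is a selection sort, so it can be written as a sort on a priority key. The sketch below places preempted tasks ahead of the rest, which is one reading of the pseudo-code. The key (earliest deadline first, larger rs-scoreboard shortfall breaking ties) and the peril test (time until a resource frees, approximated by the completion interval, plus the head task's remaining work times an assumed `safety` factor, compared with its deadline) are illustrative assumptions, not an agreed formula:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    name: str
    project: str
    deadline: float          # absolute time, seconds
    est_remaining: float     # estimated seconds of work left
    running: bool = False
    preempted: bool = False


def priority_key(t: Task, shortfall: Dict[str, float]) -> Tuple[float, float]:
    # Earliest deadline first; on equal deadlines, larger shortfall first.
    return (t.deadline, -shortfall.get(t.project, 0.0))


def update_task_queue(tasks: List[Task], shortfall: Dict[str, float], now: float,
                      completion_interval: float, safety: float = 1.5
                      ) -> Tuple[List[Task], bool]:
    """Return (task_queue, deadline_peril) for the current task list."""
    waiting = [t for t in tasks if not t.running]    # running tasks are not queued
    preempted = [t for t in waiting if t.preempted]
    others = [t for t in waiting if not t.preempted]
    queue = (sorted(preempted, key=lambda t: priority_key(t, shortfall))
             + sorted(others, key=lambda t: priority_key(t, shortfall)))
    peril = False
    if queue:
        head = queue[0]
        # The head can only start once a resource frees up (~completion_interval).
        finish = now + completion_interval + safety * head.est_remaining
        peril = finish > head.deadline
    return queue, peril
```

Because the sort is rebuilt on every call, "drop task-queue" comes for free, and a single comparison on the head task decides whether a deadline-peril event is raised.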


schedule-task-to-resource

    if deadline-peril
        assign task-queue.head-task to available resource
        update projects-with-tasks-running
    else
        // try to keep work mix "interesting"
        compare projects-with-tasks-running to projects-with-tasks-queued
        if some queued task's project is not in projects-with-tasks-running
            assign highest priority task where task.project not in
                projects-with-tasks-running to the available resource
            // this is the most likely route by which tasks with long deadlines
            // will be scheduled, as their rs-scoreboard allocation of time
            // shows a shortfall
        else
            assign task-queue.head-task to available resource

end schedule-task-to-resource
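The selection rule above fits in a few lines. This sketch assumes `task_queue` is already ordered highest priority first and that assigning a task just means removing it from the queue and recording its project as running. The function names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Task:
    name: str
    project: str


def pick_task(task_queue: List[Task], running_projects: Set[str],
              deadline_peril: bool) -> Optional[Task]:
    """task_queue is ordered highest priority first."""
    if not task_queue:
        return None
    if deadline_peril:
        return task_queue[0]               # peril: strictly take the head
    for t in task_queue:                   # scanned in priority order
        if t.project not in running_projects:
            return t                       # keeps the work mix "interesting"
    return task_queue[0]                   # every queued project already runs


def assign_to_free_resource(task_queue: List[Task], running_projects: Set[str],
                            deadline_peril: bool) -> Optional[Task]:
    t = pick_task(task_queue, running_projects, deadline_peril)
    if t is not None:
        task_queue.remove(t)
        running_projects.add(t.project)    # update projects-with-tasks-running
    return t
```

The linear scan stops at the first task from a project that is not running, so it returns the highest-priority such task without sorting again.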


determine-task-to-preempt

    iterate running tasks
        find task with longest time to completion and latest deadline
        mark as preempted
end determine-task-to-preempt
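The pseudo-code does not say how "longest time to completion" and "latest deadline" combine. The sketch below assumes remaining time is the primary criterion and the deadline only breaks ties, so the task that gains least from continuing to run is the one preempted. `RunningTask` is a hypothetical type:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    name: str
    est_remaining: float     # estimated seconds of work left
    deadline: float          # absolute time, seconds
    preempted: bool = False


def determine_task_to_preempt(running: List[RunningTask]) -> Optional[RunningTask]:
    """Mark and return the running task with the longest remaining time,
    using the later deadline to break ties."""
    if not running:
        return None
    victim = max(running, key=lambda t: (t.est_remaining, t.deadline))
    victim.preempted = True
    return victim
```

For example, among tasks with (remaining, deadline) of (100, 500), (300, 400) and (300, 900), the last one is chosen.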


_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.

rs-sched-pseudo-code.rtf
Description: RTF file
