You have the logic upside down. Your code leads to major thrashing of tasks.
Unless you check the global state, you run the risk of starting and
stopping a series of tasks IN THE SAME SECOND. because they all
checkpointed, and there is a perverse chain of what to run next.
Inspecting the global state is a much better solution than trying to figure
out exactly what to do on a particular event as there is the probability
that several events will happen during the same second.
No, the end of the loop is NOT the place to do the limit on how frequently
anything can happen. The interprocess communications also occurs in this
polling loop, and that has to happen much more frequently.
jm7
"Paul D. Buck"
<p.d.b...@comcast
.net> To
Sent by: BOINC Developers Mailing List
boinc_dev-bounces <[email protected]>
@ssl.berkeley.edu cc
"Josef W. Segur"
<[email protected]>,
04/28/2009 06:07 [email protected], Mark
PM Pottorff <[email protected]>
Subject
Re: [boinc_dev] 6.6.20 and work
scheduling
On Apr 28, 2009, at 12:14 PM, David Anderson wrote:
> At this point I'm interested in reports of bad scheduling
> or work-fetch decisions, or bad debt calculation.
> When these are all fixed we can address the efficiency
> of the scheduling calculations (if it's an issue).
Ok, I will append an RTF file with the pseudo-code below because the
mailing list seems to want to reformat everything and remove pretty
print.
It would be nice if we could start with talking to the high end logic
first and then quibble over the details of the minutia in the rules
later.
as suggestions to the code are made and agreed upon I will create
updates and circulate those if that is agreeable.
The reason I think this a more rational route is that I feel that the
current rules are so inadequate to the task on fast/wide systems that
a new approach is needed.
The basic logic of this should match to the textual description of
April 27, 2009 2:20:56 PM PDT where I list the problems and a broad
outline of the conceptual model of what I think needs to be done.
There ARE lots of details still missing. But the fundamentals are
there. the intent is to address the issues listed in that prior e-
mail, to wit:
1) We do it too often (event driven)
2) All currently running tasks are eligible for preemption
3) TSI is not respected as a limiting factor
4) TSI is used in calculating deadline peril
5) Work mix is not kept "interesting"
6) Resource Share is used in calculating run time allocations
7) Work "batches" (tasks with roughly similar deadlines) are not "bank
teller queued"
8) History of work scheduling is not preserved and all deadlines are
calculated fresh each invocation.
9) True deadline peril is rare, but "false positives" are common
10) Some of the sources of work peril may be caused by a defective
work fetch allocation
11) Other factors either obscured by the above, I forgot them, or
maybe nothing else ...
see the e-mail for the rest
(See attached file: rs-sched-pseudo-code.rtf)
Pseudo code:
If I understand the logic this is the BOINC client in a nutshell.
GLOBAL List
task-list
// list of all tasks, unknown order
task-queue
// tasks in ordered list, highest priority task first
projects-with-tasks-running // list of projects with
tasks running
projects-with-tasks-queued // list of projects with
tasks queued
task-completion-interval // the running
average of time between task
completions
task-checkpoint-interval // the running
average of time between task
checkpoints
rs-scoreboard
// scoreboard of time spent per project vs. time
// that should be spent on a project on a
per unit time basis
// shortfalls would increase the priority of
the tasks in the
task-queue
main ()
begin
initialize
loop forever
do what needs to be done
detect-events;
if events
if number of resources < 4
then
use old
scheduler
else
schedule-resources (event list)
do whatever else needs to be done
sleep; // isn't this where the 60
second limit should occur?
end loop; // forever
end // main
The new routines would look something like this:
schedule-resources (event list)
iterate through event list
// note first for speed
case checkpointing:
if task.runtime < TSI
do nothing
else
halt task
update
projects-with-tasks-running
update rs-scoreboard
do update-task-queue
do
schedule-task-to-resource
update
task-checkpoint-interval
case task complete:
update projects-with-tasks-running
update task-completion-interval
update rs-scoreboard
do schedule-task-to-resource
case download complete:
do update-task-queue // did we create
deadline peril?
case project/task resume/suspend
do update-task-queue // did we create
deadline peril?
case RPC complete:
if server asks to drop WU
if task not started
drop task
(this may logically belong in update-task-queue)
do
update-task-queue
else
do nothing
case deadline-peril:
// on a nominal system this case
should never occur
// because queues are reasonable and
speed / width
// allows deliberative scheduling
// this event is raised in
update-task-queue
do determine-task-to-preempt
preempt task // frees resource
update projects-with-tasks-running
do schedule-task-to-resource
end iterate // event list
end schedule-resources
update-task-queue
iterate through task-list
update information using best available estimates
and rs-scoreboard
// establishes the baselines on each task for
priority ranking later
// I would not use rr-sim for rs-scoreboarding
drop task-queue
iterate through task-list
if task is running mark task as queued
// no need to queue already running tasks
if task was preempted insert it into the task
queue and mark as queued
loop until all tasks queued
iterate through task-list
find the highest priority (deadline
order essentially) task
insert into task-queue
mark task as queued
end loop // all tasks queued
test head task for deadline peril
// test will calculate based on task-completion-interval and
task-
checkpoint-interval
// estimated task run time, safety margin, etc. My current bad
example of IBERCIVIS tasks
// would not raise DP events, but would merely increase the
task
priority
if deadline peril exists
raise deadline-peril event
end update-task-queue
schedule-task-to-resource
if deadline-peril
assign task-queue.head-task to available resource
update projects-with-tasks-running
else
// try to keep work mix "interesting"
compare projects-with-tasks-running to
projects-with-tasks-queued
assign highest priority task where
task.project not in projects-
with-tasks-running to the available resource
// this is the most likely route for
tasks with long deadlines will
be scheduled as
// their rs-scoreboard allocation of
time shows a shortfall
else
assign task-queue.head-task to
available resource
end schedule-task-to-resource
determine-task-to-preempt
iterate running tasks
find task with longest time to completion and
deadline
mark as preempted
end determine-task-to-preempt
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
rs-sched-pseudo-code.rtf
Description: RTF file
_______________________________________________ boinc_dev mailing list [email protected] http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
