I'm merging a few parts in the emails where we actually talk about the same stuff...
[email protected] wrote:
> CPU scheduling is not FIFO if there is more than one project. It is FIFO
> within a project and Round Robin between projects. There are some slight
> differences in the actual code that starts the task based on whether it is
> in memory or not, but there is no difference in the test to see whether it
> should be running, or needs to preempt now. I haven't figured out why there
> should be. The start or re-start of the task does not really have a bearing
> on the scheduling of the task.
>
>>> You need a list of tasks that would ideally be running now.
>>
>> Nope, because this list is implicitly given by the FIFO ordering mixed
>> with 'queue-jumping' of soon-to-meet-the-deadline jobs AND the
>> resource-share selector.
>
> Except that CPU tasks are run round robin between projects - to give
> variety.

Yes, and thus ignoring our project priority/resource-share value. That was
one side effect of my approach: to fix that 'by the way'.

>> And yes, the estimated runtime changes all the time, but do we need to
>> care about that every second it reports back? Why not simply look at it
>> when it comes to (re)scheduling events driven by the points stated below?
>
> Remaining time may not change by just a little bit. An example that occurs
> sometimes:
>
> Project Z is added to the mix of projects on a computer. The estimate for
> all tasks is 5 minutes, and so 50 are downloaded. After the first task is
> run, it is noticed that the estimate is WAY off and the tasks actually take
> 5 hours (and these numbers pretty much describe what happened). Now we
> really do have some serious deadline problems. What used to look completely
> reasonable no longer looks as reasonable.
>
> The client cannot tell without testing whether something like this has
> happened or not.

Well, if we really want to fix such a bug on the client side, we might add a
reordering step (moving jobs up by deadline) in the FIFO queue. But even
then, by my design, not all jobs may be completed before the deadline,
because the deadline is not superior to anything else (in this case
especially resource share).

I don't want to throw deadlines overboard entirely, don't get me wrong here -
I'm most unhappy as a project admin when folks with a huge cache keep the
SIMAP workunits well longer than 97% of their brethren, thus actually
hindering us in completing our monthly batch as fast as possible... But in my
opinion we needn't fix projects' misbehaviour on the client side. Really not!

>> 1. Either we have time-slicing or we don't.
>> If we really got a job with a deadline so close that waiting till the end
>> of the current timeslice (with more CPU cores a more regular event) will
>> really mean its ultimate failure, then there's something wrong that we
>> needn't fix; it shouldn't be the client's aim to fix shortcomings of
>> supplying projects.
>
> We have time slicing if we can, on as many CPUs as we can. Sometimes an
> immediate preempt is needed.
>
> An example:
>
> Task A from Project A never checkpoints. It was started before Task B from
> Project B was downloaded.
>
> Task A has 48 hours of remaining CPU time.
> Task B has 24 hours till deadline, but only 1 hour of run time.
>
> Sometime during the next 23 hours, Task A will have to be pre-empted NOT AT
> A CHECKPOINT in order to allow Task B to run before its deadline. Task A
> will, of course, have to remain in memory in order to make progress.
>
> Believe it or not, there are projects that supply tasks that run for a week
> without checkpointing.
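If I read that example right, the pressure test behind it boils down to
comparing the time left until the deadline with the CPU time the task still
needs. A toy sketch only - Job, cpu_remaining, needs_immediate_run and so on
are names I'm inventing here, not the actual client code:

#include <ctime>
#include <iostream>

struct Job {
    double      cpu_remaining;   // estimated CPU seconds still needed
    std::time_t deadline;        // report deadline (wall clock)
};

// True once the job has to run right away to make its deadline - the point
// at which a non-checkpointing task like Task A may have to be preempted
// NOT at a checkpoint and kept in memory.
bool needs_immediate_run(const Job& j, std::time_t now, double safety_margin) {
    double slack = std::difftime(j.deadline, now) - j.cpu_remaining;
    return slack <= safety_margin;
}

int main() {
    std::time_t now = std::time(nullptr);
    // Task B from the example: 24 h till deadline, 1 h of work left,
    // i.e. 23 h of slack - no immediate preempt yet, but it will come.
    Job task_b{1.0 * 3600, now + 24 * 3600};
    std::cout << needs_immediate_run(task_b, now, 0.0) << "\n";             // 0
    std::cout << needs_immediate_run(task_b, now + 23 * 3600, 0.0) << "\n"; // 1
}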
Alright, an example. We will catch it, however, if we add 'reached TSI' as an
event in our poll loop. And the user's TSI is sufficient for this case. If it
isn't - I'd say tough luck for the project. And of course, we're always using
the more and more pathological case of a single-CPU box.

>> 2. Yes, it does have multiple CPUs in mind. Or do you want to tell me
>> that every app is asked at the same time (over multiple cores) to
>> checkpoint / does checkpoints in complete synchronisation with all other
>> running apps? I think not.
>> Thus this event will likely be triggered more often than the others, but
>> it will actually only do something if the timeslice/TSI of THAT app on
>> THAT core is up.
>
> Checkpoints happen at random when the task is ready to checkpoint, not when
> the host is ready for it to checkpoint. It will happen that more than one
> task will checkpoint in the same second (the polling interval), not every
> time of course, but it is going to happen.

Yes, so what? Say our poll loop catches three 'checkpoint reached' events,
and - to keep things interesting - all three tasks have reached or are over
our TSI:

1. Handle the first task: a reschedule() operation, ultimately running the
   'task complete' case; either preempt that task and launch another, or
   keep it running because the resource share etc. demands it.
2. Handle the second task: the same reschedule() operation.
3. Handle the third task: the same reschedule() operation.

All done. No?
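Or, as a toy sketch of that poll-loop side - every name here (Task,
tsi_reached, reschedule(), ...) is invented for illustration and is not the
actual client code:

#include <iostream>
#include <vector>

struct Task {
    int  id;
    bool checkpointed_since_last_poll;
    bool tsi_reached;            // has this task used up its timeslice?
};

// Stand-in for the real thing: pick what should run next, by resource share
// and deadline, and preempt/launch accordingly.
void reschedule(const Task& t) {
    std::cout << "reschedule() triggered by task " << t.id << "\n";
}

int main() {
    // One pass of the poll loop: three tasks checkpointed in the same
    // second, all of them over their TSI, so each one is handled in turn.
    std::vector<Task> tasks = {
        {1, true, true}, {2, true, true}, {3, true, true},
    };
    for (const auto& t : tasks) {
        if (t.checkpointed_since_last_poll && t.tsi_reached) {
            reschedule(t);       // preempt it or keep it running, as the
                                 // resource share etc. demands
        }
    }
}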
>>>> task complete:
>>>>     if task.complete
>>>>         do rescheduling
>>>>
>>>> download complete:
>>>>     do scheduling
>>>>
>>>> project/task resume/suspend:
>>>>     do rescheduling
>>>>
>>>> maybe (for the sake of completeness):
>>>> RPC complete:
>>>>     if server asks to drop WU
>>>>         halt job;
>>>>         do rescheduling (job)
>>>>
>>>> The trickier part of course is the scheduling/rescheduling calls, and
>>>> I'm currently leafing through my notepad looking for the sketch...
>>>> For my idea we'd need
>>>>
>>>> a list of jobs run (wct, project)
>>>> -> containing the wall-clock times for every job run during the last
>>>> 24 hours.
>>>
>>> How does the last 24 hours help? Some tasks can run for days or weeks.
>>
>> Elongate it to a fitting period of time. 24h is an idea I picked up from
>> Paul 'to keep the mix interesting' - an idea I like.
>> So, if a task has been running 24h a day for 7 days ... we needn't run a
>> second one, unless this is our ultimate high-priority project with a
>> priority/resource share of 1000:1 or so.
>
> I am sorry, but I am having no luck figuring out what the problem or the
> solution is with the above paragraph.

This is the fix to the non-bug 'annoyance' of not respecting resource
shares/project priorities, and therefore a relatively short window over
which we do our 'resource-share enforcement' should be sufficient.

>>> What about multiple reasons all at the same time? Why do we need to know
>>> the reason in the first place?
>> Hm, some ordering could be chosen, I'll think about it; and the reason
>> does have its place: not all events call for the same handling, do they?
>
> Currently events set a few flags. In the cases that we are interested in,
> either a schedule of CPUs, or an enforcement of the schedule, or both.
> This discussion has focused on schedule and not enforcement.

Hmhm. But correct me if I'm wrong: my proposed scheme does not need
enforcement, does it?

> BTW, the code is based on a polling loop, not event-triggered callbacks.
> Events are picked up by the polling loop on each iteration - and multiple
> events can be picked up at the same time because they all happened since
> the last time through the polling loop.

Yes, that would create the need to order all 'events' so as to get all of
them for a certain job. Then we'd have to decide which event takes precedence
over the others, and either handle them consecutively or discard them. (E.g.
why would we be interested in a 'checkpoint reached + TSI met' event if the
other event is 'task complete'?)
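Roughly what I mean by that, as a toy sketch - the names and the precedence
order below are invented for illustration, not actual client code:

#include <iostream>
#include <map>
#include <utility>
#include <vector>

enum class Event {
    CHECKPOINT_TSI = 0,      // checkpoint reached and TSI met
    SUSPEND_OR_RESUME = 1,
    RPC_DROP_WU = 2,
    TASK_COMPLETE = 3
};

// Higher value wins; e.g. 'task complete' makes a pending 'checkpoint
// reached + TSI met' event for the same task uninteresting.
int precedence(Event e) { return static_cast<int>(e); }

using TaskId = int;

std::map<TaskId, Event>
coalesce(const std::vector<std::pair<TaskId, Event>>& picked_up) {
    std::map<TaskId, Event> winner;
    for (const auto& [task, ev] : picked_up) {
        auto it = winner.find(task);
        if (it == winner.end() || precedence(ev) > precedence(it->second))
            winner[task] = ev;   // keep the stronger event, drop the rest
    }
    return winner;
}

int main() {
    // One pass of the polling loop: task 7 checkpointed *and* completed,
    // task 9 only checkpointed.
    std::vector<std::pair<TaskId, Event>> picked_up = {
        {7, Event::CHECKPOINT_TSI}, {9, Event::CHECKPOINT_TSI},
        {7, Event::TASK_COMPLETE},
    };
    for (const auto& [task, ev] : coalesce(picked_up))
        std::cout << "task " << task << " -> event "
                  << static_cast<int>(ev) << "\n";
}

If we chose to handle them consecutively instead of discarding, the loop
would simply keep a per-task list sorted by precedence rather than a single
winner.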
Best
-Jonathan