>> There appears to be a phenomenal amount of effort both in programming
>> and in scheduler CPU time in trying to meet exactly all deadlines down
>> to the last millisecond and for all eventualities.
> 
> At least not exceed the deadlines in as many cases as possible.

Agreed.

Also, no need to frantically check all scheduling calculations for every 
change of system state.

What does it matter if we work to a granularity of once per TSI period?

Do projects junk results that are 1 second beyond the deadline? (If so, 
then I'll bet that WU result transfer and validate time isn't allowed 
for...)

To follow KISS, the deadlines enforced by the project servers must be 
'soft' and allow for "deadline + 10 * client_default_TSI".
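
For illustration only, a minimal sketch of such a 'soft' acceptance 
check on the server side (the names and the assumed 3600 s default TSI 
are mine, not actual BOINC code):

  #include <ctime>

  // Accept a result that arrives within a soft margin of the nominal
  // deadline; the margin is a small multiple of the client's default
  // task switch interval (TSI).
  const double CLIENT_DEFAULT_TSI = 3600;              // seconds (assumed)
  const double SOFT_MARGIN = 10 * CLIENT_DEFAULT_TSI;  // "deadline + 10*TSI"

  bool within_soft_deadline(time_t received, time_t deadline) {
      return difftime(received, deadline) <= SOFT_MARGIN;
  }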


>> In other words, move to a KISS solution?
> 
> As long as it is not too simple.  Sometimes the obvious simple solution
> does not work.

A good simple solution is designed very cleverly to be inherently robust.


>> New rule:
>>
>> If we are going to accept that the project servers are going to be or
>> can be unreasonable, then the client must have the option to be equally
>> obnoxious (but completely honest) and reject WUs that are unworkable,
>> rather than attempting a critical futility (and failing days later).
>>
>> Add a bit of margin and then you can have only the TSI, and user
>> suspend/release as your scheduler trigger events. The work fetch and
>> send just independently keeps the WU cache filled but not overfilled.
>>
> Tasks come in discrete units, they do not come in convenient sized
> packages.  CPDN can run for months on end, and because of this, it was

That's fine. Big WUs have a long deadline.

The only 'problem' there is that a CPDN WU will block all other projects 
on a single-core system.

To overcome that, perhaps we need to change the cache semantics: rather 
than one absolute cache into which WU times are accumulated, the cache 
would be divided proportionately amongst all the active projects on a 
host. The cache then holds a minimum of work for each project, in 
proportion to the user-set resource share for that project.

The TSI period then swaps the work as expected.
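
As a sketch of that per-project minimum (Project, resource_share and a 
cache measured in days are illustrative names only, not the real client 
structures):

  #include <vector>

  // Divide the overall work cache (in days) amongst the active projects
  // in proportion to their user-set resource shares.
  struct Project {
      double resource_share;   // user-set share for this project
      double min_cache_days;   // minimum work to hold for it, in days
  };

  void divide_cache(std::vector<Project>& projects, double total_days) {
      double total_share = 0;
      for (const Project& p : projects) total_share += p.resource_share;
      if (total_share <= 0) return;
      for (Project& p : projects) {
          p.min_cache_days = total_days * p.resource_share / total_share;
      }
  }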


>> Immediately junk unstarted WUs for resend elsewhere if deadline trouble
>> ensues, rather than panic to try to scrape them in late.
>>
>> That will also implement a natural feedback loop for project admins to
[...]
> Servers treat aborted tasks as errors.  Your daily quota is reduced by one
> for each one of these.  This leads to the problem:
> 
> Request a second of work from project A.
> Run into deadline trouble.
> Abort the task.
> Since A is STILL the project with the highest LTD:  Request a second of
> work from project A.
> Run into deadline trouble.
> Abort the task.

And repeat, which simply demonstrates an implementation bug...


This is where the *client must refuse to download the WU* in the first 
place.

That is, client requests work, server offers something, client refuses 
and goes into a backoff period before asking again.

Upon later requests, either the server will have something more 
reasonable to offer, or the client will be farther away from deadline 
problems.

(That will also save bandwidth over futilely downloading WUs and then 
junking them in any case.)
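
Roughly, and only as a sketch with invented names (the real client's 
runtime estimation and backoff handling is considerably more involved):

  #include <algorithm>
  #include <ctime>

  double backoff_s = 600;              // current backoff before re-asking
  const double MAX_BACKOFF_S = 86400;  // cap the backoff at one day

  // Before committing to a download, estimate whether the offered WU can
  // finish inside its deadline given the work already queued; if not,
  // refuse it and back off before asking that project again.
  bool accept_offer(double est_runtime_s, double queued_runtime_s,
                    time_t now, time_t deadline) {
      double available_s = difftime(deadline, now);
      if (est_runtime_s + queued_runtime_s > available_s) {
          backoff_s = std::min(backoff_s * 2, MAX_BACKOFF_S);
          return false;                // refuse work we expect to miss
      }
      backoff_s = 600;                 // reset the backoff on acceptance
      return true;
  }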


> We have already seen this in s...@h where there is a custom application that
> cherry picks CUDA tasks (rejecting the ones that are known to take a very
> long time on CUDA).  This has driven the daily quota of some people well
> below what their computer can actually do.  We do not want a repeat of that
> intentionally in the client.

That's where the design must be fixed at source, before it gets 'fixed' 
for us by the users (participants) in other ways...


Allowing a large granularity in the scheduling renders a lot of the 
deadline criticalities and special cases moot.

There also needs to be some form of hysteresis ("2 * TSI period"?) in 
moving between 'relaxed scheduling' and EDF 'panic'.
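
As a sketch of that hysteresis, with the one and two TSI thresholds 
taken only from the suggestion above:

  const double TSI_S = 3600;        // assumed default task switch interval

  bool edf_panic = false;

  // Enter EDF 'panic' only when the worst projected slack falls below
  // one TSI, and leave it only once slack exceeds two TSIs, so the
  // scheduler does not flap between modes on every small state change.
  void update_mode(double worst_slack_s) {
      if (!edf_panic && worst_slack_s < TSI_S) {
          edf_panic = true;
      } else if (edf_panic && worst_slack_s > 2 * TSI_S) {
          edf_panic = false;
      }
  }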

Regards,
Martin

-- 
--------------------
Martin Lomas
m_boincdev ml1 co uk.ddSPAM.dd
--------------------