On Apr 28, 2009, at 12:14 PM, David Anderson wrote:

> At this point I'm interested in reports of bad scheduling
> or work-fetch decisions, or bad debt calculation.
> When these are all fixed we can address the efficiency
> of the scheduling calculations (if it's an issue).


Is the point of the exercise to fix these problems?

Or are you only looking for a band-aid to cover them up?

There are a lot of bad things happening, and they are exaggerated by
wider systems. To correctly address these issues, a fundamentally new
approach needs to be tried (IMO, for what that is worth). There is no
single line of code to patch to make it all better; the problem is a
collision between the original design choices and the scaling up of
systems' capabilities.

> bad scheduling

Is subject to the following:

1) We do it too often (event driven)
2) All currently running tasks are eligible for preemption
3) TSI (task switch interval) is not respected as a limiting factor
4) TSI is used in calculating deadline peril
5) Work mix is not kept "interesting"
6) Resource Share is used in calculating run time allocations
7) Work "batches" (tasks with roughly similar deadlines) are not "bank
teller queued"
8) History of work scheduling is not preserved and all deadlines are
calculated fresh each invocation.
9) True deadline peril is rare, but "false positives" are common
10) Some of the deadline peril may be caused by defective work-fetch
allocation
11) Other factors that are either obscured by the above or that I have
forgotten (or maybe there is nothing else) ...

> work-fetch decisions

Seems to be related to:

1) Bad debt calculations
2) Asking for inappropriate work loads
3) Asking for inappropriate amounts
4) Design of client / server interactions

> bad debt calculation

Seems to be related to:

1) Assuming that all projects have CUDA work and asking for it
2) Assuming that a CUDA-only project has CPU work and asking for it
3) Not always taking system width into account correctly
4) Not taking CUDA capability into account correctly
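To make points 1 and 2 concrete, here is a minimal sketch in which debt for a resource type accrues only among the projects that actually supply work for that resource, so a CPU-only project never builds up CUDA debt. The names (`Project`, `has_work`, `update_debts`) and the simplified debt rule (entitled share of elapsed time minus time actually received) are hypothetical illustrations, not the BOINC client's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    resource_share: float
    # Hypothetical: resource types this project can actually supply work for.
    has_work: set
    # Accumulated debt per resource type, in seconds.
    debt: dict = field(default_factory=dict)


def update_debts(projects, resource, elapsed, time_used):
    """Accrue debt for one resource type over `elapsed` seconds.

    time_used maps project name -> seconds of `resource` it received.
    Only projects that have work for `resource` share the entitlement,
    so projects without such work are never owed (or charged) for it.
    """
    eligible = [p for p in projects if resource in p.has_work]
    total_share = sum(p.resource_share for p in eligible)
    if total_share == 0:
        return  # nobody can use this resource; accrue nothing
    for p in eligible:
        entitled = elapsed * p.resource_share / total_share
        used = time_used.get(p.name, 0.0)
        p.debt[resource] = p.debt.get(resource, 0.0) + entitled - used


# Usage: one hour of CUDA time split evenly between the two CUDA projects.
cpu_only = Project("cpu_only", 100, {"cpu"})
gpu_only = Project("gpu_only", 100, {"cuda"})
both = Project("both", 100, {"cpu", "cuda"})
update_debts([cpu_only, gpu_only, both], "cuda", 3600,
             {"gpu_only": 1800, "both": 1800})
print(cpu_only.debt)  # {} : the CPU-only project accrues no CUDA debt
print(gpu_only.debt)  # {'cuda': 0.0}
```

Under the naive rule (dividing CUDA time among all three projects), `cpu_only` would be owed 1200 seconds of CUDA time it can never use, and would keep asking for CUDA work it does not have.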

> efficiency
> of the scheduling calculations (if it's an issue)

It is, but you and the other nay-sayers don't have systems that
experience these issues, so you denigrate or ignore the reports.

The worse point is that identifying some of the problems requires
logging. Because we do resource scheduling (for example) so often, the
logs get so big that their sheer size makes them unusable, and we are
performing actions that ARE NOT NECESSARY ... because the assumption
is that there is no cost. But here is a cost right here: if we do
resource scheduling 10 times more often than needed, then there is 10
times more data to sift. That is the main reason I have harped on
SLOWING THIS DOWN.
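One way to slow it down without losing events is to coalesce them: every event only marks that a reschedule is wanted, and the expensive pass (and its log output) runs at most once per minimum interval. The sketch below assumes a hypothetical `RescheduleThrottle` class and a 60-second interval chosen purely for illustration; it is not how the current client works.

```python
import time


class RescheduleThrottle:
    """Coalesce scheduling requests so the expensive pass runs at most
    once per `min_interval` seconds (hypothetical policy)."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_run = None
        self.pending = False
        self.runs = 0

    def request(self):
        # An event (task finished, download done, ...) asks for a
        # reschedule; only remember that one is wanted.
        self.pending = True

    def poll(self, schedule_fn):
        """Run schedule_fn if a request is pending and enough time has
        passed since the last pass. Returns True if a pass ran."""
        if not self.pending:
            return False
        now = self.clock()
        if self.last_run is not None and now - self.last_run < self.min_interval:
            return False
        schedule_fn()
        self.last_run = now
        self.pending = False
        self.runs += 1
        return True


# Usage: ten simulated minutes with an event every second.
t = [0.0]
throttle = RescheduleThrottle(60, clock=lambda: t[0])
for second in range(600):
    t[0] = float(second)
    throttle.request()
    throttle.poll(lambda: None)
print(throttle.runs)  # 10 passes instead of 600
```

With one log entry per pass, the log shrinks by the same factor (here 60x), which is exactly the "data to sift" cost described above.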

It is also why, in my pseudo-code proposal, I suggested two things:
first, make it switchable so that we can start with a bare-bones "bank
teller" style queuing system; and second, only add refinements as we
see where it does not work adequately. Let us not add more rules than
needed. Start with the simplest rule set possible, run it, find the
exceptions, figure out why, fix those, and move on ...
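A bare-bones "bank teller" queue can be sketched as one line of waiting tasks served by N tellers (cores), with no preemption: a started task runs to completion, and each core that frees up takes the next task in line. The function name `bank_teller_schedule`, the `Task` fields, and the `use_deadline_order` switch (an example of one optional refinement) are all hypothetical and only illustrate the "start simple, add rules where needed" idea.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    deadline: float  # seconds from now
    duration: float  # estimated run time, seconds


def bank_teller_schedule(tasks, ncores, use_deadline_order=False):
    """Simulate one queue, `ncores` tellers, and no preemption.

    use_deadline_order is a hypothetical refinement switch: False keeps
    plain arrival order; True serves earliest deadline first, with
    arrival order breaking ties (sorted() is stable).

    Returns a list of (name, start, finish, met_deadline).
    """
    if use_deadline_order:
        line = sorted(tasks, key=lambda t: t.deadline)
    else:
        line = list(tasks)
    # Min-heap of (time the core becomes free, core id).
    cores = [(0.0, c) for c in range(ncores)]
    heapq.heapify(cores)
    result = []
    for task in line:
        free_at, core = heapq.heappop(cores)  # next teller to free up
        finish = free_at + task.duration
        result.append((task.name, free_at, finish, finish <= task.deadline))
        heapq.heappush(cores, (finish, core))
    return result


# Usage: on one core, plain FIFO misses a deadline that EDF ordering meets.
tasks = [Task("a", 7200, 3600), Task("b", 3600, 3600), Task("c", 7200, 1800)]
fifo = bank_teller_schedule(tasks, ncores=1)
edf = bank_teller_schedule(tasks, ncores=1, use_deadline_order=True)
print(sum(met for *_, met in fifo))  # 1
print(sum(met for *_, met in edf))   # 2
```

The point of the switch is that the refinement is added only after the simple rule is observed to fail, rather than being built in from the start.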

SO, are you serious, or just trying to get us to go away?
_______________________________________________
boinc_dev mailing list
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
