On Apr 28, 2009, at 12:14 PM, David Anderson wrote:

> At this point I'm interested in reports of bad scheduling
> or work-fetch decisions, or bad debt calculation.
> When these are all fixed we can address the efficiency
> of the scheduling calculations (if it's an issue).
Is the point of the exercise to fix these problems, or are you only looking for a band-aid to cover them up? There are a ton of bad things happening, and they are exaggerated on wider systems. To address these issues correctly, a fundamentally new approach needs to be tried (IMO, for what that is worth). There is no single line of code to patch to make it all better; the problems come from a collision between the original design choices and the scaling up of systems' capabilities.

> bad scheduling

Is subject to the following:

1) We do it too often (it is event driven).
2) All currently running tasks are eligible for preemption.
3) TSI (the task switch interval) is not respected as a limiting factor.
4) TSI is used in calculating deadline peril.
5) The work mix is not kept "interesting".
6) Resource share is used in calculating run-time allocations.
7) Work "batches" (tasks with roughly similar deadlines) are not "bank teller queued".
8) No history of work scheduling is preserved; all deadlines are calculated fresh on each invocation.
9) True deadline peril is rare, but "false positives" are common.
10) Some of the sources of deadline peril may be caused by a defective work-fetch allocation.
11) Other factors are either obscured by the above, forgotten by me, or maybe there is nothing else ...

> work-fetch decisions

Seem to be related to:

1) Bad debt calculations.
2) Asking for inappropriate work loads.
3) Asking for inappropriate amounts.
4) The design of client/server interactions.

> bad debt calculation

Seems to be related to:

1) Assuming that all projects have CUDA work and asking for it.
2) Assuming that a CUDA-only project has CPU work and asking for it.
3) Not necessarily taking system width into account correctly.
4) Not taking CUDA capability into account correctly.

> efficiency
> of the scheduling calculations (if it's an issue)

It is, but you and other nay-sayers don't have systems that experience the issues, so you and others denigrate or ignore the reports.
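To make item 7 concrete, here is a minimal sketch of what "bank teller queued" batches might look like: tasks whose deadlines fall close together are grouped into one teller line, and each line is drained before the next one starts. The `Task` class, the `batch_width` threshold, and the grouping rule are my own illustrative choices, not BOINC's actual data structures.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    name: str
    deadline: float   # seconds from now
    remaining: float  # estimated CPU seconds of work left

def bank_teller_order(tasks: List[Task], batch_width: float = 3600.0) -> List[Task]:
    """Group tasks whose deadlines fall within batch_width seconds of the
    head of the current batch into one 'teller line', then serve the lines
    strictly in deadline order (tasks within a line stay in deadline order)."""
    ordered = sorted(tasks, key=lambda t: t.deadline)
    batches: List[List[Task]] = []
    for t in ordered:
        if batches and t.deadline - batches[-1][0].deadline <= batch_width:
            batches[-1].append(t)   # roughly similar deadline: same line
        else:
            batches.append([t])     # start a new teller line
    # Flatten: each line is drained completely before the next starts.
    return [t for batch in batches for t in batch]
```

The point of the grouping is that tasks with roughly similar deadlines do not preempt each other; only a genuinely earlier batch can jump the queue.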
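On items 8 and 9, a cheap one-pass feasibility check is enough to tell true deadline peril from a false positive: walk the tasks in deadline order and flag one only when the cumulative remaining work due at or before its deadline cannot finish in time. This is a generic earliest-deadline-first test of my own, not the client's actual round-robin simulation; the tuple layout and `cpu_fraction` parameter are assumptions for illustration.

```python
from typing import List, Tuple

def true_deadline_peril(tasks: List[Tuple[str, float, float]],
                        cpu_fraction: float = 1.0) -> List[str]:
    """tasks: (name, deadline_in_seconds, remaining_cpu_seconds).
    Returns the names of tasks that genuinely cannot meet their deadlines,
    assuming the CPU runs them in earliest-deadline-first order and this
    host contributes cpu_fraction of a CPU to them."""
    peril = []
    busy = 0.0
    for name, deadline, remaining in sorted(tasks, key=lambda t: t[1]):
        busy += remaining / cpu_fraction  # cumulative wall time needed so far
        if busy > deadline:               # cannot finish before this deadline
            peril.append(name)
    return peril
```

A check like this is O(n log n) per invocation, which is also why running it far more often than necessary shows up directly as log volume.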
The worst part is that identifying some of these problems requires logging, but because we do resource scheduling so often, the logs become unusable through sheer size. We are performing actions that ARE NOT NECESSARY, because the assumption is that there is no cost. Well, here is a cost right here: if we do resource scheduling 10 times more often than needed, there is 10 times more data to sift. That is the main reason I have harped on SLOWING THIS DOWN.

It is also why, in my pseudo-code proposal, I suggested that we do two things: one, make it switchable, so that we can start with a bare-bones "bank teller" style queuing system; and two, only add refinements as we see where it does not work adequately. Let us not add more rules than needed. Start with the simplest rule set possible, run it, find exceptions, figure out why, fix those, move on ...

SO, are you serious, or just trying to get us to go away?

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
