Paul:
The events that drive rr_sim:
RPC complete: We have now committed to some more work. Is there anything
that needs to start right now to complete on time?
File Download Complete: We now have a task ready to run. Does it need to
get started right now?
Task complete: We have a free processor. What should we run now?
Project and Task Suspend: Do we still have the same tasks running? Do we
have an empty resource?
Project and Task Resume: Some tasks have been suspended for an unknown
amount of time. Do we need to run one of them?
Please concentrate on the TEST for whether something needs to be run now or
not.
jm7
From: "Paul D. Buck" <p.d.b...@comcast.net>
To: [email protected]
Date: 04/27/2009 11:40 PM
Cc: "Josef W. Segur" <[email protected]>, BOINC dev <[email protected]>,
Richard Haselgrove <[email protected]>, David Anderson <[email protected]>,
Rom Walton <[email protected]>
Subject: Re: [boinc_dev] 6.6.20 and work scheduling
On Apr 27, 2009, at 12:42 PM, [email protected] wrote:
> We need to run the check for what should be run on a variety of events
> and at least once every task switch interval.
An argument you have made before, and not defended with any evidence, if I
may return the favor.
Your only statement is that it doesn't hurt anything to do the test. I
should put you in the situation of trying to prove the negative ... but I
won't. But, if it does not hurt anything to waste the time, then I can
also use that argument to fight for more accurate benchmarks run every
hour. Won't hurt to run them... and we will have more accurate numbers...
and they don't take that long to run ... heck, why not on the half hour?
even better ...
> We need to do schedule enforcement on a different set of events, one of
> which is going to occur very frequently - checkpoint.
And all we need to do is to check to see if the application that has just
made a checkpoint has reached its TSI (task switch interval). There is no,
repeat no, need to do anything else. Yet we do the full monty.
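To make the narrower check concrete, here is a minimal sketch of what "only look at the task that just checkpointed" could look like. It is not the BOINC client code; the RunningTask class, its seconds_since_scheduled field, and the function name are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    # Hypothetical record; the field names are illustrative, not BOINC's.
    name: str
    seconds_since_scheduled: float  # CPU time since the task was last given a slot


def needs_reschedule_on_checkpoint(task: RunningTask, tsi_seconds: float) -> bool:
    """Return True only when the task that just checkpointed has used up its TSI."""
    return task.seconds_since_scheduled >= tsi_seconds


TSI = 12 * 3600  # a 12 hour task switch interval, in seconds

short_task = RunningTask("short_task", 45 * 60)    # checkpointed 45 minutes in
long_task = RunningTask("long_task", 13 * 3600)    # checkpointed 13 hours in

for task in (short_task, long_task):
    if needs_reschedule_on_checkpoint(task, TSI):
        print(f"{task.name}: at TSI, ask for a reschedule")
    else:
        print(f"{task.name}: below TSI, nothing to do")
```

With a long TSI almost every checkpoint falls into the "nothing to do" branch, which is the point being argued here.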
I just spent about 6 hours running a test on 5 systems and fiddling with
the logs:
Xeon 32, Dual Xeons with HT for 4 CPUs, running XP Pro 32-Bit, around 50
projects, 1 CUDA (9800GT), about 3-4,000 CS per day w/o CUDA (if memory
serves)
Xeon 64, Dual Xeons with HT for 4 CPUs, running XP Pro 64-Bit, ABC and
YoYo, 1 ATI GPU on MW, about 3.5-4,500 CS per day w/o CUDA (if memory
serves)
Mac Pro, OS-X, 8 Cores, Xeons, about 25 projects, no GPU
Q9300, XP Pro 32-bit, 4 cores, standard 50 projects, 1 CUDA (GTX280)
i7 940, 8 CPUs, XP Pro 32-bit, standard 50 projects, 4 CUDA (2 GTX 295
cards) about twice as fast as the Q9300 ~10-12,000 CS w/o CUDA (if memory
serves)
I ran a log on these systems for 3 hours while I was away, and they all
used the same cc_config file. I don't feel that this run is as typical as
my normal workload because of other issues I have raised (which have
seemingly been ignored as well); but at any rate I got these results:
Event                                                Xeon 32  Xeon 64  Mac Pro    Q9300       i7
enforce_schedule(): start                                241      387      272      262      407
Request CPU reschedule: application exited                 3       11       14       22       19
Request CPU reschedule: Core client configuration          1        1        1        1        1
Request CPU reschedule: files downloaded                   1        9       15       26       21
Request CPU reschedule: handle_finished_apps               3       11       14       22       19
Request CPU reschedule: Idle state change                  2        2        2        2        6
Request CPU reschedule: Scheduling period elapsed    0 4 0 0 [only four values in the source]
Request enforce CPU schedule: Checkpoint reached          86      379      129       92      302
Request enforce CPU schedule: schedule_cpus                7       17       25       47       44
schedule_cpus(): start                                     7       17       25       48       44
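As a rough aid to reading these counts, the sketch below turns two of the rows into per-minute rates over the 3 hour log. The counts and the log length are the ones reported above; the variable and function names are only illustrative.

```python
LOG_MINUTES = 3 * 60  # the log covered 3 hours

HOSTS = ["Xeon 32", "Xeon 64", "Mac Pro", "Q9300", "i7"]

# Counts copied from the rows reported above.
ENFORCE_STARTS = [241, 387, 272, 262, 407]        # enforce_schedule(): start
CHECKPOINT_REQUESTS = [86, 379, 129, 92, 302]     # Checkpoint reached


def per_minute(counts, minutes=LOG_MINUTES):
    """Map each host to its average number of events per minute."""
    return {host: count / minutes for host, count in zip(HOSTS, counts)}


enforce_rate = per_minute(ENFORCE_STARTS)
checkpoint_rate = per_minute(CHECKPOINT_REQUESTS)

for host in HOSTS:
    print(f"{host:8s} enforce_schedule(): {enforce_rate[host]:.2f}/min, "
          f"checkpoint requests: {checkpoint_rate[host]:.2f}/min")
```

On these figures the i7 and the Xeon 64 both average a little over two enforce_schedule() passes per minute.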
So, you tell me, how does juggling the schedule this often for these many
reasons make sense?
I am not persuaded by the excuse that we have to check after every
checkpoint... why? My TSI is 12 hours... 99% of the tasks should run to
completion before hitting TSI.
I also don't buy the after download argument. If Work Fetch is working
correctly I should not be fetching more work than can be handled. If we
are, obsessing in the Resource Scheduler is not the way to approach the
situation of a broken work fetch algorithm.
And, I point out again, had I been running SaH I would have completed about
15 more tasks on the Xeon 32, 18-20 on the Q9300, and 72 on the i7 ... also,
were MW still issuing reliable work, I was doing their tasks in about 10
seconds each (60K CS per day on that machine alone, the Xeon 64).
The super deadline project does not exist, it died because it had
unrealistic expectations. So why do you keep using it as an excuse to
defend the indefensible? I really don't get it.
Under what scenario do we really, really need to reschedule every resource
in a system 2 times a minute, or even more often than that...
When a resource comes free at task completion or at expiry of the TSI, I
can buy rescheduling the one resource... but not the whole system ... even
Windows does not do that ...
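For illustration only, here is a minimal sketch of "reschedule the one resource": when a slot comes free, hand it the next ready task and leave every other slot alone. The function name, the slot list, and the ready queue are hypothetical, and the sketch deliberately ignores deadlines and resource shares.

```python
from collections import deque


def fill_freed_slot(slots, freed_index, ready):
    """Give the freed slot the next ready task; all other slots are left untouched."""
    slots = list(slots)  # work on a copy so the caller's list is not mutated
    slots[freed_index] = ready.popleft() if ready else None
    return slots


slots = ["task_a", "task_b", None, "task_d"]  # slot 2 just came free
ready = deque(["task_e", "task_f"])           # tasks waiting to run, in order

new_slots = fill_freed_slot(slots, 2, ready)
print(new_slots)
```

The tasks already running in the other slots are never preempted or even examined, which is the behaviour being asked for here.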