On Apr 27, 2009, at 12:42 PM, [email protected] wrote:

> We need to run the check for what should be run on a variety of
> events and at least once every task switch interval.

An argument you have made before, and not defended with any evidence,  
if I may return the favor.

Your only statement is that it doesn't hurt anything to do the test.
I could put you in the position of trying to prove that negative...
but I won't.  But if it does not hurt anything to waste the time,
then I can use the same argument to push for more accurate benchmarks
run every hour.  Won't hurt to run them... and we will have more
accurate numbers... and they don't take that long to run... heck, why
not on the half hour?  Even better...


> We need to do schedule enforcement on a different set of events, one
> of which is going to occur very frequently - checkpoint.

And all we need to do is check whether the application that has just
made a checkpoint has reached its task switch interval (TSI).  There
is no, repeat no, need to do anything else.  Yet we do the full monty.
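
To make it concrete, here is a minimal sketch of what I am asking for,
with made-up names (this is not the actual client code): on a
checkpoint, look only at the task that just checkpointed and ask
whether it has run past its TSI; if not, do nothing at all.

#include <cstdio>
#include <ctime>

struct RunningTask {
    time_t run_start;   // when this task last started running
};

// Hypothetical hook, called when a task reports a checkpoint.  The only
// question asked: has this one task run past the task switch interval?
bool checkpoint_needs_reschedule(const RunningTask& task, double tsi_seconds) {
    double elapsed = difftime(time(nullptr), task.run_start);
    // Only a task that has exceeded its TSI is a preemption candidate;
    // anything else keeps running -- no full enforce_schedule() pass.
    return elapsed >= tsi_seconds;
}

int main() {
    RunningTask t{ time(nullptr) - 3600 };   // running for an hour
    double tsi = 12.0 * 3600.0;              // a 12-hour TSI
    std::printf("reschedule? %s\n",
                checkpoint_needs_reschedule(t, tsi) ? "yes" : "no");  // "no"
}

One comparison per checkpoint, touching one task, instead of a pass
over every resource in the system.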

I just spent about 6 hours running a test on 5 systems and fiddling  
with the logs:

Xeon 32: dual Xeons with HT for 4 CPUs, XP Pro 32-bit, around 50
projects, 1 CUDA GPU (9800 GT), about 3,000-4,000 CS per day w/o CUDA
(if memory serves)
Xeon 64: dual Xeons with HT for 4 CPUs, XP Pro 64-bit, ABC and YoYo, 1
ATI GPU on MW, about 3,500-4,500 CS per day w/o CUDA (if memory serves)
Mac Pro: OS X, 8 cores (Xeons), about 25 projects, no GPU
Q9300: XP Pro 32-bit, 4 cores, standard 50 projects, 1 CUDA GPU (GTX 280)
i7 940: 8 CPUs, XP Pro 32-bit, standard 50 projects, 4 CUDA GPUs (2
GTX 295 cards), about twice as fast as the Q9300, ~10,000-12,000 CS
per day w/o CUDA (if memory serves)

I ran a log on these systems for 3 hours while I was away; they all
used the same cc_config file.  I don't feel this run is as typical as
my normal workload, because of other issues I have raised (and which
were seemingly ignored there also); but at any rate I got these
results:

                                                   Xeon 32  Xeon 64  Mac Pro  Q9300    i7
enforce_schedule(): start                              241      387      272    262   407
Request CPU reschedule: application exited               3       11       14     22    19
Request CPU reschedule: Core client configuration        1        1        1      1     1
Request CPU reschedule: files downloaded                 1        9       15     26    21
Request CPU reschedule: handle_finished_apps             3       11       14     22    19
Request CPU reschedule: Idle state change                2        2        2      2     6
Request CPU reschedule: Scheduling period elapsed        0        4        0      0     -
Request enforce CPU schedule: Checkpoint reached        86      379      129     92   302
Request enforce CPU schedule: schedule_cpus              7       17       25     47    44
schedule_cpus(): start                                   7       17       25     48    44
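
For anyone who wants to reproduce the tally, a rough sketch of the
counting is below; the log file name (stdoutdae.txt) and the exact
message strings are assumptions taken from the rows above, so adjust
them to whatever your client actually emits.

#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Assumed message strings, copied from the table rows above.
    const std::vector<std::string> keys = {
        "enforce_schedule(): start",
        "Request CPU reschedule: application exited",
        "Request CPU reschedule: Core client configuration",
        "Request CPU reschedule: files downloaded",
        "Request CPU reschedule: handle_finished_apps",
        "Request CPU reschedule: Idle state change",
        "Request CPU reschedule: Scheduling period elapsed",
        "Request enforce CPU schedule: Checkpoint reached",
        "Request enforce CPU schedule: schedule_cpus",
        "schedule_cpus(): start",
    };
    std::map<std::string, int> counts;
    std::ifstream log("stdoutdae.txt");   // assumed client log file name
    std::string line;
    while (std::getline(log, line)) {
        for (const auto& k : keys) {
            if (line.find(k) != std::string::npos) ++counts[k];
        }
    }
    for (const auto& k : keys) {
        std::cout << k << "\t" << counts[k] << "\n";
    }
}

Run it once per machine's log and paste the columns side by side.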

So, you tell me, how does juggling the schedule this often, for this
many reasons, make sense?

I couldn't care less about the excuse that we have to check after
every checkpoint... why?  My TSI is 12 hours... 99% of my tasks should
run to completion before ever hitting the TSI.

I also don't buy the after-download argument.  If Work Fetch is
working correctly, I should not be fetching more work than can be
handled.  If we are, obsessing in the Resource Scheduler is not the
way to deal with a broken work fetch algorithm.

And, I point out again, had I been running SaH I would have completed
about 15 more tasks on the Xeon 32, 18-20 more on the Q9300, and 72
more on the i7...  Also, when MW was still issuing reliable work I was
doing their tasks in about 10 seconds apiece (60K CS per day on that
machine alone, the Xeon 64).

The super-deadline project does not exist; it died because it had
unrealistic expectations.  So why do you keep using it as an excuse to
defend the indefensible?  I really don't get it.

Under what scenario do we really, really need to reschedule every  
resource in a system 2 times a minute, or even more often than that...

When a resource comes free at task completion or at expiry of the TSI,
I can buy rescheduling that one resource... but not the whole
system... even Windows does not do that...
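
A minimal sketch of that alternative, again with made-up names rather
than anything from the real client: when one resource frees up,
consult only the queue for that resource and leave everything else
alone.

#include <vector>

enum class ResourceType { CPU, NVIDIA_GPU, ATI_GPU };

struct Task {
    ResourceType uses;
    double priority;   // stand-in for debt/deadline urgency
    bool runnable;
};

// Reschedule just the resource that came free: scan runnable tasks of
// that resource type and return the best candidate (or nullptr).  Tasks
// already running on other resources are never touched.
Task* fill_freed_resource(std::vector<Task>& tasks, ResourceType freed) {
    Task* best = nullptr;
    for (auto& t : tasks) {
        if (!t.runnable || t.uses != freed) continue;
        if (!best || t.priority > best->priority) best = &t;
    }
    return best;
}

int main() {
    std::vector<Task> tasks = {
        { ResourceType::CPU,        1.0, true },
        { ResourceType::CPU,        2.5, true },
        { ResourceType::NVIDIA_GPU, 0.7, true },
    };
    // A CPU just came free: only the CPU queue is consulted.
    Task* next = fill_freed_resource(tasks, ResourceType::CPU);
    (void)next;   // the priority-2.5 CPU task; the GPU task is untouched
}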