Paul:

The events that drive rr_sim:

RPC complete:  We have now committed to some more work.  Is there anything
that needs to start right now in order to complete on time?
File download complete:  We now have a task ready to run.  Does it need to
be started right now?
Task complete:  We have a free processor.  What should we run now?
Project and task suspend:  Are the same tasks still running?  Do we now
have an idle resource?
Project and task resume:  Some tasks have been suspended for an unknown
amount of time.  Do we need to run one of them?

Please concentrate on the TEST for whether something needs to be run now or
not.
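
The test I have in mind is, roughly, the following (a sketch only, with
illustrative names; TaskProjection and its fields are not the client's
actual structures):

    // Sketch only: illustrative names, not the real client data structures.
    struct TaskProjection {
        double deadline;           // report deadline (seconds since epoch)
        double projected_finish;   // rr_sim's estimate if the task starts now
    };

    // A task must start right now only if delaying it to the next natural
    // scheduling point (task switch, task completion, ...) would push its
    // projected finish past its deadline.
    bool needs_to_start_now(const TaskProjection& t, double now,
                            double next_sched_point) {
        double slip = next_sched_point - now;           // how long we could safely wait
        return t.projected_finish + slip > t.deadline;  // would waiting cause a miss?
    }

Each of the events above simply feeds fresh inputs into that one test.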

jm7


                                                                           
             "Paul D. Buck"                                                
             <p.d.b...@comcast                                             
             .net>                                                      To 
                                       [email protected]              
             04/27/2009 11:40                                           cc 
             PM                        "Josef W. Segur"                    
                                       <[email protected]>, BOINC dev   
                                       <[email protected]>,       
                                       Richard Haselgrove                  
                                       <[email protected]>,      
                                       David Anderson                      
                                       <[email protected]>, Rom       
                                       Walton <[email protected]>           
                                                                   Subject 
                                       Re: [boinc_dev] 6.6.20 and work     
                                       scheduling                          
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           





On Apr 27, 2009, at 12:42 PM, [email protected] wrote:

      We need to run the check for what should be run on a variety of
      events and at least once every task switch interval.

An argument you have made before and, if I may return the favor, have not
defended with any evidence.

Your only statement is that it doesn't hurt anything to do the test.  I
could put you in the position of trying to prove a negative ... but I
won't.  But if it does not hurt anything to waste the time, then I can
also use that argument to fight for more accurate benchmarks run every
hour.  It won't hurt to run them... we will have more accurate numbers...
and they don't take that long to run... heck, why not on the half hour?
Even better ...


      We need to do schedule enforcement on a different set of events, one
      of which is going to occur very frequently - checkpoint.

And all we need to do is check whether the application that has just made
a checkpoint has reached its task switch interval (TSI).  There is no,
repeat no, need to do anything else.  Yet we do the full monty.
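
Something like the following is all the checkpoint event should need to
trigger (a sketch only; the names are mine, not the client's):

    // Sketch only: RunningTask and run_since are illustrative, not the
    // client's actual fields.
    struct RunningTask {
        double run_since;   // when this task last started or resumed running
    };

    // On a checkpoint, ask only whether THIS task has used up its time
    // slice; if it hasn't, there is nothing to enforce.
    bool checkpoint_needs_enforce(const RunningTask& t, double now,
                                  double task_switch_interval) {
        return (now - t.run_since) >= task_switch_interval;
    }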

I just spent about 6 hours running a test on 5 systems and fiddling with
the logs:

Xeon 32: dual Xeons with HT for 4 CPUs, XP Pro 32-bit, around 50 projects,
1 CUDA GPU (9800GT), about 3,000-4,000 CS per day w/o CUDA (if memory
serves)
Xeon 64: dual Xeons with HT for 4 CPUs, XP Pro 64-bit, ABC and YoYo, 1 ATI
GPU on MW, about 3,500-4,500 CS per day w/o CUDA (if memory serves)
Mac Pro: OS X, 8 cores, Xeons, about 25 projects, no GPU
Q9300: XP Pro 32-bit, 4 cores, standard 50 projects, 1 CUDA GPU (GTX 280)
i7 940: 8 CPUs, XP Pro 32-bit, standard 50 projects, 4 CUDA GPUs (2 GTX 295
cards), about twice as fast as the Q9300, ~10,000-12,000 CS per day w/o
CUDA (if memory serves)

I ran a log on these systems for 3 hours while I was away, and they all
used the same cc_config file.  I don't feel this is as typical as my normal
usage because of other issues I have raised (and which also seem to have
been ignored); but at any rate I got these results:
                                                                             
                                                     Xeon 32  Xeon 64  Mac Pro  Q9300     i7
 enforce_schedule(): start                               241      387      272    262    407
 Request CPU reschedule: application exited                3       11       14     22     19
 Request CPU reschedule: Core client configuration         1        1        1      1      1
 Request CPU reschedule: files downloaded                  1        9       15     26     21
 Request CPU reschedule: handle_finished_apps              3       11       14     22     19
 Request CPU reschedule: Idle state change                 2        2        2      2      6
 Request CPU reschedule: Scheduling period elapsed         0        4        0      0
 Request enforce CPU schedule: Checkpoint reached         86      379      129     92    302
 Request enforce CPU schedule: schedule_cpus               7       17       25     47     44
 schedule_cpus(): start                                    7       17       25     48     44


So, you tell me: how does juggling the schedule this often, for this many
reasons, make sense?

I care even less about the excuse that we have to check after every
checkpoint... why?  My TSI is 12 hours... 99% of my tasks should run to
completion before ever hitting TSI.

I also don't buy the after-download argument.  If Work Fetch is working
correctly, I should not be fetching more work than can be handled.  If we
are, obsessing in the Resource Scheduler is not the way to deal with a
broken work fetch algorithm.
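
The only after-download test I would buy is something on this order (again
a sketch, with made-up names rather than real client fields):

    // Sketch only: both parameters are illustrative stand-ins.
    bool download_needs_reschedule(int idle_device_count,
                                   bool misses_deadline_unless_started_now) {
        // A freshly downloaded task only forces a reschedule if a device is
        // sitting idle or the task is already in deadline trouble; with a
        // sane work fetch, neither should normally be true.
        return idle_device_count > 0 || misses_deadline_unless_started_now;
    }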

And, I point out again, had I been running SaH I would have completed about
15 more tasks on the Xeon 32, 18-20 more on the Q9300, and 72 more on the
i7.  Also, when MW was still issuing work reliably, I was doing their tasks
in about 10 seconds apiece (60K CS per day from that machine alone, the
Xeon 64).

The super deadline project does not exist; it died because it had
unrealistic expectations.  So why do you keep using it as an excuse to
defend the indefensible?  I really don't get it.

Under what scenario do we really, really need to reschedule every resource
in the system twice a minute, or even more often than that?

When a resource comes free at task completion or when its TSI expires, I
can buy rescheduling that one resource... but not the whole system... even
Windows does not do that.
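
Roughly what I am asking for is this (a sketch; schedule_resource() is a
stand-in for a per-resource picker the client would need to grow, not an
existing call):

    #include <cstdio>

    // Sketch only: illustrative types, not the client's.
    enum class Resource { CPU, CUDA_GPU, ATI_GPU };

    // Stand-in for a per-resource work picker; the real thing would choose
    // the best runnable task for just this one resource.
    static void schedule_resource(Resource r) {
        std::printf("re-plan resource %d only\n", static_cast<int>(r));
    }

    // When one resource frees up (task done, TSI expired), re-plan that one
    // resource instead of rebuilding the schedule for the whole system.
    void on_resource_free(Resource r) {
        schedule_resource(r);
    }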
