I am posting this here because the S@H Beta site message board will not let
me create a table.
Below is a table showing how Boinc scheduled the two GPUs in Computer
ID 52900 (S@H Beta) last night. GPU 2 was vacant for most of the night. My
conjecture is that when Boinc schedules an AP WU on one GPU, which is now
marked as using 0.805C + 1NV instead of the usual 0.02C + 1NV, it leaves the
other GPU vacant, but this does not explain all the rows in the table. From
a credit-per-hour standpoint, it might be better to fully schedule the GPUs
and let the O/S priority scheme worry about scheduling the CPUs; if most of
the processes in the system run at normal priority, and Boinc tasks run at a
lower priority, what does it matter if the CPUs are over-scheduled?
Table: GPU tasks scheduled on Computer 52900 (S@H Beta), 5/20/2013

GPU 1                               GPU 2
Start       Finish      WU Type     Start       Finish      WU Type
----------  ----------  --------    ----------  ----------  --------
GPU GRID WU                         GPU GRID WU
5/19 23:29  5/20 1:58   AP          -Infinity   1:53        GPU GRID
1:58        2:08        MB          1:58        2:08        MB
2:08        2:18        MB          (idle)
2:18        3:02        AP          (idle)
3:02        3:14        MB          3:02        3:14        MB
3:14        3:25        MB          (idle)
3:25        4:00        AP          (idle)
4:00        4:12        MB          4:00        4:13        MB
4:13        4:58        AP          (idle)
4:58        5:09        MB          4:58        5:09        MB
5:09        5:21        MB          5:09        5:21        MB
5:21        5:32        MB          5:21        5:32        MB
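To make the conjecture concrete, here is a toy model of the CPU/GPU
commitment accounting. This is only an illustration, not BOINC's actual
scheduler code; the 8-CPU host and the strict "never exceed the CPU budget"
rule are assumptions for the sketch. Under the new 0.805C + 1NV accounting,
starting a second AP task would push the committed CPU total past the
budget, so the second GPU stays idle; under the old 0.02C + 1NV accounting
it would have fit easily.

    # Toy model of BOINC-style CPU/GPU commitment accounting. This
    # illustrates the conjecture above; it is NOT the real BOINC scheduler.
    # The 8-CPU host and the strict budget rule are assumptions.
    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        cpu_frac: float  # CPU fraction the app version declares (the "C" part)
        ngpus: int       # GPUs required (the "NV" part)

    def can_start(running, candidate, ncpus, ngpus):
        """Allow a start only if CPU and GPU commitments stay within budget."""
        cpu_used = sum(t.cpu_frac for t in running)
        gpu_used = sum(t.ngpus for t in running)
        return (cpu_used + candidate.cpu_frac <= ncpus
                and gpu_used + candidate.ngpus <= ngpus)

    # Hypothetical 8-CPU, 2-GPU host: 7 CPU tasks plus one AP task on GPU 1.
    running = [Task(f"CPU task {i}", 1.0, 0) for i in range(7)]
    running.append(Task("AP on GPU 1", 0.805, 1))      # new accounting

    ap_new = Task("AP on GPU 2", 0.805, 1)             # 7.805 + 0.805 > 8
    ap_old = Task("AP on GPU 2 (old)", 0.02, 1)        # 7.805 + 0.02 <= 8
    print(can_start(running, ap_new, ncpus=8, ngpus=2))  # False -> GPU 2 idle
    print(can_start(running, ap_old, ncpus=8, ngpus=2))  # True under old 0.02C

This also illustrates the over-scheduling point: if Boinc tasks run at low
priority, simply letting the CPU budget overflow would degrade gracefully
instead of leaving a GPU idle.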
This system downloaded single-digit numbers of GPU WUs virtually every time
it made a server request last night. At 5:00 AM it had the following WUs in
inventory:

    MB: CPU 168   GPU 1906
    AP: CPU  25   GPU  578
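The "day's supply" figures quoted throughout this post come from arithmetic
like the sketch below. Only the WU counts come from the inventory above; the
per-WU hours and device counts are hypothetical placeholders, not measured
values.

    # Rough days-of-work arithmetic for a WU inventory. Only the WU count is
    # taken from the inventory above; hours_per_wu and n_devices are
    # hypothetical placeholders.
    def days_of_work(n_wus: int, hours_per_wu: float, n_devices: int) -> float:
        """Days needed to drain a queue with n_devices working in parallel."""
        return n_wus * hours_per_wu / n_devices / 24.0

    # e.g. 1906 MB GPU WUs at a hypothetical 0.2 h apiece across 2 GPUs:
    print(f"{days_of_work(1906, 0.2, 2):.1f} days")  # -> 7.9 days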
This information, posted yesterday on the Beta site message board, may help
clarify the situation.
"BOINC Strange Behavior"
"1. Since installing 2 Kepler-class GPUs, Computer ID 52900 has been unable
to download more than a day's supply of WUs, usually there has been less
than a day's supply in inventory.
2. On May 14, the server sent 22 AP CPU WUs (which take about 27 hours
apiece to complete) with only two hours to finish them; they all timed out
and errored out.
3. Last week I suspended all the S@H WUs via Efmer's BoincTasks to finish up
the few AP WUs in inventory. Boinc only processed 3 more AP WUs, and then
quit using the GPUs altogether. It would only start using the GPUs again
when I unsuspended the regular S@H WUs, which it then processed.
4. At 21:40:00 on 5/18/2013, with about one day's supply of WUs available,
evenly distributed between AP and S@H and between CPU and GPU, an AP WU
finished on GPU 2. After a communication with the server returned that
result, Boinc refused to schedule any further work on that GPU; it just left
it idle. I did not notice until about 13:00:00 on 5/19/13. I tried
everything I could think of with preferences and cc_config.xml to make
Boinc use the second GPU; I even reinstalled Boinc 7.0.62 (replacing
7.0.64), but nothing worked. I could not reboot because I am running a long
experiment that would be impossible to restart except from the beginning.
Finally, at 13:39:00 I resumed GPUGRID and requested some work. When a
GPUGRID WU finally finished downloading at 13:52:00, it preempted the AP WU
on GPU 1 and began executing there; GPU 2 was still unused. So I requested
another WU from GPUGRID, and it was scheduled on GPU 2. At that point Boinc
went wild with requests for more work from S@H Beta; at one time there were
971 workunits in the transfer queue, something I have not seen for months.
Jealous-mistress syndrome? Now I have 306 S@H CPU WUs (2 days), 1169 S@H
GPU WUs (5.4 days), 5 AP CPU WUs (0.7 days), and 423 AP GPU WUs (14.5
days). The day estimates may be a little fuzzy because of changes in apps,
which affect the average time per WU, but the quantities are exact.
5. Boinc no longer computes the time remaining on a WU by ratio and
proportion, i.e., from the fraction done and how long that fraction took,
once the WU is 50% or even 85% complete. Even when an AP WU is 99.99%
complete, the GUI still indicates it has 34 hours remaining."
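For reference, the "ratio and proportion" rule described in point 5 is just
the proportional estimate sketched below. This is a sketch of the rule
itself, not BOINC's actual estimator code; the function name is mine, and
the 27-hour elapsed time reuses the AP CPU figure quoted above.

    # The "ratio and proportion" remaining-time rule from point 5:
    # remaining = elapsed * (1 - fraction_done) / fraction_done.
    # A sketch of the rule, not BOINC's actual estimator code.
    def remaining_by_proportion(elapsed_hours: float,
                                fraction_done: float) -> float:
        return elapsed_hours * (1.0 - fraction_done) / fraction_done

    # An AP WU 99.99% done after ~27 hours has seconds left by this rule,
    # nothing like the 34 hours the GUI reported.
    hours_left = remaining_by_proportion(27.0, 0.9999)
    print(f"{hours_left * 3600:.0f} seconds remaining")  # ~10 seconds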
Has anyone at Berkeley ever taken a look at the top-performing computer on
the primary S@H site? This guy has a Core i7-3970X with 12 CPUs and 8
NVidia GTX Titan GPUs; conservatively estimated at retail prices, that is at
least a $9,000 system. Yet in terms of workunits, he always operates at the
margin. For example, right this minute he has only 178 workunits in
progress, yet he can finish a SETI@home V7 MB GPU task in about a minute
and an AP task in about 33 minutes. On the Tuesday before last, when I last
looked, he was out of workunits and sat idle almost the entire day.
Imagine his disappointment, if not heartbreak. Yeah, I know, he's not
complaining. But is this really what Berkeley intends? Does this
scheduling situation have anything to do with the new Kepler-architecture
NVidia GPUs, e.g., the absence of a cuda_kepler plan class analogous to the
cuda_fermi plan class?
Charles Elliott