I am posting this here because the S@H Beta site message board will not let
me create a table.
Below is a table showing how Boinc scheduled the two GPUs in Computer
ID 52900 (S@H Beta) last night. GPU 2 was vacant for most of the night. My
conjecture is that when Boinc schedules an AP WU on one GPU, which is now
marked as using 0.805C + 1NV instead of the usual 0.02C + 1NV, it leaves the
other GPU vacant, but this does not explain all the rows in the table. From
a credit-per-hour standpoint, it might be better to fully schedule the GPUs
and let the O/S priority scheme worry about scheduling the CPUs; if most of
the processes in the system run at normal priority, and Boinc tasks run at a
lower priority, what does it matter if the CPUs are over-scheduled?
Table: GPU tasks scheduled on Computer 52900 (S@H Beta), 5/20/2013

GPU 1                               GPU 2
Start       Finish      WU Type     Start       Finish      WU Type
----------  ----------  --------    ----------  ----------  --------
GPU GRID WU                         GPU GRID WU
5/19 23:29  5/20 1:58   AP          -Infinity   1:53        GPU GRID
1:58        2:08        MB          1:58        2:08        MB
2:08        2:18        MB          (idle)
2:18        3:02        AP          (idle)
3:02        3:14        MB          3:02        3:14        MB
3:14        3:25        MB          (idle)
3:25        4:00        AP          (idle)
4:00        4:12        MB          4:00        4:13        MB
4:13        4:58        AP          (idle)
4:58        5:09        MB          4:58        5:09        MB
5:09        5:21        MB          5:09        5:21        MB
5:21        5:32        MB          5:21        5:32        MB
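To make the conjecture concrete, here is a toy model of the CPU/GPU
commitment accounting. This is only an illustration, not BOINC's actual
scheduler code; the 8-CPU host and the strict "never exceed the CPU budget"
rule are assumptions for the sketch. Under the new 0.805C + 1NV accounting,
starting a second AP task would push the committed CPU total past the
budget, so the second GPU stays idle; under the old 0.02C + 1NV accounting
it would have fit easily.

    # Toy model of BOINC-style CPU/GPU commitment accounting. This
    # illustrates the conjecture above; it is NOT the real BOINC scheduler.
    # The 8-CPU host and the strict budget rule are assumptions.
    from dataclasses import dataclass

    @dataclass
    class Task:
        name: str
        cpu_frac: float  # CPU fraction the app version declares (the "C" part)
        ngpus: int       # GPUs required (the "NV" part)

    def can_start(running, candidate, ncpus, ngpus):
        """Allow a start only if CPU and GPU commitments stay within budget."""
        cpu_used = sum(t.cpu_frac for t in running)
        gpu_used = sum(t.ngpus for t in running)
        return (cpu_used + candidate.cpu_frac <= ncpus
                and gpu_used + candidate.ngpus <= ngpus)

    # Hypothetical 8-CPU, 2-GPU host: 7 CPU tasks plus one AP task on GPU 1.
    running = [Task(f"CPU task {i}", 1.0, 0) for i in range(7)]
    running.append(Task("AP on GPU 1", 0.805, 1))      # new accounting

    ap_new = Task("AP on GPU 2", 0.805, 1)             # 7.805 + 0.805 > 8
    ap_old = Task("AP on GPU 2 (old)", 0.02, 1)        # 7.805 + 0.02 <= 8
    print(can_start(running, ap_new, ncpus=8, ngpus=2))  # False -> GPU 2 idle
    print(can_start(running, ap_old, ncpus=8, ngpus=2))  # True under old 0.02C

This also illustrates the over-scheduling point: if Boinc tasks run at low
priority, simply letting the CPU budget overflow would degrade gracefully
instead of leaving a GPU idle.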
This system downloaded single-digit numbers of GPU WUs virtually every time
it made a server request last night. At 5:00 AM it had the following WUs in
inventory:

    MB: CPU 168   GPU 1906
    AP: CPU  25   GPU  578
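The "day's supply" figures quoted throughout this post come from arithmetic
like the sketch below. Only the WU counts come from the inventory above; the
per-WU hours and device counts are hypothetical placeholders, not measured
values.

    # Rough days-of-work arithmetic for a WU inventory. Only the WU count is
    # taken from the inventory above; hours_per_wu and n_devices are
    # hypothetical placeholders.
    def days_of_work(n_wus: int, hours_per_wu: float, n_devices: int) -> float:
        """Days needed to drain a queue with n_devices working in parallel."""
        return n_wus * hours_per_wu / n_devices / 24.0

    # e.g. 1906 MB GPU WUs at a hypothetical 0.2 h apiece across 2 GPUs:
    print(f"{days_of_work(1906, 0.2, 2):.1f} days")  # -> 7.9 days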
This information, posted yesterday on the Beta site message board, may help
clarify the situation.
"BOINC Strange Behavior"
"1. Since installing 2 Kepler-class GPUs, Computer ID 52900 has been unable
to download more than a day's supply of WUs, usually there has been less
than a day's supply in inventory.
2. On May 14, the server sent 22 AP CPU WUs (which take about 27 hours
apiece to complete) with only two hours to finish them; they all timed out
and errored out.
3. Last week I suspended all the S@H WUs via Efmer's BoincTasks to finish up
the few AP WUs in inventory. Boinc only processed 3 more AP WUs, and then
quit using the GPUs altogether. It would only start using the GPUs again
when I unsuspended the regular S@H WUs, which it then processed.
4. At 21:40:00 on 5/18/2013, with about one day's supply of WUs available,
evenly distributed between AP and S@H and between CPU and GPU, an AP WU
finished on GPU 2. After a communication with the server returned that
result, Boinc refused to schedule any further work on that GPU; it just left
it idle. I did not notice until about 13:00:00 on 5/19/13. I tried
everything I could think of with preferences and cc_config.xml to make
Boinc use the second GPU; I even reinstalled Boinc 7.0.62 (replacing
7.0.64), but nothing worked. I could not reboot because I am running a long
experiment that would be impossible to restart except from the beginning.
Finally, at 13:39:00 I resumed GPUGRID and requested some work. When a
GPUGRID WU finally finished downloading at 13:52:00, it preempted the AP WU
on GPU 1 and began executing there; GPU 2 was still unused. So I requested
another WU from GPUGRID, and it was scheduled on GPU 2. At that point Boinc
went wild with requests for more work from S@H Beta; at one time there were
971 workunits in the transfer queue, something I have not seen for months.
Jealous-mistress syndrome? Now I have 306 S@H CPU WUs (2 days), 1169 S@H
GPU WUs (5.4 days), 5 AP CPU WUs (0.7 days), and 423 AP GPU WUs (14.5
days). The day estimates may be a little fuzzy because of changes in apps,
which affect the average time per WU, but the quantities are exact.
5. Boinc no longer computes the time remaining on a WU by ratio and
proportion, i.e., from the fraction done and how long that fraction took,
once the WU is 50% or even 85% complete. Even when an AP WU is 99.99%
complete, the GUI still indicates it has 34 hours remaining."
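For reference, the "ratio and proportion" rule described in point 5 is just
the proportional estimate sketched below. This is a sketch of the rule
itself, not BOINC's actual estimator code; the function name is mine, and
the 27-hour elapsed time reuses the AP CPU figure quoted above.

    # The "ratio and proportion" remaining-time rule from point 5:
    # remaining = elapsed * (1 - fraction_done) / fraction_done.
    # A sketch of the rule, not BOINC's actual estimator code.
    def remaining_by_proportion(elapsed_hours: float,
                                fraction_done: float) -> float:
        return elapsed_hours * (1.0 - fraction_done) / fraction_done

    # An AP WU 99.99% done after ~27 hours has seconds left by this rule,
    # nothing like the 34 hours the GUI reported.
    hours_left = remaining_by_proportion(27.0, 0.9999)
    print(f"{hours_left * 3600:.0f} seconds remaining")  # ~10 seconds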
Has anyone at Berkeley ever taken a look at the top-performing computer on
the primary S@H site? This guy has a Core i7-3970X with 12 CPUs and 8
NVidia GTX Titan GPUs; conservatively estimated at retail prices, that is at
least a $9,000 system. Yet in terms of workunits, he always operates at the
margin. For example, right this minute he has only 178 workunits in
progress, yet he can finish a SETI@home V7 MB GPU task in about a minute
and an AP task in about 33 minutes. On the Tuesday before last, when I last
looked, he was out of workunits and sat idle almost the entire day.
Imagine his disappointment, if not heartbreak. Yeah, I know, he's not
complaining. But is this really what Berkeley intends? Does this
scheduling situation have anything to do with the new Kepler-architecture
NVidia GPUs, e.g., the absence of a cuda_kepler plan class analogous to the
cuda_fermi plan class?
Charles Elliott