The new-ish Fermi class of NVidia GPUs has much better hardware support for 
multitasking than its predecessors. I don't know of any project that is 
using this officially so far, perhaps because Fermi-class GPUs are still 
expensive and comparatively rare. But prices are falling and sales are 
increasing; they will become widespread eventually.

Given the extensive use of third-party applications at SETI, it is 
inevitable that experimentation has taken place. Empirically, several 
commentators have found the same answer: the SETI Fermi application (as 
supplied by NVidia itself) runs most productively when three instances are 
scheduled to run concurrently.

But with the current FIFO scheduling and no pre-emption, this leads to 
problems when running multiple projects, as BOINC is designed to do.

Consider http://a.imageshack.us/img441/9485/notaskswitchwithcuda.png

The three SETI tasks finish asynchronously. Each time one exits, 0.34 GPUs 
become available, but that's not enough to launch the GPUGrid tasks ahead of 
them in the FIFO queue. So BOINC deals yet another SETI (or SETI Beta, which 
I've set up similarly) task from the bottom of the pack. Presumably, this 
would continue indefinitely, until either fractional GPU tasks from all 
other projects ran dry, or imminent deadline pressure forced GPUGrid into 
'High Priority'.
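The pattern is easy to reproduce in a toy model. Below is a minimal sketch 
(Python, and emphatically not BOINC's actual code) of a scheduler that scans 
a FIFO queue and skips any task whose GPU fraction doesn't fit. The task 
names, the 0.33 fraction, and the round structure are all illustrative 
assumptions:

```python
from collections import deque

SETI, GPUGRID = 0.33, 1.0   # GPU fraction per task (illustrative values)

def fill_fifo(free, queue, running):
    """Scan the FIFO queue front-to-back, launching anything that fits.
    A task needing more GPU than is free is skipped, not waited for."""
    kept = deque()
    while queue:
        name, need = queue.popleft()
        if need <= free + 1e-9:
            free -= need
            running.append((name, need))
        else:
            kept.append((name, need))
    queue.extend(kept)
    return free

def simulate(rounds=5):
    # Three SETI tasks already running; GPUGrid waits at the head of the queue.
    running = [("seti-0", SETI), ("seti-1", SETI), ("seti-2", SETI)]
    free = 1.0 - 3 * SETI                       # ~0.01 GPUs free
    queue = deque([("gpugrid-1", GPUGRID)] +
                  [(f"seti-{i}", SETI) for i in range(3, 3 + rounds)])
    for _ in range(rounds):
        _, need = running.pop(0)                # one SETI task exits...
        free += need                            # ...freeing ~0.34 GPUs
        free = fill_fifo(free, queue, running)  # ...which only fits more SETI
    return [n for n, _ in running], [n for n, _ in queue]

running, waiting = simulate()
print(waiting)   # gpugrid-1 is still stuck at the head of the queue
```

However many rounds you run, the full-GPU task never sees 1.0 GPUs free, 
because each freed fraction is immediately backfilled by another fractional 
task: exactly the behaviour in the screenshot.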

This is analogous to the situation we saw with AQUA and multi-threaded CPU 
applications, where the MT app had a tendency to hog the CPU and keep out 
other projects. That's been sorted now; this one hasn't.

I'm sure the BOINC client will complete the work before deadline (although 
I've intervened manually, and these tasks won't get a chance to hang around 
that long). But that isn't the point.

The science behind GPUGrid requires that tasks be returned in a timely 
fashion. Earlier results are required to generate the starting conditions 
for later jobs, so any scientific results of value will depend on a long 
chain of job - process - result - new job, and so on. Although they allow a 
deadline of up to five days, to let slower and part-time GPUs participate, 
they prefer results back within 24 hours if possible. FIFO scheduling 
without allowance for fractional usage is preventing this.

The basic plumbing for task-switch GPU scheduling was in place as far back 
as last December, with the introduction of cuda_short_term_debt and 
ati_short_term_debt (see for example changeset 19898). Is there any chance 
of returning to this functional area of BOINC development, before the next 
quantum leap in technology overtakes us? 
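For illustration, here is a toy sketch of the debt idea: each project 
accumulates debt when it receives less device time than its resource share 
entitles it to, and the scheduler prefers the project it owes the most. The 
function names, the update formula, and the figures are my own hypothetical 
simplification, not what changeset 19898 actually implements:

```python
def update_debts(debts, shares, usage, dt=1.0):
    """Accrue debt for projects that got less GPU time than their share
    of the interval dt owed them; projects that got more pay it down."""
    total_share = sum(shares.values())
    for p in debts:
        expected = shares[p] / total_share * dt
        debts[p] += expected - usage.get(p, 0.0)
    mean = sum(debts.values()) / len(debts)
    for p in debts:                 # normalise so debts sum to zero
        debts[p] -= mean

def next_project(debts):
    """Schedule the project most owed GPU time."""
    return max(debts, key=debts.get)

debts = {"SETI": 0.0, "GPUGrid": 0.0}
shares = {"SETI": 100, "GPUGrid": 100}   # equal resource shares
# SETI has monopolised the GPU for three time slices:
for _ in range(3):
    update_debts(debts, shares, {"SETI": 1.0})
print(next_project(debts))   # GPUGrid: its accumulated debt wins
```

With equal shares, three slices of SETI monopoly leave GPUGrid with the 
larger debt, so the scheduler would pre-empt in its favour at the next 
task switch instead of dealing another SETI task.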


_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev