Eric Blossom wrote:
> advantage of it.  Again from reading, it appears that you need at
> least 64 elements that you can apply an instruction to, to be in its
> target zone.  For certain parts of our graphs, this is probably OK
> (e.g., FEC decode, FIRs, FFTs), but I'm kind of dubious about
> anything with a dependency chain (IIRs, PLLs, equalizers, etc.)
32 threads in a so-called "warp" execute together in a Single
Instruction Multiple Threads (SIMT) manner on a particular Streaming
Multiprocessor (SM). Control flow among the 32 threads can diverge,
but when that happens, each divergent path is executed serially, with
the threads not on that path masked off (first sketch below).

Your observations are correct. At least for now, CUDA's strength is
still largely restricted to computation-intensive, data-parallel
processing, which is where the other 99% of nVidia's business lies
(the graphics processing, of course). But once GPGPU processing takes
off, things could change.

> I'm also not sure if you can launch multiple kernels simultaneously
> (CUDA-speak).  If you could launch multiple kernels, we'd have a
> better chance of using the parallelism.
>
> Eric

Currently, no. But it is possible to execute several parallel tasks
within the same kernel by diverging the control flow, while grouping
each task (each variant of the control flow) into whole warps of 32
threads (padding if necessary) so that no divergence occurs inside any
single warp (second sketch below). The nvcc compiler will at least
take care of register allocation, so the combined kernel uses no more
registers than the most demanding single task requires.
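To make the divergence point concrete, here is a minimal sketch (all
names are mine, purely for illustration): odd and even lanes of the
same warp take different branches, so the hardware runs the two paths
one after the other, masking off the inactive threads on each pass.

#include <cuda_runtime.h>
#include <stdio.h>

/* Illustrative kernel, not from any real code base: odd and even
   threads of the same warp branch differently, which forces the warp
   to execute both paths serially. */
__global__ void diverge(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x & 1)
        out[i] = 2.0f * i;    /* path taken by odd lanes */
    else
        out[i] = i + 1.0f;    /* path taken by even lanes */
}

int main(void)
{
    const int N = 64;                 /* two warps' worth of threads */
    float *d_out, h_out[N];
    cudaMalloc(&d_out, N * sizeof(float));
    diverge<<<1, N>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("out[0]=%g out[1]=%g\n", h_out[0], h_out[1]);
    cudaFree(d_out);
    return 0;
}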
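And here is a sketch of the several-tasks-in-one-kernel trick, again
under my own naming (taskA/taskB stand in for two independent
data-parallel jobs). The branch condition is the warp index rather
than the thread index, so every thread of a given warp takes the same
path and no intra-warp divergence occurs; the block size is a multiple
of 32 so each task gets whole warps.

#include <cuda_runtime.h>

#define WARP_SIZE 32

/* Two hypothetical independent tasks fused into one kernel. */
__device__ void taskA(float *a, int i) { a[i] *= 0.5f; }  /* e.g. scale  */
__device__ void taskB(float *b, int i) { b[i] += 1.0f; }  /* e.g. offset */

__global__ void fused(float *a, float *b)
{
    int warp = threadIdx.x / WARP_SIZE;  /* warp index within the block  */
    int lane = threadIdx.x % WARP_SIZE;  /* thread's slot within its warp */
    int i = blockIdx.x * WARP_SIZE + lane;

    /* Branching on the warp index keeps each warp uniform: warp 0 of
       every block runs task A, warp 1 runs task B, and neither pays
       the serialization penalty of intra-warp divergence. */
    if (warp == 0)
        taskA(a, i);
    else
        taskB(b, i);
}

int main(void)
{
    const int N = 256;
    float *a, *b;
    cudaMalloc(&a, N * sizeof(float));
    cudaMalloc(&b, N * sizeof(float));
    cudaMemset(a, 0, N * sizeof(float));
    cudaMemset(b, 0, N * sizeof(float));
    /* 64 threads per block = 2 warps: warp 0 -> task A, warp 1 -> task B */
    fused<<<N / WARP_SIZE, 2 * WARP_SIZE>>>(a, b);
    cudaDeviceSynchronize();
    cudaFree(a);
    cudaFree(b);
    return 0;
}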
-Yu