On Wed, Jan 12, 2011 at 11:03 AM, Tom Rondeau <[email protected]> wrote:
>
> I wanted to throw out another idea that no one seems to be bringing
> up, and this relates to a comment back about how CUDA is limited
> because of the bus transfers. That's not CUDA that is doing that but
> the architecture of the machine and having the host (CPU) and device
> (GPU) separated on a bus. That has nothing to do with CUDA as a
> language.
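To make that point concrete: whichever language the kernel is written in, every buffer of samples still has to be marshalled across the bus to the device and back. A minimal round trip looks roughly like this (illustrative PyOpenCL only, not code from the GNU Radio tree, and the trivial kernel is just a placeholder):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

src = """
__kernel void scale(__global float *buf, const float k) {
    int i = get_global_id(0);
    buf[i] *= k;
}
"""
prog = cl.Program(ctx, src).build()

samples = np.random.randn(1 << 20).astype(np.float32)

mf = cl.mem_flags
# Host -> device: this is the bus transfer being discussed, and it exists
# whether the kernel language is CUDA or OpenCL.
dev_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=samples)

# The compute itself is often cheap relative to the copies around it.
prog.scale(queue, samples.shape, None, dev_buf, np.float32(0.5))

# Device -> host: and back across the bus again.
cl.enqueue_copy(queue, samples, dev_buf)
queue.finish()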
I think the notion that the language is not the barrier (the hardware architecture is) is precisely why I personally am more excited about OpenCL as a language than CUDA per se. CUDA is inherently tied to nVidia hardware, and while it is conceivable that CUDA will end up being supported on a wider variety of CPU/GPU architectures (e.g. the recently announced 'Project Denver'), I don't imagine it will ever find support on non-nVidia hardware. OpenCL, on the other hand, enjoys support from a wide variety of hardware vendors (AMD/ATI, nVidia, IBM, Intel, Apple, etc.) and was designed to run on a wide variety of architectures (including a mix of CPUs, GPUs, accelerator/DSP boards, etc.). In the long run it seems to me a much better environment for heterogeneous computing, and one that doesn't raise serious concerns about being tied to a single vendor.

>
> Currently, though, GPUs still have a place for certain applications,
> even in signal processing and radio. They are not a panacea for
> improving the performance of all signal processing applications, but
> if you understand the limitations and where they benefit you, you can
> get some really good gains out of them. I'm excited about anyone
> researching and experimenting in this area and very hopeful for the
> future use of any knowledge and expertise we can generate now.
>
> Tom

Agreed. Having spent some time working with OpenCL on GPUs to solve a different sort of problem, I can confirm they are both powerful and not a silver bullet.

I would like to echo some of the previous comments: replacing individual processing blocks in a flowgraph with drop-in CUDA/OpenCL equivalents is not likely to lead to any significant gains. It may relieve some of the work the CPU has to do (and thus be a net gain in terms of the total number of samples that can be processed without dropping any on the floor), but I suspect Steve is correct: the big gains will come either from applications requiring large filters/channelizers/etc. or from complete RX and/or TX chains written in OpenCL, with GNURadio merely acting as a shuttle between the USRPx/UHD-enabled source/sink and the smaller trickle of bits coming back out (or going in).

If that is the case, I think the follow-on question becomes: does GNURadio need to do anything to support OpenCL/CUDA/etc.-enabled applications, or is everyone doing that sort of work simply writing their own custom block to interface with their custom OpenCL/CUDA/etc. kernel, since they are likely going to have to do all sorts of nasty optimization tricks to get the best performance for their particular application anyway? Or can a common block serve as a generic interface that loads whatever custom kernel needs to be written and works well enough in 90% of cases (I've put a rough sketch of what I mean at the bottom of this mail)? I'd like to think the latter, but I don't yet have any evidence either way. Perhaps at a later date I'll have something to share that points in one direction or the other.

Doug
--
Doug Geiger
[email protected]
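Rough sketch of the "common block" idea mentioned above, purely illustrative: it assumes PyOpenCL plus a Python-level GNU Radio block interface (gr.sync_block), the class name and the two-buffer kernel calling convention are made up, and it skips all the buffering/overlap tricks a real implementation would need:

import numpy as np
import pyopencl as cl
from gnuradio import gr

class GenericClBlock(gr.sync_block):
    """Run a user-supplied OpenCL kernel over complex samples, 1-in/1-out."""

    def __init__(self, kernel_src, kernel_name):
        gr.sync_block.__init__(
            self,
            name="generic_cl_block",
            in_sig=[np.complex64],
            out_sig=[np.complex64],
        )
        self.ctx = cl.create_some_context()
        self.queue = cl.CommandQueue(self.ctx)
        # Build whatever kernel source the application hands us.
        self.kernel = getattr(cl.Program(self.ctx, kernel_src).build(),
                              kernel_name)

    def work(self, input_items, output_items):
        out = output_items[0]
        n = len(out)                      # sync block: 1 output per input
        mf = cl.mem_flags
        # One host->device and one device->host copy per call -- exactly the
        # per-block bus overhead that limits drop-in replacements.
        d_in = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                         hostbuf=np.ascontiguousarray(input_items[0][:n]))
        d_out = cl.Buffer(self.ctx, mf.WRITE_ONLY, size=out.nbytes)
        self.kernel(self.queue, (n,), None, d_in, d_out)
        cl.enqueue_copy(self.queue, out, d_out)
        return n

An application would then drop it into a flowgraph with something like GenericClBlock(open('my_kernel.cl').read(), 'my_kernel'). Whether anything this naive really covers 90% of cases is exactly the open question above.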
