On Wed, Jan 12, 2011 at 3:22 PM, Michael Dickens <[email protected]> wrote:
> On Jan 12, 2011, at 2:56 PM, Moeller wrote:
> > On 12.01.2011 14:25, Michael Dickens wrote:
> >> the CPU). I think that if a GPU can be used, it will be most effective
> >> in things like filterbanks, or when searching for packets (via their unique
> >> sync sequence, so matched filtering), or very large FIR filters -- places
> >> where a LOT of computations and data must be processed and can be
> >> parallelized easily.
> >
> > Is there an efficient parallel FIR implementation for CUDA? You need only
> > few operations on a large set of data. So, isn't this too much for the
> > stream-processor local-memory?
> > If GPU global memory has to be used, this would lead to a slower
> > concurrent access.
> > And then there is still the transfer time from/to the computer RAM.
> > It would be great to have a fast filter, but is it really faster than an
> > optimized SSE CPU FIR?
> > I had the feeling, that the ratio of computing operations vs. number of
> > samples has to be high for a significant GPU vs. CPU speedup.
> > I'm curious about how much speedup you can achieve for FIR filters
> > (let's say large/sharp filters of 1024 taps).
>
> The "very large FIR filters" was a thought, as an example of an operation
> that might benefit from a GPU at least when using OpenCL (or CUDA). I
> haven't done testing yet to know if a GPU can do better than a CPU using
> vector instructions ... but I'm getting there. If/when I do get there, I'll
> post my results & thoughts.
>
> Your comment about global versus local memory certainly does seem true from
> reading the OpenCL specs. Most modern GPUs have 3 levels of memory: global
> (for the whole GPU, across all cores), core (across all kernel execution
> units), and kernel -- in order of decreasing size, increasing access speed,
> and increasing time to move data to/from. I've been playing around with
> global memory only so far, but I'll look into the other levels as well to
> see what they can provide & the trade-offs required.
>
> Good & interesting discussion! - MLD
>

Since FFTs and IFFTs are so speedy on GPUs (CUFFT is quite good now), a good approach is to filter in the frequency domain via FFT -> pointwise multiply -> IFFT. That way you can have arbitrarily sharp filters.

-Steven
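
For anyone following along, the FFT -> pointwise multiply -> IFFT idea can be sketched roughly as below. This is only a CPU-side illustration in plain Python, not GPU code and not taken from GNU Radio or CUFFT. The names fft, fir_direct and fir_fft are made up for this example, and the small radix-2 FFT is just a stand-in for whatever FFT library would actually be used. The one detail that is easy to get wrong is zero-padding both the signal and the taps to at least len(samples) + len(taps) - 1, so that the circular convolution implied by the FFT matches the linear convolution an FIR filter performs.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (hypothetical helper); len(x) must be a power of two.

    The inverse transform is left unscaled; the caller divides by n.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fir_direct(samples, taps):
    """Reference time-domain FIR: full linear convolution, O(N * taps)."""
    out = [0.0] * (len(samples) + len(taps) - 1)
    for i, s in enumerate(samples):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out


def fir_fft(samples, taps):
    """Same filter via FFT -> pointwise multiply -> IFFT."""
    n_out = len(samples) + len(taps) - 1
    n = 1
    while n < n_out:  # next power of two, so no circular wrap-around
        n *= 2
    X = fft(list(samples) + [0.0] * (n - len(samples)))
    H = fft(list(taps) + [0.0] * (n - len(taps)))
    Y = [a * b for a, b in zip(X, H)]
    y = fft(Y, inverse=True)
    return [v.real / n for v in y[:n_out]]  # apply the 1/n IFFT scaling


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0, 5.0]
    taps = [0.5, 0.25, 0.25]
    print(fir_direct(samples, taps))
    print(fir_fft(samples, taps))
```

The two functions should agree to within floating-point rounding. For a continuous sample stream one would not transform the whole signal at once but process it in blocks (overlap-save or overlap-add), reusing the transformed taps for every block; whether that beats an SSE FIR for a given tap count is exactly the open question raised earlier in the thread.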
_______________________________________________
Discuss-gnuradio mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio
