On Wed, Jan 12, 2011 at 3:22 PM, Michael Dickens <[email protected]> wrote:

> On Jan 12, 2011, at 2:56 PM, Moeller wrote:
> > On 12.01.2011 14:25, Michael Dickens wrote:
> >> the CPU).  I think that if a GPU can be used, it will be most effective
> >> in things like filter banks, when searching for packets (via their unique
> >> sync sequence, i.e. matched filtering), or in very large FIR filters --
> >> places where a LOT of data must be processed with many computations that
> >> can be parallelized easily.
> >
> > Is there an efficient parallel FIR implementation for CUDA? You need only
> > a few operations on a large set of data, so isn't this too much for the
> > stream processors' local memory? If GPU global memory has to be used,
> > concurrent access will be slower, and then there is still the transfer
> > time to/from the computer's RAM.
> > It would be great to have a fast filter, but is it really faster than an
> > optimized SSE CPU FIR? I had the feeling that the ratio of computing
> > operations to number of samples has to be high for a significant GPU vs.
> > CPU speedup.
> > I'm curious about how much speedup you can achieve for FIR filters
> > (let's say large/sharp filters of 1024 taps).
>
> The "very large FIR filters" was just a thought, an example of an operation
> that might benefit from a GPU, at least when using OpenCL (or CUDA).  I
> haven't done any testing yet to know whether a GPU can beat a CPU using
> vector instructions ... but I'm getting there.  If/when I do, I'll post my
> results & thoughts.
>
> Your comment about global versus local memory certainly seems true from
> reading the OpenCL specs.  Most modern GPUs have 3 levels of memory: global
> (for the whole GPU, across all cores), core (shared by the execution units
> of one core; "local" memory in OpenCL terms), and kernel (per work-item;
> OpenCL "private" memory) -- in order of decreasing size, increasing access
> speed, and increasing time to move data to/from.  I've been playing around
> with global memory only so far, but I'll look into the other levels as well
> to see what they can provide & the trade-offs required.
>
> Good & interesting discussion! - MLD
>
Since FFTs & IFFTs are so speedy on GPUs (CUFFT is quite good now), a good
way is to filter in the frequency domain via FFT -> pointwise multiply ->
IFFT. That way the cost per output sample grows only slowly with the number
of taps, so you can have arbitrarily long, sharp filters.
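For a continuous sample stream, the FFT -> multiply -> IFFT recipe is usually applied block by block. Overlap-add is one standard way to do this (overlap-save is another; the post does not say which is meant). Below is a minimal, CPU-only sketch in Python, assuming power-of-two FFT sizes. The hand-written `fft` merely stands in for a real library such as CUFFT, and the names `fft_filter`, `direct_fir` and the default block length are hypothetical, chosen for illustration only.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    With inverse=True the twiddle sign is flipped; no 1/n scaling is applied."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_filter(signal, taps, block=None):
    """Hypothetical overlap-add FIR filter: FFT -> pointwise multiply -> IFFT
    per block. Returns the full linear convolution (len(signal)+len(taps)-1)."""
    m = len(taps)
    if block is None:
        block = max(m, 64)  # assumed default block length, not tuned
    # The FFT must be long enough to hold one block's full linear convolution,
    # otherwise circular wrap-around corrupts the result.
    nfft = 1
    while nfft < block + m - 1:
        nfft *= 2
    # Frequency response of the filter, computed once and reused for all blocks.
    H = fft([complex(t) for t in taps] + [0j] * (nfft - m))
    out = [0.0] * (len(signal) + m - 1)
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        X = fft([complex(s) for s in chunk] + [0j] * (nfft - len(chunk)))
        Y = [a * b for a, b in zip(X, H)]          # pointwise multiply
        y = fft(Y, inverse=True)                   # unscaled inverse FFT
        # Overlap-add: each block's tail overlaps the next block's head.
        for i in range(len(chunk) + m - 1):
            out[start + i] += y[i].real / nfft
    return out

def direct_fir(signal, taps):
    """Reference direct-form convolution: len(taps) multiply-adds per sample."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, t in enumerate(taps):
            out[i + j] += s * t
    return out

if __name__ == "__main__":
    import random
    random.seed(0)
    sig = [random.uniform(-1, 1) for _ in range(1000)]
    taps = [random.uniform(-1, 1) for _ in range(129)]
    err = max(abs(a - b) for a, b in zip(fft_filter(sig, taps),
                                          direct_fir(sig, taps)))
    print(err)  # a tiny number: the two methods differ only by rounding
```

The direct form costs one multiply-add per tap for every output sample (1024 for a 1024-tap filter). The block approach instead spends two FFTs of size `nfft` plus `nfft` complex multiplies per block of `block` outputs, so its per-sample cost grows roughly with log2 of the FFT size rather than with the tap count. That is the property that makes long, sharp filters affordable.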

-Steven
_______________________________________________
Discuss-gnuradio mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio
