Hi Ryan,

I wrote the various forms of the CASPER FFT, including this one.  The broad
idea of the architecture was described in:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4840623&tag=1

Basically (as far as I can tell from a brief perusal of split-radix
FFTs), I think this *is* a split-radix FFT.  The mix of serial and
parallel FFTs is used to evaluate a radix-2 Cooley-Tukey FFT that is
decomposed into several smaller FFTs that can be computed independently
(without inter-communication of samples), followed by a direct FFT that
cycles through twiddle coefficients (i.e. it is not truly a stand-alone
direct FFT) and does the remaining butterflies, drawing on samples from
all the sub-FFTs.  Data permutation is a bit of a headache in these
architectures, so I invented a permuting buffer that uses basic group
theory to automatically generate in-place permuters that do the necessary
data reordering.
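To make the decomposition concrete, here is a minimal numpy sketch of the idea: P independent sub-FFTs on time-decimated streams, followed by a combining stage that cycles through twiddle coefficients and draws on all the sub-FFT outputs.  (This is an illustration of the math only; the function and variable names are mine, not from the CASPER library.)

```python
import numpy as np

def wideband_fft(x, P):
    """Model of the mixed serial/parallel decomposition: P independent
    sub-FFTs (no inter-communication of samples), then a direct stage
    that combines them with twiddles W_N^{rk}."""
    N = len(x)
    M = N // P
    # Each streaming sub-FFT sees every P-th sample of the input.
    sub = np.array([np.fft.fft(x[r::P]) for r in range(P)])  # shape (P, M)
    k = np.arange(N)
    # Twiddle phases increment with output bin k: W_N^{rk} = e^{-2*pi*i*rk/N}
    tw = np.exp(-2j * np.pi * np.outer(np.arange(P), k) / N)
    # "Direct" stage: every output bin draws on all P sub-FFTs.
    return (tw * sub[:, k % M]).sum(axis=0)

x = np.arange(32, dtype=complex)
assert np.allclose(wideband_fft(x, 4), np.fft.fft(x))
```

The identity being used is X[k] = sum_r W_N^{rk} X_r[k mod M], where X_r is the M-point FFT of the samples with index congruent to r mod P.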

I think you may have misunderstood how the architecture works, and that
is perhaps why you thought it was inefficient.  The total buffering is
only 50% higher than the minimum possible (i.e. storing each sample only
once), and the multipliers are all used at 100% efficiency.  Higher
radices can produce some savings if you are doing more FFTs in parallel,
but barring that, I'd be surprised if there is another architecture that
substantially outperforms this one (but you are welcome to try!  :)
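The "basic group theory" behind those in-place permuters is just that every permutation factors into disjoint cycles, so each cycle can be walked with a single temporary register.  A minimal sketch of cycle-following (illustrative only, not the actual generated buffer logic):

```python
def permute_in_place(data, perm):
    """Reorder data in place so the element at index i ends up at
    index perm[i], by walking each cycle of the permutation with
    one temporary register."""
    done = [False] * len(perm)
    for start in range(len(perm)):
        if done[start]:
            continue
        # Walk one cycle, dropping each element at its destination.
        i, tmp = start, data[start]
        while True:
            j = perm[i]                  # element at i belongs at j
            data[j], tmp = tmp, data[j]  # place it; pick up the evictee
            done[j] = True
            i = j
            if i == start:
                break
    return data

assert permute_in_place([10, 20, 30, 40], [2, 0, 1, 3]) == [20, 30, 10, 40]
```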

I'm happy you're documenting.

All the best,
Aaron

On Tue, Mar 12, 2013 at 3:39 PM, Ryan Monroe <[email protected]> wrote:

> Hey all,
> Luke Madden was asking me about what's going on in the FFT-direct today.
> I'm pretty sure we have basically zero documentation on this lying around,
> so it's a good time to fix that.  I'm going to share what I know, but I'd
> appreciate it if other people could add/correct me as needed...
>
> So, you can split the CASPER FFTs into streaming and parallel FFTs:
>
> streaming: <fft_biplex, fft_biplex_real, fft_biplex_real_4x>
> These FFTs have several independent ports.  Each of these ports is fed
> with normal-order, serial time-domain data and produces normal-order,
> serial frequency-domain data.  If you know something about how pipelined
> FFTs work, you'll probably call it a "Radix 2, Delay-Commutator FFT", or
> R2DC.  In the <fft_biplex>, we follow the R2DC FFT with an
> inverse-delay-commutator stage to un-scramble the data (the casper
> implementation doesn't have the same structure as an
> inverse-delay-commutator, but they do the same thing).  In
> <fft_biplex_real>, we do the same R2DC FFT, but we treat real and imag as
> separate inputs, making four inputs.
>
> parallel: <fft_direct>
> If map_tail is not set, then the fft_direct block accepts all the inputs
> for an fft on *each clock cycle*.  Natural order in, Natural order out.
> If map_tail *is* set, it's a bit more complicated.  Then, this block is
> being used with a number of streaming FFTs to achieve a wideband FFT.
> Imagine a standard DIT FFT.  The early stages of the FFT only use a few
> coefficients.  In fact, they are each FFTs in their own right, only on a
> subset of the data.  These streaming FFTs are just that:  for as long as we
> can still process the data in a serial fashion, we process each sample
> sequentially.  Then, we do the last 1-4 (typically) stages in a massive
> parallel format.  Here, the same structure is drawn as in the <map_tail=0>
> fft_direct... but the coefficients now change (specifically, their phases
> are incrementing).
>
> This is where my understanding gets a bit hazy, but it looks like the last
> stages of the FFT are being literally enumerated here.  *If someone wants
> to chime in, here is the place to do it*.
>
> In any case, you could actually do these "mixed streaming/parallel FFTs"
> (which are <fft, fft_wideband_real>) in a different fashion, by re-casting
> them as a split-radix FFT (look it up).  Doing this is computationally
> about the same, but saves resources and memory... and is simpler if the
> size of <fft_direct> is greater than 2^2.
>
>
> I hope this helps, Luke (and everyone else)!
>
>
> --Ryan Monroe
>
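The "literally enumerated" butterfly stages Ryan describes can be modeled as an iterative radix-2 DIT FFT, where every butterfly in every stage exists as its own piece of logic, with the twiddle phase incrementing across each span of butterflies.  (A behavioral sketch under that assumption, with illustrative names, not the Simulink implementation.)

```python
import numpy as np

def direct_fft(frame):
    """Fully parallel direct FFT: all inputs arrive at once,
    natural order in, natural order out."""
    N = len(frame)
    stages = N.bit_length() - 1
    # Bit-reverse the input so the output comes out in natural order.
    rev = [int(format(i, f'0{stages}b')[::-1], 2) for i in range(N)]
    X = np.asarray(frame, dtype=complex)[rev]
    half = 1
    while half < N:
        # Twiddle phases increment within each span of butterflies.
        tw = np.exp(-2j * np.pi * np.arange(half) / (2 * half))
        for base in range(0, N, 2 * half):
            a = X[base:base + half].copy()
            b = X[base + half:base + 2 * half] * tw
            X[base:base + half] = a + b
            X[base + half:base + 2 * half] = a - b
        half *= 2
    return X

assert np.allclose(direct_fft([1.0, 2, 3, 4]), np.fft.fft([1, 2, 3, 4]))
```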



-- 
Aaron Parsons
510-306-4322
Hearst Field Annex B54, UCB
