Hi Jesus

To go into a lot of detail on the FFT would take a _lot_ of time (and is
not necessary for most users) so I will answer each of your questions
briefly and hope that you can 'look under mask' to get any details you
require.


> I would like to know what signal processing scheme is implemented on
> it. Up to now, I've been doing "look undermask", and trying to guess
> which is the architecture...if it is based on butterflies, where the
> twiddle factors are located...

The implementation is a streaming (requires input data samples on every
clock cycle) decimation in time (DIT) architecture allowing the division
of an input data stream into multiple parallel streams. If you look at a
full DIT FFT butterfly
(http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/transform/fft.html) 
you will see 2^n inputs and 2^n outputs with n layers of twiddle stages 
between. If we implemented the whole butterfly directly (there is an fft_direct 
block in the library that can do this), we would process 2^n inputs on every 
clock cycle and require (2^(n-1))*n complex multipliers and (2^n)*n complex 
adders (assuming no optimisations). However, if we only have to process 2^m 
(m<n) samples per clock cycle (and thus have 2^(n-m) clock cycles to process 
the full 2^n inputs), we can time-multiplex our hardware so that we use much 
fewer multipliers and adders (a factor of on the order of 2^(n-m) less). 

The twiddle factors are located in the twiddle_* blocks in
fft/fft_biplex*/fft_stage_*/butterfly_direct and fft_direct/butterfly*.
You will notice that the twiddle factors are trivial in the first two
stages (1, and +/-1 respectively) and thus are not stored in BRAM but
implemented in logic. 

> I would also like to know what does "biplex" mean?

Biplex relates to the fact that our FFT can process two input data
streams (e.g two polarsiations from the same antenna) simultaneously by
time multiplexing hardware (this is on top of the time multiplexing
mentioned above). Basically, if you look at an FFT butterfly you will
notice half of the inputs are combined with the other half in any
twiddle stage. We thus have to buffer up half the inputs and process
them later with the other half. While we are buffering we process
another input stream. 

> About using the block:
>  -Sync Signal: I've been reading a Casper Memo about this signal, and
> I guess this signal is always necessary to use the fft block. Is this
> signal necessary because the fft block works with frames?. In spite of
> reading the memo it is not clearly enough for me how to stablish the
> period of this signal, I now it has to be with the number of points of
> the FFT, the number of inputs and also with something connected to the
> reorder blocks...which I do not understand very much at this moment...

The sync is required by many blocks in the FFT and _must_ occur in the
correct place with respect to the input data stream (one clock cycle
before the first sample/s of a spectrum/frame) and with the correct
period. It helps the twiddle blocks generate the correct twiddle factor
at the correct time with respect to the data among other things. 

Reorder blocks are used to reorder the data at the output of the FFT as
it is in bit-reversed order. The reorder blocks use a clever scheme
where data can be reordered without needing a double buffer i.e instead
of buffering up data, and then reading it out in a particular order
while writing into another buffer, the reorder block can read and write
from the same buffer at the same time. This halves the buffer size
requirements but requires that the addresses to read and write be
generated in a particular way. This sequence of addresses must be
generated correctly to prevent data appearing at the wrong time. The
sequence is a multiple of the FFT frame size but not necessarily a power
of two.  

> -About the input / output signal...
>  I guess they are complex signals, is that true?. In that case what I
> have to do is just joining in a unique word the real and the imaginary
> part of the input signal?

The fft block processes complex input data, the fft_wideband_real
processes real data. Join your real and imaginary data with a ri_to_c
block from Misc.

> - Is there any limitation to the number of bits for the inputs,
> coefficients...?

There is no limitation besides the resources available in the FPGA.
Also, the multipliers, BRAMs etc have hard upper limits on input bit
width etc. Anything 'reasonable' should be fine. We use 18 bits for the
real and imaginary parts of our data paths as well as for our twiddle
factors mainly because this ensures optimal use of our BRAMs (36 bit
input) and multipliers (18x25 bits).  

> - Is there any limitation to the order of the FFT? Can it be from 2
> to...?

There is no limitation on the size of the FFT besides the resources in
the FPGA. The largest FFT I know of that has been implemented on a ROACH
is a 2^15 point fft_wideband_real. ROACH2 should be able to do more. 

Hope this helps

Regards
Andrew


Reply via email to