Hi Jesus To go into a lot of detail on the FFT would take a _lot_ of time (and is not necessary for most users) so I will answer each of your questions briefly and hope that you can 'look under mask' to get any details you require.
> I would like to know what signal processing scheme is implemented on > it. Up to now, I've been doing "look undermask", and trying to guess > which is the architecture...if it is based on butterflies, where the > twiddle factors are located... The implementation is a streaming (requires input data samples on every clock cycle) decimation in time (DIT) architecture allowing the division of an input data stream into multiple parallel streams. If you look at a full DIT FFT butterfly (http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/transform/fft.html) you will see 2^n inputs and 2^n outputs with n layers of twiddle stages between. If we implemented the whole butterfly directly (there is an fft_direct block in the library that can do this), we would process 2^n inputs on every clock cycle and require (2^(n-1))*n complex multipliers and (2^n)*n complex adders (assuming no optimisations). However, if we only have to process 2^m (m<n) samples per clock cycle (and thus have 2^(n-m) clock cycles to process the full 2^n inputs), we can time-multiplex our hardware so that we use much fewer multipliers and adders (a factor of on the order of 2^(n-m) less). The twiddle factors are located in the twiddle_* blocks in fft/fft_biplex*/fft_stage_*/butterfly_direct and fft_direct/butterfly*. You will notice that the twiddle factors are trivial in the first two stages (1, and +/-1 respectively) and thus are not stored in BRAM but implemented in logic. > I would also like to know what does "biplex" mean? Biplex relates to the fact that our FFT can process two input data streams (e.g two polarsiations from the same antenna) simultaneously by time multiplexing hardware (this is on top of the time multiplexing mentioned above). Basically, if you look at an FFT butterfly you will notice half of the inputs are combined with the other half in any twiddle stage. We thus have to buffer up half the inputs and process them later with the other half. While we are buffering we process another input stream. > About using the block: > -Sync Signal: I've been reading a Casper Memo about this signal, and > I guess this signal is always necessary to use the fft block. Is this > signal necessary because the fft block works with frames?. In spite of > reading the memo it is not clearly enough for me how to stablish the > period of this signal, I now it has to be with the number of points of > the FFT, the number of inputs and also with something connected to the > reorder blocks...which I do not understand very much at this moment... The sync is required by many blocks in the FFT and _must_ occur in the correct place with respect to the input data stream (one clock cycle before the first sample/s of a spectrum/frame) and with the correct period. It helps the twiddle blocks generate the correct twiddle factor at the correct time with respect to the data among other things. Reorder blocks are used to reorder the data at the output of the FFT as it is in bit-reversed order. The reorder blocks use a clever scheme where data can be reordered without needing a double buffer i.e instead of buffering up data, and then reading it out in a particular order while writing into another buffer, the reorder block can read and write from the same buffer at the same time. This halves the buffer size requirements but requires that the addresses to read and write be generated in a particular way. This sequence of addresses must be generated correctly to prevent data appearing at the wrong time. The sequence is a multiple of the FFT frame size but not necessarily a power of two. > -About the input / output signal... > I guess they are complex signals, is that true?. In that case what I > have to do is just joining in a unique word the real and the imaginary > part of the input signal? The fft block processes complex input data, the fft_wideband_real processes real data. Join your real and imaginary data with a ri_to_c block from Misc. > - Is there any limitation to the number of bits for the inputs, > coefficients...? There is no limitation besides the resources available in the FPGA. Also, the multipliers, BRAMs etc have hard upper limits on input bit width etc. Anything 'reasonable' should be fine. We use 18 bits for the real and imaginary parts of our data paths as well as for our twiddle factors mainly because this ensures optimal use of our BRAMs (36 bit input) and multipliers (18x25 bits). > - Is there any limitation to the order of the FFT? Can it be from 2 > to...? There is no limitation on the size of the FFT besides the resources in the FPGA. The largest FFT I know of that has been implemented on a ROACH is a 2^15 point fft_wideband_real. ROACH2 should be able to do more. Hope this helps Regards Andrew

